Model recalibration on different dataset

Question

James Flash

April 28, 2022, 11:20

I have a large dataset of approximately 150k rows with 1,500 positive labels, on which I can train a binary classification model.

I also have a second, smaller dataset of 80k rows with 100 positive labels.

The problem is that I can't train a model on the small dataset alone, because the resulting quality is poor. A model trained on the large dataset gives more stable results on the small one, since the targets and domains are similar. Unfortunately, that model's probability calibration is terrible on the small dataset.

So the question is whether the following pipeline is valid: (1) train a logistic regression on the large dataset, (2) recalibrate it on the small dataset, for example with isotonic regression, and (3) score test data that comes from the same source as the small dataset.
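To make the three steps concrete, here is a minimal sketch of that pipeline using scikit-learn. It is not my actual code: the data is synthetic, and the helper `make_data`, the sample sizes, the coefficients, and the intercepts are all assumptions chosen only so that the two domains share the same feature effects but differ in base rate.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)


def make_data(n, intercept):
    """Hypothetical synthetic data: shared feature effects, domain-specific base rate."""
    X = rng.normal(size=(n, 3))
    logits = intercept + X @ np.array([1.0, -0.5, 0.8])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    return X, y


# Assumed stand-ins for the real datasets (sizes reduced to keep the sketch fast).
X_large, y_large = make_data(20000, intercept=-3.0)  # large dataset
X_small, y_small = make_data(8000, intercept=-5.0)   # small dataset, used for calibration
X_test, y_test = make_data(8000, intercept=-5.0)     # held-out data from the small dataset's source

# Step 1: train the base model on the large dataset only.
base = LogisticRegression(max_iter=1000).fit(X_large, y_large)

# Step 2: fit an isotonic map from the base model's scores to the small dataset's labels.
raw_small = base.predict_proba(X_small)[:, 1]
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_small, y_small)

# Step 3: score test data by chaining the base model and the calibrator.
raw_test = base.predict_proba(X_test)[:, 1]
cal_test = calibrator.predict(raw_test)

# Compare calibration quality on data the calibrator never saw.
brier_raw = brier_score_loss(y_test, raw_test)
brier_cal = brier_score_loss(y_test, cal_test)
print(f"Brier score, raw:          {brier_raw:.4f}")
print(f"Brier score, recalibrated: {brier_cal:.4f}")
print(f"Mean raw / recalibrated / observed rate: "
      f"{raw_test.mean():.4f} / {cal_test.mean():.4f} / {y_test.mean():.4f}")
```

The isotonic step only remaps the base model's scores monotonically, so the ranking of examples stays the same up to ties and only the probability scale changes. The comparison is done on a held-out split that was not used to fit the calibrator; with only about 100 positives in the real small dataset, the isotonic fit may be noisy, so that held-out check seems important.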

I've implemented this pipeline and the results look good, but I doubt whether the approach is correct. I've seen this post, but I'm not sure it addresses the same problem.

Topic data-science-model probability-calibration logistic-regression classification

Category Data Science
