Individual models give nearly the same distribution on the test set, whereas ensembling gives a better result but a very different distribution
I am working on a binary classification problem with unbalanced data (17% for positive class).
The problem is as follows: my three individual models, when predicting on the test set (for which I don't have the labels), give a predicted class distribution quite similar to that of the training set.
But ensembling these models, while giving a slightly better result (F1-score), drastically changes the predicted distribution on the test set: the share of predicted positives goes from ~20% to 5%.
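To make the comparison concrete, here is a minimal sketch of how the predicted positive rate can be measured for each model and for an ensemble. It is only an illustration under assumptions: the scores are synthetic and independent (not my real predictions), the threshold is 0.5, and the helper names `positive_rate`, `soft_vote`, and `unanimous_vote` are hypothetical. The two combination rules shown are just common choices, not necessarily the ensembling used here.

```python
import numpy as np

def positive_rate(probs, threshold=0.5):
    """Fraction of samples predicted positive at the given threshold."""
    return float(np.mean(np.asarray(probs) >= threshold))

def soft_vote(prob_list):
    """Average the predicted probabilities of several models."""
    return np.mean(np.vstack(prob_list), axis=0)

def unanimous_vote(prob_list, threshold=0.5):
    """Predict positive only when every model predicts positive."""
    votes = np.vstack(prob_list) >= threshold
    return votes.all(axis=0).astype(int)

# Synthetic stand-in for three models' predicted probabilities (assumption).
rng = np.random.default_rng(0)
n_samples = 10_000
model_probs = [rng.beta(2, 5, size=n_samples) for _ in range(3)]

for i, probs in enumerate(model_probs, start=1):
    print(f"model {i} positive rate: {positive_rate(probs):.3f}")

print(f"soft-vote positive rate: {positive_rate(soft_vote(model_probs)):.3f}")
print(f"unanimous-vote positive rate: {unanimous_vote(model_probs).mean():.3f}")
```

With a unanimous rule the ensemble can never predict more positives than the most conservative single model, and averaging probabilities pulls scores toward the middle, so a fixed 0.5 threshold on the ensemble output may flag fewer positives than any individual model does.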
My question is: should I choose the best individual model, which keeps almost the same distribution but loses some performance, or the ensemble, which scores better but gives a very different distribution?
I have no idea what the true class distribution of the test set is.
Thanks for any help
Topic ensemble distribution learning
Category Data Science