Model performance on external validation set really low?

I am using a LightGBM model for binary classification. My train and test accuracies are 87% and 82% respectively, with a cross-validation accuracy of 89% and a ROC-AUC of 81%. But when I evaluate the model on an external validation set that it has never seen before, it gives a ROC-AUC of 41%. Can somebody suggest what should be done?

Tags: lightgbm, validation, classification, machine-learning

Category: Data Science


First, an AUC below 50% is terrible: it means you would get better performance simply by switching the positive and negative labels (an AUC of 41% becomes 59% after the flip). So the model is doing worse than random guessing on this data.
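A minimal sketch of why flipping helps, using made-up labels and scores (not the asker's data) and the rank definition of ROC-AUC, so no ML library is needed:

```python
# Illustrative only: shows that inverting the scores of a model with
# AUC < 0.5 yields an AUC of exactly 1 - AUC.

def auc(labels, scores):
    """Probability that a random positive outranks a random negative,
    counting ties as 0.5 -- the rank definition of ROC-AUC."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0, 0, 1]
s = [0.2, 0.3, 0.1, 0.8, 0.6, 0.9, 0.4, 0.5]  # badly ranked scores

bad = auc(y, s)                      # well below 0.5
flipped = auc(y, [1 - v for v in s]) # equals 1 - bad
print(bad, flipped)
```

Flipping the labels only "fixes" the number, of course; it does not fix the underlying problem discussed below.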

In general there seems to be a problem with the design of the task and/or its evaluation: the original test set, the external validation set, or both are not representative samples of the target problem.

If the validation set is supposed to be representative, then the test set is not valid, possibly due to data leakage (it contains information from the training set). In that case the test-set evaluation is meaningless, and the much lower validation performance indicates overfitting: the model is capturing patterns that occur by chance in the training set, because it is too complex and/or the training set is not representative enough.
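One quick, hedged sanity check for the leakage scenario above: look for exact duplicate feature rows shared between the training data and the test data. The helper and the rows below are hypothetical, purely to illustrate the check:

```python
# Illustrative leakage check: find evaluation rows that appear verbatim
# in the training data. Duplicated rows inflate test scores because the
# model has effectively already seen the "unseen" examples.

def leaked_rows(train_rows, eval_rows):
    """Return the eval rows that also occur in the training data."""
    seen = {tuple(r) for r in train_rows}
    return [r for r in eval_rows if tuple(r) in seen]

train = [[1.0, 5.2], [0.3, 2.2], [4.1, 0.9]]   # made-up feature rows
test  = [[0.3, 2.2], [7.7, 1.1]]               # first row duplicates train

overlap = leaked_rows(train, test)
print(len(overlap))
```

Exact duplicates are only the crudest form of leakage; near-duplicates, target-derived features, and splits that separate correlated records (same patient, same time window) need checks of their own.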
