Model performance on external validation set really low?

I am using a LightGBM model for binary classification. My train and test accuracies are 87% and 82% respectively, with a cross-validation accuracy of 89% and a ROC-AUC of 81%. But when I evaluate the model on an external validation set that it has never seen before, it gives a ROC-AUC of 41%. Can somebody suggest what should be done?

Tags: lightgbm, validation, classification, machine-learning

Category: Data Science


First, an AUC below 50% is terrible: it means you would get better performance simply by switching the positive and negative labels (an AUC of 41% becomes 59% after the flip). So the model is doing worse than random guessing on this data.
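A minimal sketch of why flipping helps, using made-up labels and scores (not the asker's data) and the rank definition of ROC-AUC, so no ML library is needed:

```python
# Illustrative only: shows that inverting the scores of a model with
# AUC < 0.5 yields an AUC of exactly 1 - AUC.

def auc(labels, scores):
    """Probability that a random positive outranks a random negative,
    counting ties as 0.5 -- the rank definition of ROC-AUC."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0, 0, 1]
s = [0.2, 0.3, 0.1, 0.8, 0.6, 0.9, 0.4, 0.5]  # badly ranked scores

bad = auc(y, s)                      # well below 0.5
flipped = auc(y, [1 - v for v in s]) # equals 1 - bad
print(bad, flipped)
```

Flipping the labels only "fixes" the number, of course; it does not fix the underlying problem discussed below.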

In general there seems to be a problem with the design of the task and/or its evaluation: the original test set, the external validation set, or both are not representative samples of the target problem.

If the validation set is supposed to be representative, then the test set is not valid, possibly due to data leakage (it contains information from the training set). In that case the test-set evaluation is meaningless, and the much lower validation performance indicates overfitting: the model is capturing patterns that occur by chance in the training set, because it is too complex and/or the training set is not representative enough.
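One quick, hedged sanity check for the leakage scenario above: look for exact duplicate feature rows shared between the training data and the test data. The helper and the rows below are hypothetical, purely to illustrate the check:

```python
# Illustrative leakage check: find evaluation rows that appear verbatim
# in the training data. Duplicated rows inflate test scores because the
# model has effectively already seen the "unseen" examples.

def leaked_rows(train_rows, eval_rows):
    """Return the eval rows that also occur in the training data."""
    seen = {tuple(r) for r in train_rows}
    return [r for r in eval_rows if tuple(r) in seen]

train = [[1.0, 5.2], [0.3, 2.2], [4.1, 0.9]]   # made-up feature rows
test  = [[0.3, 2.2], [7.7, 1.1]]               # first row duplicates train

overlap = leaked_rows(train, test)
print(len(overlap))
```

Exact duplicates are only the crudest form of leakage; near-duplicates, target-derived features, and splits that separate correlated records (same patient, same time window) need checks of their own.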
