Improving ROC AUC score when accuracy is good

I have a binary classification problem with a large dataset of dimensions (1155918, 55).

The dataset is fairly balanced: 67% class 0, 33% class 1.

I am getting 73% accuracy on the test set, an AUC score of 50%, and a recall of 0.02 for class 1. I am using logistic regression and have also tried PyCaret's classification algorithms.

Tags: binary-classification, roc, scikit-learn, machine-learning


A majority-baseline classifier would reach 67% accuracy just by predicting class 0 for every instance, so 73% accuracy is not especially good. The AUC is a more informative measure, but an AUC of 0.5 is the worst a classifier can do: it means the model ranks positive instances no better than random guessing.
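This baseline is easy to verify with scikit-learn's `DummyClassifier`. A minimal sketch, using randomly generated labels with the question's 67/33 class ratio (the data here is illustrative, not the asker's):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Illustrative labels with the question's class ratio: 67% class 0, 33% class 1
rng = np.random.default_rng(0)
y = (rng.random(100_000) < 0.33).astype(int)
X = np.zeros((len(y), 1))  # features are irrelevant to a dummy model

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)
proba = baseline.predict_proba(X)[:, 1]

print(f"accuracy = {accuracy_score(y, pred):.2f}")  # ~0.67, purely from the class ratio
print(f"AUC      = {roc_auc_score(y, proba):.2f}")  # 0.50, no ranking ability at all
```

Any real model should clear both of these numbers before its accuracy is worth celebrating.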

And indeed, this classifier apparently does little more than a baseline classifier:

  • Recall for class 1 is 0.02, so the number of true positives (TP) is 67734 × 0.02 ≈ 1355.
  • Precision for class 1 is 0.36, so the number of predicted positives (TP + FP) is 1355 / 0.36 ≈ 3763.
  • This means the classifier predicts only 3763 / (184508 + 67734) ≈ 1.5% of the instances as class 1, even though the imbalance is not severe.
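The arithmetic in these three steps can be reproduced directly (the counts are the ones quoted above):

```python
# Counts quoted in the answer above
positives = 67_734    # actual class-1 instances in the test set
negatives = 184_508   # actual class-0 instances
recall = 0.02
precision = 0.36

tp = positives * recall          # true positives
predicted_pos = tp / precision   # TP + FP
share = predicted_pos / (positives + negatives)

print(round(tp))             # ≈ 1355
print(round(predicted_pos))  # ≈ 3763
print(f"{share:.1%}")        # ≈ 1.5%
```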

So what happens is this: most of the time the classifier doesn't succeed in distinguishing the two classes, so it just predicts the majority class 0 (98.5% of the time).

Without more detail it's impossible to know why: maybe the features are not informative enough, maybe there is overfitting, maybe logistic regression is not the right model for this dataset...
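One cheap experiment worth trying before switching models is class weighting, which makes logistic regression penalize class-1 errors more heavily and usually raises class-1 recall. A sketch on synthetic data with the same 67/33 ratio (the dataset, feature count, and resulting scores are illustrative, not the asker's):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the question's 67/33 imbalance (illustrative only)
X, y = make_classification(
    n_samples=50_000, n_features=20, weights=[0.67, 0.33], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for weight in (None, "balanced"):
    clf = LogisticRegression(max_iter=1000, class_weight=weight).fit(X_tr, y_tr)
    rec = recall_score(y_te, clf.predict(X_te))
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"class_weight={weight}: recall(class 1)={rec:.2f}, AUC={auc:.2f}")
```

Note that `class_weight="balanced"` mainly shifts the decision threshold, trading precision for recall; it will not fix an AUC of 0.5, which indicates the features carry no usable signal for the model.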
