ROC-AUC Imbalanced Data Score Interpretation
I have a binary response variable (label) in a dataset with around 50,000 observations.
The training set is somewhat imbalanced, with label=1 making up about 33% of the observations and label=0 making up about 67%. Right now, with XGBoost, I'm getting a ROC-AUC score of around 0.67.
The response variable is binary, so the baseline is 50% in terms of chance. At the same time, the data is imbalanced, so if the model just guessed label=0 for every observation, it would also achieve a ROC-AUC score of 0.67. Does this mean that a ROC-AUC of 0.67 is no better than chance?
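The premise about the "always guess 0" baseline can be checked directly. Below is a minimal sketch, assuming synthetic labels with a 33% positive rate (not the real dataset, and no XGBoost model involved). The `roc_auc` helper is hand-written for illustration, using the rank-based (Mann-Whitney) formulation with ties counted as half; it is not a library function.

```python
import random

def roc_auc(y_true, scores):
    """Rank-based ROC-AUC: the probability that a random positive is scored
    above a random negative, with ties counted as 0.5 (illustrative helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

random.seed(0)
n = 50_000
# Assumption: synthetic labels, roughly 33% ones and 67% zeros.
y = [1 if random.random() < 0.33 else 0 for _ in range(n)]

always_zero = [0.0] * n                           # constant "guess 0" scores
random_scores = [random.random() for _ in range(n)]  # uninformative scores

acc_always_zero = sum(1 for t in y if t == 0) / n
auc_always_zero = roc_auc(y, always_zero)
auc_random = roc_auc(y, random_scores)

print(f"always-0 accuracy: {acc_always_zero:.3f}")
print(f"always-0 ROC-AUC:  {auc_always_zero:.3f}")
print(f"random   ROC-AUC:  {auc_random:.3f}")
```

Under these assumptions, the constant predictor gets an accuracy close to 0.67 but a ROC-AUC of 0.5, because ROC-AUC measures how well the scores rank positives above negatives and a constant score ranks nothing. The 0.67 figure from the class proportions is therefore an accuracy baseline, while the chance level for ROC-AUC stays at 0.5 regardless of the imbalance.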
Topic binary-classification xgboost roc class-imbalance
Category Data Science