Will oversampling help with generalization (small imbalanced dataset)?
I have an imbalanced dataset (2:1 ratio) with about 60 patients and 80 features.
I used Recursive Feature Elimination (RFE) with stratified cross-validation to reduce the feature set to 15, and I get an AUC of 0.9 with logistic regression and/or SVM. I don't fully trust this AUC: with such a small positive class, I doubt it will generalize. So I was thinking of oversampling the minority class (K-means + PCA) and re-running the RFE approach. Would this help? Thanks.
My question is more or less the same as this one: "Why will the accuracy of a highly unbalanced dataset reduce after oversampling?", except that I use AUC rather than accuracy.
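To make the setup concrete, here is a minimal sketch of the kind of pipeline described above, assuming scikit-learn. The data is synthetic (generated with `make_classification` to mimic roughly 60 samples, 80 features and a 2:1 class ratio), and the scaler, fold count and `max_iter` value are illustrative assumptions, not the actual settings used. In this sketch RFE sits inside the pipeline, so it is refit on the training part of each fold.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: ~60 samples, 80 features, ~2:1 classes.
X, y = make_classification(
    n_samples=60,
    n_features=80,
    n_informative=10,
    weights=[2 / 3, 1 / 3],
    random_state=0,
)

# Scaling, RFE down to 15 features, then the final classifier.
# Because RFE is a pipeline step, it only ever sees the training folds.
pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=15)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")

print("AUC per fold:", scores)
print("mean AUC: %.3f, std: %.3f" % (scores.mean(), scores.std()))
```

With this few samples, the spread of the per-fold AUC values is worth looking at alongside the mean.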
Topic generalization auc overfitting
Category Data Science