SMOTE-NC does not help to oversample my mixed continuous/categorical dataset
When I use SMOTE-NC to oversample the three minority classes of a 4-class classification problem, the precision, recall, and F1 scores for those minority classes are still very low (~3%). My dataset has 32 categorical and 30 continuous variables. All the categorical variables have been converted to binary columns using one-hot encoding. Before oversampling, I impute all missing values using IterativeImputer.
As classifiers, I am using logistic regression, random forest, and XGBoost. May I have your thoughts on this? Any suggestions for oversampling a multiclass, highly imbalanced dataset?
Topic smotenc class-imbalance categorical-data
Category Data Science