How does SMOTE work for a dataset with only categorical variables?
I have a small dataset of 977 rows with a class proportion of 77:23.
For metric purposes, I have kept my minority class ('default') as class 1 (and 'not default' as class 0).
All my input variables are categorical (let's assume we don't have age and salary information), so this is what I tried:
a) Apply encodings such as rare_encoding and ordinal_encoding to my dataset
b) Split into train and test sets (with stratify=y)
c) Apply SMOTE to resample the training data only.
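Step (c) is where the categorical-only case bites. Classic SMOTE creates a synthetic point on the line segment between a minority sample and one of its minority-class nearest neighbours, x_new = x + λ(x_nn - x), with λ drawn uniformly from [0, 1]. Applied to ordinal codes, this produces fractional values that match no category. The plain-Python sketch below encodes the four 'default' rows listed below and interpolates between the first and the last with a fixed λ = 0.5 instead of a random one; the helper names `ordinal_encode` and `smote_interpolate` are hypothetical, not from any library.

```python
def ordinal_encode(column):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    for value in column:
        codes.setdefault(value, len(codes))
    return [codes[value] for value in column], codes


def smote_interpolate(x, neighbour, lam):
    """Classic SMOTE step: the point a fraction `lam` of the way from x to neighbour."""
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbour)]


# The four 'default' rows: gender, degree, occupation, Country
rows = [
    ("MALE", "BE", "ENGINEER", "USA"),
    ("MALE", "ME", "RESEARCHER", "UK"),
    ("FEMALE", "BSc", "Admin staff", "NZ"),
    ("FEMALE", "MS", "Scientist", "sweden"),
]

# Encode column by column, then rebuild the rows as integer vectors
encoded_columns = [ordinal_encode(col)[0] for col in zip(*rows)]
encoded_rows = [list(r) for r in zip(*encoded_columns)]
print(encoded_rows)  # [[0, 0, 0, 0], [0, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3]]

# Interpolate halfway between the first and the last row
synthetic = smote_interpolate(encoded_rows[0], encoded_rows[3], lam=0.5)
print(synthetic)  # [0.5, 1.5, 1.5, 1.5]: none of these is a valid category code
```

Rounding does not really rescue this: the nearest integer depends on the arbitrary order in which codes were assigned, so the "halfway" occupation between ENGINEER and Scientist could come out as RESEARCHER or Admin staff. This is why imbalanced-learn offers `SMOTENC` (mixed numeric and categorical features) and `SMOTEN` (categorical features only) as alternatives to plain `SMOTE` for such data.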
However, my question is: how does SMOTE resample when there are only categorical variables, like below?
gender | degree | occupation | Country | status
MALE | BE | ENGINEER | USA | default
MALE | ME | RESEARCHER | UK | default
FEMALE | BSc | Admin staff | NZ | default
FEMALE | MS | Scientist | sweden | default
Now, if my objective is to oversample the minority class using SMOTE, what will the synthetic samples look like? Will SMOTE just randomly shuffle gender, degree, occupation and country into different permutations and combinations?
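One way to see what happens is to imitate a categorical-only variant. For nominal-only data, imbalanced-learn provides `SMOTEN`, which, according to its documentation, finds nearest neighbours within the minority class using the Value Difference Metric and sets each feature of a new sample to the most common category among those neighbours. The plain-Python sketch below follows that idea on the four rows above, but it is a simplification, not the library's implementation: it swaps the Value Difference Metric for a plain mismatch count (Hamming distance), and the names `hamming` and `smoten_like` are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of features on which two categorical rows disagree."""
    return sum(x != y for x, y in zip(a, b))


def smoten_like(rows, n_new, k=3, seed=0):
    """Create n_new synthetic rows from categorical-only minority rows.

    Simplified SMOTEN-style sketch: Hamming distance stands in for the
    Value Difference Metric, and vote ties go to the value seen first
    (i.e. the closer neighbour's value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.randrange(len(rows))  # pick a random minority row
        # Its k nearest minority neighbours, excluding the row itself
        others = sorted(
            (i for i in range(len(rows)) if i != base),
            key=lambda i: hamming(rows[base], rows[i]),
        )
        neighbours = [rows[i] for i in others[:k]]
        # Each feature takes the most frequent value among the neighbours
        new_row = tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*neighbours)
        )
        synthetic.append(new_row)
    return synthetic


rows = [
    ("MALE", "BE", "ENGINEER", "USA"),
    ("MALE", "ME", "RESEARCHER", "UK"),
    ("FEMALE", "BSc", "Admin staff", "NZ"),
    ("FEMALE", "MS", "Scientist", "sweden"),
]

for row in smoten_like(rows, n_new=4):
    print(row)
```

With k=3 every other row is a neighbour, so a new sample built around the first row (MALE, BE, ENGINEER, USA) comes out as (FEMALE, ME, RESEARCHER, UK): two of its three neighbours are FEMALE, while the remaining features are three-way ties that fall back to the closest neighbour, the second row. So this is neither a random shuffle nor an invention of new categories; every value is voted in from existing minority rows. With only four rows that share almost nothing, the vote mostly degenerates into copying the nearest neighbour; with more minority rows sharing categories, it has more to work with.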
Is there a simple explanation or tutorial you can share for someone learning to apply this technique?
My objective is to understand how SMOTE works on a dataset with only categorical variables.
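Part of the answer is how "nearest neighbour" is defined when categories have no numeric distance. A common choice for nominal features, and the one imbalanced-learn's `SMOTEN` documentation names, is the Value Difference Metric: two categories of a feature count as close when they occur with the classes in similar proportions. A frequently used form is δ(v1, v2) = Σ_c |P(c | v1) - P(c | v2)|^k, summed over classes c, with k often set to 1. The sketch below computes it for a single feature; the function name `value_difference` and the toy data are hypothetical, made up for illustration and not taken from the question.

```python
from collections import Counter, defaultdict


def value_difference(values, labels, k=1):
    """Return a function giving the VDM distance between two values of one feature.

    delta(v1, v2) = sum over classes c of |P(c | v1) - P(c | v2)| ** k
    Only values seen in `values` are supported.
    """
    total = Counter(values)           # occurrences of each value
    per_value = defaultdict(Counter)  # value -> counts of each class label
    for value, label in zip(values, labels):
        per_value[value][label] += 1
    classes = set(labels)

    def delta(v1, v2):
        return sum(
            abs(per_value[v1][c] / total[v1] - per_value[v2][c] / total[v2]) ** k
            for c in classes
        )

    return delta


# Hypothetical toy column: Country, with labels 1 = default, 0 = not default
countries = ["USA", "USA", "UK", "UK", "NZ", "NZ"]
labels = [1, 0, 1, 1, 0, 0]
delta = value_difference(countries, labels)
print(delta("USA", "UK"))  # 1.0
print(delta("UK", "NZ"))   # 2.0
```

Here UK and NZ come out maximally far apart because they never occur with the same class, while USA sits halfway between them. Combining such per-feature distances over gender, degree, occupation and Country gives a row-to-row distance that can replace a simple mismatch count when searching for neighbours.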
Topic smote deep-learning neural-network classification machine-learning
Category Data Science