Resampling: Is my dataset categorical or numerical?
I have a dataset with 203 variables: age40 (0 = yes, 1 = no), gender (0 or 1), whether each of 200 types of drugs was used (one-hot encoded into 200 variables), and one binary target variable (0 or 1). The dataset is imbalanced: Counter({0: 5607, 1: 1717}).
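As a quick sanity check on the counts above, this small sketch computes the imbalance ratio and the number of synthetic minority rows that full 1:1 oversampling would need. Balancing all the way to 1:1 is an assumption made for illustration. A partial ratio is often enough.

```python
from collections import Counter

# Class counts as reported in the question
counts = Counter({0: 5607, 1: 1717})

majority, minority = counts[0], counts[1]
total = majority + minority

imbalance_ratio = majority / minority       # majority rows per minority row
minority_share = minority / total           # fraction of rows in class 1
needed_for_balance = majority - minority    # synthetic rows to reach 1:1

print(f"total={total}, ratio={imbalance_ratio:.2f}, "
      f"minority share={minority_share:.1%}, to add={needed_for_balance}")
# total=7324, ratio=3.27, minority share=23.4%, to add=3890
```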
What kind of resampling strategy should I adopt for this kind of dataset?
Is this dataset considered numerical or categorical?
I tried random undersampling and oversampling, but I am not satisfied with the ROC curve obtained after modeling.
Can I apply SMOTE if I treat this as a numerical dataset?
I read in this source that if the dataset contains only categorical variables, the Hamming distance is used for resampling, and if it contains only numerical variables, traditional distances such as Euclidean, Manhattan, or Minkowski can be applied.
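For 0/1 features, the choice between these metrics matters less than it may seem. This minimal NumPy sketch uses a randomly generated, hypothetical binary matrix in place of the real data. It shows that on binary vectors the squared Euclidean distance equals the Hamming count, which is the number of differing positions, and so does the Manhattan distance. Nearest-neighbour rankings are therefore identical under all three metrics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: 20 rows x 202 binary features (age40, gender, 200 drug flags)
X = rng.integers(0, 2, size=(20, 202))

query = X[0]
others = X[1:]

# Hamming count: number of positions where each row differs from the query
h = (others != query).sum(axis=1)
# Squared Euclidean: each differing 0/1 position contributes (1 - 0)^2 = 1
e2 = ((others - query) ** 2).sum(axis=1)
# Manhattan: each differing 0/1 position contributes |1 - 0| = 1
m = np.abs(others - query).sum(axis=1)

print(np.array_equal(h, e2), np.array_equal(h, m))  # True True
```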
For my dataset, is it okay to use Euclidean distance for resampling? Could you please point me to some sources showing how this is done for a dataset with only binary values?
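One practical problem with plain SMOTE on binary data is that linear interpolation produces fractional values such as 0.37. imbalanced-learn ships `SMOTEN` for all-categorical data and `SMOTENC` for mixed data. The function below is not either of those implementations. It is a hypothetical, hand-rolled SMOTE-style variant, written only to illustrate the idea. Neighbours are found with Hamming distance. Instead of interpolating, each feature of a synthetic row is copied from either the seed row or a random neighbour, so the output stays binary. The name `binary_smote` and its parameters are assumptions.

```python
import numpy as np

def binary_smote(X_min, n_new, k=5, seed=None):
    """Hypothetical SMOTE-style oversampler for 0/1 features (illustrative only).

    Neighbours come from Hamming distance among minority rows. A random gap
    lam in [0, 1) plays the role of SMOTE's interpolation factor: each feature
    is taken from the neighbour with probability lam, else from the seed row.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min)
    n, d = X_min.shape
    if n < 2:
        raise ValueError("need at least two minority rows")
    k = min(k, n - 1)

    # Pairwise Hamming distances for 0/1 data via matrix products (avoids an n x n x d array)
    dist = X_min @ (1 - X_min).T + (1 - X_min) @ X_min.T
    np.fill_diagonal(dist, d + 1)  # a row must not be its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, d), dtype=X_min.dtype)
    for i in range(n_new):
        s = rng.integers(n)                       # random minority seed row
        nb = X_min[neighbours[s, rng.integers(k)]]  # one of its k nearest neighbours
        lam = rng.random()
        take_nb = rng.random(d) < lam             # per-feature choice, stays binary
        synthetic[i] = np.where(take_nb, nb, X_min[s])
    return synthetic

# Usage on hypothetical minority rows
X_minority = np.random.default_rng(42).integers(0, 2, size=(50, 202))
X_syn = binary_smote(X_minority, n_new=100, k=5, seed=1)
print(X_syn.shape)  # (100, 202)
```

Only the minority rows of the training split should be oversampled, and the ROC curve should be evaluated on untouched held-out data.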