Give more weight to features based on distribution plot

I have a task to predict a binary variable purchase. The dataset is strongly imbalanced (roughly 10:100), and the models I have tried so far (mostly ensembles) fail. I have also tried applying SMOTE to reduce the imbalance, but the outcome is pretty much the same.

Analyzing each feature in the dataset, I noticed clearly visible differences in the distributions of some features between purchase: 1 and purchase: 0 (see images).

My question is: how can I pre-process my training set (and the data for future predictions) to make those differences more obvious for the model to capture?

Also, is this a good approach to dealing with strong class imbalance?

Thanks a lot.

Topic binary-classification imbalanced-learn decision-trees class-imbalance classification

Category Data Science


It seems like a class-weighted ("unbalanced") SVM could work here: https://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html

Looking at the features, I'd guess an RBF kernel would be more useful than a linear one (two of the features above look more or less Gaussian). A grid search for the optimal parameters could also help.
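A minimal sketch of that idea, assuming scikit-learn and a synthetic stand-in for your purchase data (your real X and y would replace the generated ones): a class-weighted SVM with an RBF kernel, tuned with a small grid search scored on F1 rather than accuracy, since accuracy is misleading under heavy imbalance.

```python
# Sketch: class-weighted RBF SVM with a small grid search.
# The synthetic data below mimics the ~10:100 minority-to-majority
# ratio described in the question; substitute your own X and y.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=1100, n_features=10, weights=[100 / 110], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# class_weight="balanced" up-weights minority-class errors in the loss,
# which is what the linked scikit-learn example demonstrates.
grid = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
    scoring="f1",  # accuracy would look good even for a trivial classifier
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.score(X_test, y_test))  # F1 on the held-out split
```

The C and gamma grids here are just placeholders; in practice you would widen them (e.g. logarithmic ranges) and consider scaling the features first, since SVMs are sensitive to feature scale.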
