Give more weight to features based on distribution plot

I have a task to predict a binary variable purchase. The dataset is strongly imbalanced (roughly 10:100), and the models I have tried so far (mostly ensembles) fail. I have also tried applying SMOTE to reduce the imbalance, but the outcome is pretty much the same.

Analyzing each feature in the dataset, I noticed clearly visible differences in the distributions of some features between purchase: 1 and purchase: 0 (see images).

My question is: how can I pre-process my training set (and the data for future predictions) to make those differences more obvious for the model to capture?

Also, is this a good approach to dealing with strong class imbalance?

Thanks a lot.

Topic binary-classification imbalanced-learn decision-trees class-imbalance classification

Category Data Science


It seems like a class-weighted ("unbalanced") SVM could work here: https://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html

Looking at the features, I'd guess an RBF kernel would be more useful than a linear one (two of the features above look more or less Gaussian). A grid search for the optimal parameters could also help.
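A minimal sketch of that idea, assuming scikit-learn and a synthetic stand-in for your purchase data (your real X and y would replace the generated ones): a class-weighted SVM with an RBF kernel, tuned with a small grid search scored on F1 rather than accuracy, since accuracy is misleading under heavy imbalance.

```python
# Sketch: class-weighted RBF SVM with a small grid search.
# The synthetic data below mimics the ~10:100 minority-to-majority
# ratio described in the question; substitute your own X and y.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=1100, n_features=10, weights=[100 / 110], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# class_weight="balanced" up-weights minority-class errors in the loss,
# which is what the linked scikit-learn example demonstrates.
grid = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
    scoring="f1",  # accuracy would look good even for a trivial classifier
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.score(X_test, y_test))  # F1 on the held-out split
```

The C and gamma grids here are just placeholders; in practice you would widen them (e.g. logarithmic ranges) and consider scaling the features first, since SVMs are sensitive to feature scale.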
