How Should I deal with my imbalanced binary target

I am trying to model my data with Python and i am having concerns about my binary target variable, because it has 90% cases falling in 0 and 10% of the cases falling in 1. I have tried upsampling my data and i got twice more observations than i had. I am not sure is it right to do it this way.

Topic target-encoding sampling class-imbalance algorithms machine-learning

Category Data Science


There are a couple options:

  1. Up-sample your data (as you described). Use SMOTE or a similar method to up-sample the lesser class to achieve closer to 50/50 split on positive/negative class respectively

  2. If you have a lot of data - down-sample your more frequent class (thereby throwing away a lot of examples in the negative class)

  3. Select performance metrics that do not get skewed due to class imbalance. F1 score is usually used but any metric that has some combination of precision and recall should do the trick. Avoid accuracy as a scoring metric in this case. Selecting the correct scoring metric also depends on the specifics of the business problem you are trying to address.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.