How should I deal with my imbalanced binary target?

Question

Martin Xristev

October 31, 2020, 02:52

I am trying to model my data with Python and I have concerns about my binary target variable: 90% of the cases fall in class 0 and 10% fall in class 1. I tried upsampling my data and ended up with twice as many observations as I had before. I am not sure whether this is the right way to do it.

Topic target-encoding sampling class-imbalance algorithms machine-learning

Category Data Science

Oliver Foster · Accepted Answer · October 30, 2020, 23:03

There are a few options:

Up-sample your data (as you described). Use SMOTE or a similar method to up-sample the minority class until the positive/negative split is closer to 50/50.
If you have a lot of data, down-sample the more frequent class (at the cost of throwing away many examples from the negative class).
Select performance metrics that are not skewed by class imbalance. The F1 score is commonly used, but any metric that combines precision and recall should do the trick. Avoid accuracy as a scoring metric in this case: with a 90/10 split, a model that always predicts 0 already scores 90% accuracy. The right scoring metric also depends on the specifics of the business problem you are trying to address.
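
The three options above can be sketched in plain Python. This is only a minimal illustration on made-up toy data that mirrors the 90/10 split from the question. The helper names (oversample_minority, undersample_majority, precision_recall_f1, accuracy) are hypothetical and not part of any library, and the up-sampling shown is simple random duplication with replacement, not SMOTE (which synthesizes new points instead of copying existing ones).

```python
import random


def _split_indices(labels):
    """Return (minority_indices, majority_indices) for 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) < len(neg) else (neg, pos)


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def undersample_majority(rows, labels, seed=0):
    """Randomly drop majority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels)
    idx = minority + rng.sample(majority, len(minority))
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (an assumption for illustration): 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
rows = [[i] for i in range(len(labels))]

up_rows, up_labels = oversample_minority(rows, labels)
down_rows, down_labels = undersample_majority(rows, labels)

# A "model" that always predicts the majority class.
always_zero = [0] * len(labels)
baseline_accuracy = accuracy(labels, always_zero)
baseline_f1 = precision_recall_f1(labels, always_zero)[2]

print(len(up_labels), sum(up_labels))      # sizes after up-sampling
print(len(down_labels), sum(down_labels))  # sizes after down-sampling
print(baseline_accuracy, baseline_f1)      # accuracy looks good, F1 does not
```

Balancing 90/10 data by duplication grows 100 rows to 180, which is consistent with the near-doubling described in the question. If you resample, it is generally safer to do so on the training split only and to compute metrics on an untouched test split, so that duplicated rows do not leak into the evaluation. In practice, imbalanced-learn's SMOTE and scikit-learn's f1_score cover the same ground as these hand-written helpers.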
