How to tackle imbalanced regression?

Question

lte__

April 22, 2022, 20:38

I've recently encountered a problem where I want to fit a regression model on data whose target variable is about 75% zeros, with the rest being continuous values. This makes it a regression problem; however, the non-zero values also have very high variance: they can range anywhere from 1 to 105 million.

What would be an effective approach to such a problem? Because of the high variance, I keep getting regressors that fit too closely to the zeros, and as a result I get a very high MAE. I understand that in classification you can use balanced class weighting, for example in random forests, but what is the equivalent for regression problems? Does scikit-learn have anything similar?

Topic imbalanced-data regression

Category Data Science

clementzach · Accepted Answer · April 22, 2022, 20:38

Zero-inflated models (https://en.wikipedia.org/wiki/Zero-inflated_model) first predict whether an individual's response will be zero, and then, among the non-zero responses, predict the actual value of the response.
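Since the question asks about scikit-learn, here is a minimal sketch of the same two-step idea built from ordinary estimators. Strictly speaking this is a hurdle-style two-part model rather than a true zero-inflated likelihood. The synthetic data, the choice of random forests, the log1p transform of the target, and the helper name predict_two_part are all assumptions for illustration, not something from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data (assumption): mostly zeros, heavy-tailed positive values otherwise.
n = 2000
X = rng.normal(size=(n, 3))
p_nonzero = 1.0 / (1.0 + np.exp(-(X[:, 0] - 1.1)))
is_nonzero = rng.random(n) < p_nonzero
y = np.where(is_nonzero, np.exp(2.0 + X[:, 1] + rng.normal(scale=0.5, size=n)), 0.0)

# Stage 1: classify zero vs. non-zero. Balanced class weights are available here
# because this stage is an ordinary classification problem.
clf = RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0)
clf.fit(X, y > 0)

# Stage 2: regress on the non-zero rows only. The log1p transform tames the
# huge spread of the positive values.
mask = y > 0
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X[mask], np.log1p(y[mask]))


def predict_two_part(X_new, threshold=0.5):
    """Predict 0 where the classifier says 'zero', else the back-transformed regression."""
    p = clf.predict_proba(X_new)[:, 1]           # P(y > 0); classes_ is [False, True]
    magnitude = np.expm1(reg.predict(X_new))     # undo the log1p transform
    return np.where(p >= threshold, magnitude, 0.0)


y_pred = predict_two_part(X)
```

The hard threshold tends to suit MAE, since predicting exactly zero is rewarded when most targets are zero. If you care about the expected value instead, multiplying the probability by the magnitude is the usual alternative, keeping in mind that back-transforming a prediction made on the log scale does not give an unbiased mean.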

If your non-zero values could be considered count or rate data, you might use:

statsmodels.discrete.count_model.ZeroInflatedPoisson
