How to tackle imbalanced regression?

I've recently encountered a problem where I want to fit a regression model on data whose target variable is about 75% zeroes, with the rest being continuous. This makes it a regression problem; however, the non-zero values also have very high variance: they can range anywhere from 1 to 105 million.

What would be an effective approach to such a problem? Due to the high variance, I keep getting regressors that fit the zeroes too closely, and as a result I get a very high MAE. I understand that in classification you can use balanced class weighting, for example in random forests, but what's the equivalent for regression problems? Does scikit-learn have anything similar?

Topic imbalanced-data regression

Category Data Science


Zero-inflated models (https://en.wikipedia.org/wiki/Zero-inflated_model) first predict whether an individual's response will be zero, and then, among the non-zero responses, predict the actual value (a count, in the classic formulation).
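For a continuous target like yours, the same two-step idea can be approximated in scikit-learn with a classifier followed by a regressor. The following is only a minimal sketch of that two-stage (hurdle-style) setup, not a formal zero-inflated model. The helper names `fit_two_stage` and `predict_two_stage`, the choice of random forests, the `class_weight="balanced"` setting, the log transform of the target, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def fit_two_stage(X, y, random_state=0):
    """Fit a zero/non-zero classifier and a regressor on the non-zero rows only."""
    nonzero = y > 0
    # Stage 1: is the target non-zero? Balanced weights offset the many zeroes.
    clf = RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=random_state
    )
    clf.fit(X, nonzero)
    # Stage 2: regress log1p(y) on non-zero rows to tame the huge value range.
    reg = RandomForestRegressor(n_estimators=100, random_state=random_state)
    reg.fit(X[nonzero], np.log1p(y[nonzero]))
    return clf, reg


def predict_two_stage(clf, reg, X):
    """Predict 0 where the classifier says zero, else the back-transformed value."""
    is_nonzero = clf.predict(X).astype(bool)
    pred = np.zeros(X.shape[0])
    if is_nonzero.any():
        pred[is_nonzero] = np.expm1(reg.predict(X[is_nonzero]))
    return pred


# Synthetic data (an assumption): roughly 75% zeroes, skewed positive values.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
nonzero_mask = X[:, 0] > 0.67
y = np.where(nonzero_mask, np.exp(5.0 + 2.0 * X[:, 1]), 0.0)

clf, reg = fit_two_stage(X, y)
pred = predict_two_stage(clf, reg, X)
```

Evaluate this on held-out data rather than on the training rows used here. If you care about the expected value rather than a hard zero/non-zero decision, you could instead multiply `clf.predict_proba(X)[:, 1]` by the regressor's prediction.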

If your non-zero values can be considered count or rate data, you might use:

statsmodels.discrete.count_model.ZeroInflatedPoisson
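A rough sketch of how that class might be used is below. The synthetic data, the intercept-only inflation model passed as `exog_infl`, and the optimizer settings are assumptions for illustration. A Poisson likelihood is only sensible if the non-zero values really are counts, so it may be a poor fit for values ranging into the millions.

```python
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

# Synthetic count data (an assumption): structural zeroes mixed with Poisson counts.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
exog = np.column_stack([np.ones(n), x])  # intercept + one feature

structural_zero = rng.random(n) < 0.6
counts = rng.poisson(np.exp(1.0 + 0.5 * x))
y = np.where(structural_zero, 0, counts)

# Inflation part: intercept only, i.e. one constant probability of a structural zero.
exog_infl = np.ones((n, 1))

model = ZeroInflatedPoisson(y, exog, exog_infl=exog_infl, inflation="logit")
result = model.fit(method="bfgs", maxiter=500, disp=False)

# Parameters are ordered: inflation coefficients first, then count-model coefficients.
params = np.asarray(result.params)

# Expected value of y for each training row, accounting for the zero inflation.
mean_pred = np.asarray(result.predict())
```

Check `result.summary()` and the optimizer's convergence messages before trusting the fit. A feature-dependent inflation model can be specified by passing real covariates as `exog_infl` instead of a column of ones.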
