How to tackle imbalanced regression?
I've recently encountered a problem where I want to fit a regression model on data that's target variable is like 75% zeroes, and the rest is a continuous variable. This makes it a regression problem, however, the non-zero values also have a very high variance: they can take anywhere from between 1 to 105 million.
What would be an effective approach to such a problem? Due to the high variance, I keep getting regressors that fit too much to the zeroes and as a result I get very high MAE. I understand in classification you can use balanced weighting for example in RandomForests, but what's the equivalent to regression problems? Does SciKit-Learn have anything similar?
Topic imbalanced-data regression
Category Data Science