How to weight the loss in regression

I've got a regression problem where a model is required to predict a value in the range [0, 1].

I've looked at the distribution of the data, and it seems there are many more examples with low-value labels ([0, 0.2]) than with high-value labels ((0.2, 1]).

When I train the model with MAE as the loss, it converges to a very low loss, but it ends up predicting a low value for many of the high-value examples.

So my assumption is that the data is imbalanced and that I should weight the loss of each example according to its label.

Question: what is the best way to weight the loss in this configuration?

Should I weight each example by the value of its label using some function f(x), where f(x) is low when x is low and high when x is high?

Or should I split the label values into bins ([0, 0.1), [0.1, 0.2), ..., [0.9, 1]) and weight each bin, similar to class weights in a categorical loss?
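For concreteness, here is a minimal sketch of both options using NumPy and scikit-learn (≥ 1.0 naming). The synthetic data, the linear ramp f(y), and the ten equal-width bins are illustrative assumptions, not recommendations; any estimator that accepts `sample_weight` would work the same way.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the imbalanced data: most labels fall near 0.
X = rng.normal(size=(1000, 5))
y = rng.beta(1.5, 6.0, size=1000)

# Option 1: a continuous weight f(y), low for small labels and high for large ones.
def continuous_weights(y, floor=0.2):
    return floor + (1.0 - floor) * y  # simple linear ramp; any monotone f works

# Option 2: bin the labels and weight each bin by its inverse frequency,
# the regression analogue of per-class weights.
def binned_weights(y, n_bins=10):
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    return (len(y) / (n_bins * np.maximum(counts, 1)))[idx]

w = binned_weights(y)  # or continuous_weights(y)

# Estimators that accept sample_weight apply the weights inside their loss.
model = GradientBoostingRegressor(loss="absolute_error")
model.fit(X, y, sample_weight=w)

# The weighted MAE that the weights effectively make the model optimise:
weighted_mae = np.average(np.abs(y - model.predict(X)), weights=w)
print(weighted_mae)
```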

Tags: weighted-data, regression, class-imbalance, machine-learning

Category: Data Science


If you are predicting values between 0 and 1, you should use beta regression.

Beta regression naturally handles the heteroskedasticity and skewness that are commonly observed in rates and proportions.
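For context, beta regression models the target with a Beta distribution whose mean depends on the features through a link function (usually the logit). Ready-made implementations exist (e.g. the betareg package in R, or BetaModel in recent statsmodels versions); the sketch below fits one by maximum likelihood with NumPy/SciPy, assuming a logit link, a single shared precision parameter, and synthetic data purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def beta_nll(params, X, y):
    # Mean via logit link, mu = sigmoid(X @ coef); one shared precision phi = exp(log_phi).
    coef, log_phi = params[:-1], params[-1]
    mu = expit(X @ coef)
    phi = np.exp(log_phi)
    a, b = mu * phi, (1.0 - mu) * phi
    # Negative log-likelihood of Beta(a, b) evaluated at y.
    return -np.sum(gammaln(a + b) - gammaln(a) - gammaln(b)
                   + (a - 1.0) * np.log(y) + (b - 1.0) * np.log(1.0 - y))

def fit_beta_regression(X, y, eps=1e-4):
    # The beta likelihood needs y strictly inside (0, 1), so squeeze the endpoints.
    y = np.clip(y, eps, 1.0 - eps)
    Xd = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    x0 = np.zeros(Xd.shape[1] + 1)               # coefficients + log(phi)
    res = minimize(beta_nll, x0, args=(Xd, y), method="L-BFGS-B")
    coef = res.x[:-1]
    predict = lambda Xn: expit(np.column_stack([np.ones(len(Xn)), Xn]) @ coef)
    return coef, np.exp(res.x[-1]), predict

# Illustrative usage on synthetic data whose mean truly follows a logit-linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
mu_true = expit(0.3 + X @ np.array([1.0, -1.0, 0.5]))
y = rng.beta(mu_true * 10.0, (1.0 - mu_true) * 10.0)

coef, phi, predict = fit_beta_regression(X, y)
print(coef, phi)
```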
