How to weight the loss in regression
I've got a regression problem where a model is required to predict a value in the range [0, 1].
I've looked at the distribution of the data, and it seems there are more examples with low-value labels (in [0, 0.2]) than with higher-value labels (in [0.2, 1]).
When I train the model using MAE as the loss, it converges to a very low loss, but it seems to get there by predicting a low value on many of the high-label examples.
So my assumption is that the data is imbalanced and that I should weight the loss of each example depending on its label.
Question: what is the best way to weight the loss in this configuration?
Should I weight each example by the value of its label using some function f(x), where f(x) is low when x is low and high when x is high?
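For concreteness, here's a minimal sketch of what I mean by this first option, assuming NumPy arrays of predictions and labels; f(x) = eps + x is just a placeholder choice, not something I'm committed to:

```python
import numpy as np

def weighted_mae(pred, target, eps=0.1):
    # Option 1: per-example weight from a function f of the label.
    # f(x) = eps + x is a hypothetical choice: small weight for low
    # labels, larger weight for high labels; eps keeps low-label
    # examples from being ignored entirely.
    w = eps + target
    # Normalize by the total weight so the loss scale stays comparable.
    return np.sum(w * np.abs(pred - target)) / np.sum(w)
```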
Or should I split the label values into bins ([0, 0.1), [0.1, 0.2), ..., [0.9, 1]) and weight each bin, similar to class weights in classification?
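And here's a sketch of the second option, computing inverse-frequency weights per bin (the choice of n_bins=10 and the "balanced" inverse-frequency scheme are just my assumptions):

```python
import numpy as np

def bin_weights(labels, n_bins=10):
    # Option 2: split [0, 1] into n_bins equal bins and weight each
    # bin inversely to its frequency, like class weights in
    # classification.
    bins = np.minimum((labels * n_bins).astype(int), n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins)
    inv_freq = len(labels) / (n_bins * np.maximum(counts, 1))
    return inv_freq[bins]  # per-example weights, indexed by bin
```

These per-example weights would then multiply the absolute errors in the same way as in the first sketch.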
Topic weighted-data regression class-imbalance machine-learning
Category Data Science