Positively skewed target label in regression
I have a dataset whose target label is positively skewed, with a long right tail. When experimenting with linear, tree-based and neural-network regression models, I get large residuals on the values in that tail.
I see the same problem with the Boston Housing prediction dataset, where the usual recommendation is to apply a log transformation to the target label. This has given some small improvement, but not enough. I've also tried randomly duplicating samples within the tail to shift the mean, although I'm not overly comfortable with this method.
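For concreteness, the log-transform approach can be sketched roughly as below. This is only an illustration under stated assumptions: the data is synthetic (a log-normal target standing in for the real one), the model is a plain least-squares fit in NumPy rather than any of the models mentioned above, and the helper names `fit_linear` and `predict_linear` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not the real dataset):
# exponentiating a noisy linear signal gives a long right tail.
n = 500
X = rng.normal(size=(n, 3))
y = np.exp(1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.3, size=n))


def fit_linear(X, y):
    """Ordinary least squares with an intercept column (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted coefficients, including the intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


# Baseline: fit directly on the raw, skewed target.
raw_pred = predict_linear(fit_linear(X, y), X)

# Log-target variant: fit on log1p(y), then map predictions back with expm1.
log_pred = np.expm1(predict_linear(fit_linear(X, np.log1p(y)), X))

# Compare the mean absolute error on the tail only (top 10% of y).
tail = y >= np.quantile(y, 0.9)
mae_tail_raw = np.mean(np.abs(y[tail] - raw_pred[tail]))
mae_tail_log = np.mean(np.abs(y[tail] - log_pred[tail]))
print(f"tail MAE, raw target: {mae_tail_raw:.3f}")
print(f"tail MAE, log target: {mae_tail_log:.3f}")
```

Measuring the error on the tail separately, rather than over the whole dataset, makes it easier to see whether a given transformation actually helps where the residuals are largest.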
Are there any alternative transformations to apply, or any models that can put a higher cost weighting on labels with high residuals?
Topic imbalanced-learn preprocessing regression machine-learning
Category Data Science