How to improve model performance when the residuals show a systematic pattern

I'm working on a regression model using boosting algorithms (CatBoost, XGBoost, and LightGBM). All models reach a similar error of about 0.2 RMSE (the target ranges from 0 to 1). I obtained the following plots when I plotted the residuals. My model overpredicts for small target values (near 0) and underpredicts for large target values (near 1). How can I improve my model's performance? The model is not overfitting, and I have done an exhaustive hyperparameter search and basic feature engineering.

I'm trying to understand why there is a systematic pattern in the residuals and how to address it. I went through some resources on residuals, with limited success.

I suspect the model is doing its best in the region with the most samples (~0.2-0.5) and neglecting the other regions. Assigning higher weights to the regions with poor performance doesn't seem to help much.

n_samples = ~10k

Target mean = 0.3457
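
Before changing the model, it may be worth checking how much of this pattern comes from grouping the residuals by the true target. The sketch below uses made-up data (not the data from the question) and assumes the best case, a model that predicts the conditional mean exactly. The names `precision`, `mean_residual`, and the bin thresholds are illustrative choices. Residuals are defined as prediction minus target, so a positive value means overprediction.

```python
import random

random.seed(0)

# Hypothetical data: each row has a learnable mean mu in [0.2, 0.5], and the
# target is a noisy draw around that mean which stays inside (0, 1).
n_samples = 10_000
precision = 5.0  # assumed noise level; smaller values give noisier targets
mu = [random.uniform(0.2, 0.5) for _ in range(n_samples)]
y_true = [random.betavariate(m * precision, (1.0 - m) * precision) for m in mu]

# Best case: the model predicts the conditional mean E[y | x] exactly.
y_pred = mu
residual = [p - t for p, t in zip(y_pred, y_true)]


def mean_residual(keep):
    """Average residual over the rows where keep[i] is True."""
    kept = [r for r, k in zip(residual, keep) if k]
    return sum(kept) / len(kept)


# Grouped by the true target, the residuals look biased at both ends.
resid_small_target = mean_residual([t < 0.1 for t in y_true])
resid_large_target = mean_residual([t > 0.6 for t in y_true])

# Grouped by the prediction, the same residuals average out to roughly zero.
resid_small_pred = mean_residual([p < 0.35 for p in y_pred])
resid_large_pred = mean_residual([p >= 0.35 for p in y_pred])

print(f"target < 0.1      : {resid_small_target:+.3f}")
print(f"target > 0.6      : {resid_large_target:+.3f}")
print(f"prediction < 0.35 : {resid_small_pred:+.3f}")
print(f"prediction >= 0.35: {resid_large_pred:+.3f}")
```

Even this idealized model overpredicts small targets and underpredicts large ones when the residuals are grouped by the true value, because the noise in the target cannot be predicted. Plotting residuals against the predictions is therefore a more telling check: a clear trend in that plot points to bias the model could still remove.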



Try using beta regression, which is specifically designed to model continuous target values restricted to the interval (0, 1).
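
To make the suggestion concrete, here is a minimal sketch of the likelihood that a beta regression maximizes, written in the common mean/precision form with a logit link. It uses only the standard library and synthetic data; the helper names (`sigmoid`, `squeeze`, `beta_nll`) and all parameter values are assumptions made for illustration. In practice you would use a library implementation, for example the `BetaModel` class in `statsmodels.othermod.betareg` in recent statsmodels versions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squeeze(y):
    """Pull exact 0s and 1s slightly inside (0, 1), where the beta density is defined."""
    n = len(y)
    return [(t * (n - 1) + 0.5) / n for t in y]


def beta_nll(coefs, intercept, log_phi, X, y):
    """Average negative log-likelihood of a beta regression with a logit link.

    The mean is mu = sigmoid(intercept + coefs . x) and phi is the precision,
    so each target is modeled as Beta(mu * phi, (1 - mu) * phi).
    """
    phi = math.exp(log_phi)
    total = 0.0
    for row, target in zip(X, y):
        mu = sigmoid(intercept + sum(c * v for c, v in zip(coefs, row)))
        a, b = mu * phi, (1.0 - mu) * phi
        log_density = (
            math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(target)
            + (b - 1.0) * math.log(1.0 - target)
        )
        total -= log_density
    return total / len(y)


# Synthetic check: data generated from known parameters should be more likely
# under those parameters than under a model that ignores the feature.
random.seed(0)
X = [[random.uniform(-1.0, 1.0)] for _ in range(2000)]
means = [sigmoid(-0.5 + 1.0 * row[0]) for row in X]
y = squeeze([random.betavariate(m * 10.0, (1.0 - m) * 10.0) for m in means])

nll_true = beta_nll([1.0], -0.5, math.log(10.0), X, y)
nll_null = beta_nll([0.0], 0.0, math.log(10.0), X, y)
print(nll_true, nll_null)
```

Because the mean passes through a sigmoid, predictions can never leave (0, 1), and the variance shrinks automatically near the boundaries. The `squeeze` step is only needed when the data contain exact 0s or 1s that are not handled separately.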

Additionally, a disproportionate number of target values are exactly zero, which could be modeled hierarchically: a first model predicts whether the target is zero or nonzero, and a second model, a beta regression, handles the values above zero and up to 1.
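
The two-stage idea can be sketched as a small wrapper that multiplies the predicted probability of a nonzero target by the predicted value given that it is nonzero. Everything here is hypothetical: `TwoStageRegressor` is not a library class, and `ConstantClassifier` and `ConstantRegressor` are placeholders standing in for real models. The wrapper assumes scikit-learn-style `fit`, `predict`, and `predict_proba` methods, with the nonzero class in the second probability column, and assumes `X` is a list of rows.

```python
class TwoStageRegressor:
    """Combine P(y > 0) from a classifier with E[y | y > 0] from a regressor."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        # Stage 1: zero versus nonzero, trained on all rows.
        is_nonzero = [1 if target > 0 else 0 for target in y]
        self.classifier.fit(X, is_nonzero)
        # Stage 2: value of the target, trained on the nonzero rows only.
        X_nonzero = [row for row, target in zip(X, y) if target > 0]
        y_nonzero = [target for target in y if target > 0]
        self.regressor.fit(X_nonzero, y_nonzero)
        return self

    def predict(self, X):
        # Expected target = P(nonzero) * E[target | nonzero].
        p_nonzero = [row[1] for row in self.classifier.predict_proba(X)]
        value_if_nonzero = self.regressor.predict(X)
        return [p * v for p, v in zip(p_nonzero, value_if_nonzero)]


class ConstantClassifier:
    """Placeholder stage-1 model: always predicts the training rate of nonzero targets."""

    def fit(self, X, y):
        self.rate = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [[1.0 - self.rate, self.rate] for _ in X]


class ConstantRegressor:
    """Placeholder stage-2 model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 0.0, 0.4, 0.6]
model = TwoStageRegressor(ConstantClassifier(), ConstantRegressor()).fit(X_train, y_train)
predictions = model.predict([[5.0]])
print(predictions)
```

In a real setup the placeholders would be replaced by, for example, a boosted classifier for stage 1 and a beta regression (or a boosted regressor) for stage 2. Note that a beta regression needs targets strictly below 1, so any exact 1s in the nonzero part would have to be squeezed inward or handled by a further stage.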
