How to improve model performance when the model shows a systematic pattern in residuals
I'm working on a regression model using boosting algorithms (CatBoost, XGBoost, and LightGBM). All models give a similar RMSE of about 0.2 (the target ranges from 0 to 1). I obtained the following plots when I plotted the residuals. My model overpredicts for small target values (near 0) and underpredicts for large target values (near 1). How can I improve my model's performance? The model is not overfitting, and I'm doing an exhaustive hyperparameter search and basic feature engineering.
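The pattern described above can be quantified by averaging the residuals within bins of the target. The following is a minimal sketch, not the setup used in this question: the data are synthetic (a Beta(2, 4) target and predictions deliberately shrunk toward the mean), and `binned_mean_residuals` is a hypothetical helper name. Residuals are defined as actual minus predicted, so a negative bin mean indicates overprediction.

```python
import random
from statistics import mean


def binned_mean_residuals(y_true, y_pred, n_bins=5):
    """Mean residual (y_true - y_pred) per equal-width bin of a target in [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_pred):
        idx = min(int(t * n_bins), n_bins - 1)  # put t == 1.0 into the last bin
        bins[idx].append(t - p)
    # Empty bins are reported as None rather than as a misleading zero.
    return [mean(b) if b else None for b in bins]


# Synthetic stand-in data (assumption): target in [0, 1] with mean near 1/3.
rng = random.Random(0)
y_true = [rng.betavariate(2, 4) for _ in range(10_000)]
center = mean(y_true)

# Simulated predictions shrunk halfway toward the mean, plus small noise.
y_pred = [center + 0.5 * (t - center) + rng.gauss(0, 0.05) for t in y_true]

profile = binned_mean_residuals(y_true, y_pred)
for i, r in enumerate(profile):
    lo, hi = i / len(profile), (i + 1) / len(profile)
    print(f"target in [{lo:.1f}, {hi:.1f}): mean residual = {r:+.3f}")
```

With real data, `y_true` and `y_pred` would be replaced by held-out targets and the model's predictions for them. Bin means that rise steadily from negative to positive reproduce the pattern seen in the plots.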
I'm trying to understand why there is a systematic pattern in the residuals and how to address it. I went through some resources on residuals here with limited success.
I suspect the model is doing its best in the region with the most samples (~0.2-0.5) and neglecting the other regions. Assigning higher weights to the poorly performing regions doesn't seem to help much.
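One possible form of the weighting mentioned above is to weight each sample by the inverse frequency of its target bin, so sparse regions near 0 and 1 count for more. This is only a sketch under assumptions: `inverse_frequency_weights` is a hypothetical helper, the bin count is arbitrary, the toy targets are made up, and it is not necessarily the weighting scheme that was tried here.

```python
from collections import Counter


def inverse_frequency_weights(y, n_bins=10):
    """Weight samples by 1 / (count of their target bin), rescaled to mean 1."""
    idx = [min(int(t * n_bins), n_bins - 1) for t in y]  # targets assumed in [0, 1]
    counts = Counter(idx)
    raw = [1.0 / counts[i] for i in idx]
    scale = len(raw) / sum(raw)  # keep the average weight at 1
    return [w * scale for w in raw]


# Toy targets (assumption): dense in the middle, sparse at both ends.
y = [0.05, 0.25, 0.3, 0.35, 0.4, 0.45, 0.95]
weights = inverse_frequency_weights(y, n_bins=5)
for t, w in zip(y, weights):
    print(f"target = {t:.2f}, weight = {w:.3f}")
```

Weights like these would typically be passed through the `sample_weight` argument of `fit` in the scikit-learn-style estimators such as `XGBRegressor` and `LGBMRegressor`. Whether this actually reduces the residual pattern is a separate question from constructing the weights.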
n_samples = ~10k
Target mean = 0.3457
Topic data-science-model boosting data ensemble-modeling
Category Data Science