Lasso regression not getting better without random features
First of all, I'm new to lasso regression, so sorry if this feels stupid.
I'm trying to build a regression model and wanted to use lasso regression for feature selection as I have quite a few features to start with.
I started by standardizing all features and plotting the weights of each feature as I changed my regularisation strength to see which ones are most important. I also plotted the RMSE on the holdout set to find a U-shaped plot, i.e. as I increased the regularisation strength, my RMSE would decrease, then after a certain point, it start to increase. I couldn't. My RMSE plot was non-decreasing.
Then, I decided to throw in a random feature to see how the model would perform and compare my features vs. the random one. My expectation was again to see a U shape. However, that wasn't the case as I got another non-decreasing RMSE plot again which I pasted below. It also looked like two of my features, feat1 and feat3 were just as good as my random feature.
So here are my questions.
- Why did could my RMSE have kept increasing as I increased my regularisation strength?
- Does the fact that the coefficients of feat1 and feat3 were pushed to 0 at the same time as the random feature means that they're not good and I should remove them from the model? (It felt strange that they hit 0 at the same time. I was expecting some difference.)
Topic lasso regression machine-learning
Category Data Science