Lasso regression not getting better without random features

First of all, I'm new to lasso regression, so sorry if this feels stupid.

I'm trying to build a regression model and wanted to use lasso regression for feature selection as I have quite a few features to start with.

I started by standardizing all features and plotting the weight of each feature as I changed my regularisation strength, to see which ones are most important. I also plotted the RMSE on the holdout set, expecting a U-shaped curve, i.e. as I increased the regularisation strength, my RMSE would decrease, then after a certain point start to increase. That's not what I saw. My RMSE plot was non-decreasing.

Then, I decided to throw in a random feature to see how the model would perform and to compare my features against the random one. My expectation was again to see a U shape. However, that wasn't the case: I got another non-decreasing RMSE plot, which I pasted below. It also looked like two of my features, feat1 and feat3, were no better than my random feature.
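For reference, here is roughly how I set up the experiment, as a minimal sketch on synthetic stand-in data (I can't share mine, so the feature count, scales, and split are made up): standardize, append a pure-noise feature, then sweep the regularisation strength and record both the coefficients and the holdout RMSE.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 informative features plus 2 noise ones
n = 200
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Append a pure-noise "random feature", as in the experiment
X = np.column_stack([X, rng.normal(size=n)])

# Standardize, then split into train / holdout
X = StandardScaler().fit_transform(X)
X_tr, X_ho, y_tr, y_ho = X[:150], X[150:], y[:150], y[150:]

# Sweep the regularisation strength; record holdout RMSE and coefficients
alphas = np.logspace(-4, 1, 30)
rmses, coef_paths = [], []
for a in alphas:
    model = Lasso(alpha=a).fit(X_tr, y_tr)
    rmses.append(mean_squared_error(y_ho, model.predict(X_ho)) ** 0.5)
    coef_paths.append(model.coef_.copy())
```

Plotting `coef_paths` against `alphas` gives the coefficient-path chart, and `rmses` gives the holdout-error curve described above.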

So here are my questions.

  1. Why could my RMSE have kept increasing as I increased my regularisation strength?
  2. Does the fact that the coefficients of feat1 and feat3 were pushed to 0 at the same time as the random feature mean that they're not good and I should remove them from the model? (It felt strange that they hit 0 at the same time. I was expecting some difference.)

Tags: lasso, regression, machine-learning

Category: Data Science


In answer to your first question:

The reason that your RMSE proceeded to increase as you increased the strength of your regularization (the value of $\lambda$) can be explained by reviewing the intuition behind what is happening when you increase the regularization of your model.

Why could my RMSE have kept increasing as I increased my regularization strength?

When you have no penalty (i.e. $\lambda = 0$), the model is free to fit noise in the training data, so its error on the holdout set (RMSE) is inflated. As you penalize the model, its variance is reduced and this error begins to fall. Past some point, however, increasing $\lambda$ further adds too much bias: the model begins to under-fit the data, and the RMSE rises again.
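To make the under-fitting extreme concrete, here is a small sketch on synthetic data (scikit-learn assumed; note scikit-learn calls $\lambda$ `alpha`): with a huge penalty, lasso zeroes every coefficient, so the model collapses to predicting the training mean, the most biased model possible.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] + rng.normal(size=100)

# A huge penalty zeroes every coefficient: the fitted model
# degenerates to predicting the training mean (pure bias).
big = Lasso(alpha=100.0).fit(X, y)
print(big.coef_)                               # all zeros
print(np.allclose(big.predict(X), y.mean()))   # True
```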

For your model above, this seems to occur somewhere between $\lambda = 10^{-3}$ and $\lambda = 10^{-2}$.

In answer to your second question:

Lasso Regression can be used as a form of feature selection since it shrinks some of the values for the coefficients down to zero. If you increase the value of $\lambda$ significantly, many of the features will be shrunk to zero.
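You can watch this shrinkage directly by counting the surviving coefficients as the penalty grows. A quick sketch on synthetic data (only the first three of ten features carry signal; the specific penalty grid is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
# Only the first three of ten features actually matter
y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(size=200)

# Count non-zero coefficients at increasing penalty strengths
counts = {}
for a in [0.01, 0.1, 1.0, 10.0]:
    counts[a] = np.count_nonzero(Lasso(alpha=a).fit(X, y).coef_)
print(counts)
```

At a small penalty, noise features can survive with small coefficients; at a large enough penalty, everything is zeroed out.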

Does the fact that the coefficients of feat1 and feat3 were pushed to 0 at the same time as the random feature mean that they're not good and I should remove them from the model? (It felt strange that they hit 0 at the same time. I was expecting some difference.)

To decide whether you should remove any of these features, consider a few things. First, what degree of collinearity exists between your features? You can check this easily, since it looks like you have a small number of features. Second, consider how many features you had before regularization: if only a few, you may not want to drop a large fraction of them.
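A simple collinearity check is the pairwise correlation matrix of the feature columns; here is a toy sketch in plain NumPy, where `x2` is deliberately built to be nearly collinear with `x1` (the names and data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly collinear with x1
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations of the feature columns; entries near +/-1
# flag collinear pairs, among which lasso picks somewhat arbitrarily.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

This matters for lasso specifically because, among highly correlated features, lasso tends to keep one and zero out the others, so a zeroed coefficient need not mean the feature carries no signal.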

Most importantly, you should examine the change in your RMSE as you increase $\lambda$, and choose the $\lambda$ that minimizes the RMSE. If some features are pushed to zero at that optimal choice of $\lambda$, then perhaps they should be removed. The fact that they move towards zero at the same rate (both hitting zero together) does not necessarily mean that they should be removed, only that they are candidate features to remove.
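If you are using scikit-learn, `LassoCV` automates exactly this search: it cross-validates over a grid of penalties and keeps the $\lambda$ (called `alpha` there) with the lowest validation error. A sketch on synthetic data (grid and data shapes are made up):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 6))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300)

# 5-fold cross-validation over a log-spaced penalty grid;
# model.alpha_ is the error-minimizing penalty.
model = LassoCV(alphas=np.logspace(-4, 1, 50), cv=5).fit(X, y)
print(model.alpha_)
print(np.count_nonzero(model.coef_))
```

You would then refit (or just inspect `model.coef_`) at that penalty and consider dropping only the features that are zero there.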
