Difference between PCA and regularisation

Question

Hang

13 December 2021, 23:04

I am currently confused about PCA and regularisation.

What is the difference between PCA and regularisation, particularly lasso (L1) regression?

Both seem to be able to do feature selection. I have to admit that I am not quite familiar with the difference between dimensionality reduction and feature selection.

Topic: lasso, pca, regularization

Category: Data Science

Peter · Accepted Answer · 13 December 2021, 23:04

Lasso does feature selection by adding a penalty, $\lambda \sum_i |\beta_i|$, to the OLS loss function. Features with low "impact" have their coefficients "shrunken" by the penalty term (you "regularise" the model). Because of the L1 penalty, a $\beta_i$ can become exactly zero (which is not the case with Ridge, L2). With the lasso you "eliminate" a feature when its coefficient is shrunk to zero, and you could call this feature selection. Lasso can also be used in "high dimensions", i.e. when you have many features ("columns") but not so many observations ("rows").
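To make the "shrunken to zero" point concrete, here is a minimal NumPy sketch (not a production implementation). It assumes the lasso objective is written as $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda \sum_i |\beta_i|$ and solves it with plain coordinate descent; ridge uses its closed-form solution. The data, the helper names (`soft_threshold`, `lasso_cd`, `ridge`) and the value `lam=0.5` are all made up for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; anything within [-t, t] becomes exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: the current fit with feature j left out
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

def ridge(X, y, lam):
    """Closed-form minimiser of (1/(2n)) * ||y - X b||^2 + (lam/2) * ||b||^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

# Hypothetical data: 10 features, only the first two influence y
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise so one penalty suits all columns
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.5 * rng.normal(size=n)
y = y - y.mean()                           # centre y so no intercept is needed

beta_lasso = lasso_cd(X, y, lam=0.5)
beta_ridge = ridge(X, y, lam=0.5)

print("lasso:", beta_lasso.round(2))
print("ridge:", beta_ridge.round(2))
print("exact zeros, lasso:", int(np.sum(beta_lasso == 0)),
      "ridge:", int(np.sum(beta_ridge == 0)))
```

The soft-thresholding step is what lets an L1-penalised coefficient land on exactly zero. The ridge solution only rescales the coefficients, so they generally stay non-zero, however small.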

Principal components work quite differently. The first principal component is the normalised linear combination of the original features that has the largest variance. So you "transform" the original features into a principal component (a "new feature" derived from the original ones), trying to capture as much variance as possible in that one component.
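Written out in one common notation (the symbols $Z_1$ and $\phi_{j1}$ are introduced here and are not part of the original answer), for $p$ centred features $X_1, \dots, X_p$ the first principal component is

$$
Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \dots + \phi_{p1} X_p, \qquad \sum_{j=1}^{p} \phi_{j1}^2 = 1,
$$

where the weights $\phi_{11}, \dots, \phi_{p1}$ are chosen to maximise the variance of $Z_1$. "Normalised" refers to the constraint on the right: without it, the variance could be made arbitrarily large just by scaling the weights.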

Principal components are uncorrelated with each other (their weight vectors are orthogonal). This can be very helpful in linear regression, where high correlation between features can be a real problem. I see PCA as a tool for dimensionality reduction rather than feature selection, since you can express many features in a smaller number of principal components.
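The uncorrelatedness can be checked numerically. The sketch below is an illustration on made-up data (three features driven by one hidden factor), computing PCA through the singular value decomposition of the centred data matrix with NumPy; the variable names are assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Three observed features driven by one hidden factor, so they are strongly correlated
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.1 * rng.normal(size=n) for _ in range(3)])

Xc = X - X.mean(axis=0)                       # PCA works on centred data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt                                 # each row is a unit-length weight vector
scores = Xc @ Vt.T                            # the "new features" (principal components)
explained = s ** 2 / np.sum(s ** 2)           # share of total variance per component

print("feature correlations:\n", np.corrcoef(X, rowvar=False).round(2))
print("component covariances:\n", np.cov(scores, rowvar=False).round(4))
print("explained variance ratio:", explained.round(3))
```

The off-diagonal entries of the component covariance matrix should be zero up to floating-point error, even though the original features are highly correlated. Because one hidden factor drives all three features here, the first component should carry nearly all of the variance, which is the sense in which three columns can be "expressed" in one.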

So, as a perhaps too brief summary:

Lasso: "shrinks" the estimated coefficients of features that are not very useful, possibly to exactly zero (but leaves the features themselves as they are)
PCA: "combines" several features into one or more orthogonal "new" features (principal components), which you then use in some type of model
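As an illustration of "use them in some type of model", one possible choice is ordinary least squares on the first few components (often called principal component regression). The sketch below runs on made-up data; the number of components `k = 2`, the data-generating setup and all variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 200, 6, 2
# Hypothetical data: six correlated features generated from two hidden factors
factors = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, p))
X = factors @ mixing + 0.05 * rng.normal(size=(n, p))
y = factors @ np.array([1.5, -1.0]) + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Step 1: combine the six features into k orthogonal principal components
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                             # shape (n, k)

# Step 2: ordinary least squares on the components instead of the raw features
gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)

# Each component mixes all original features, so mapping the fit back gives
# a coefficient for every feature: nothing has been "selected"
beta_implied = Vt[:k].T @ gamma               # shape (p,)
r2 = 1 - np.sum((yc - Z @ gamma) ** 2) / np.sum(yc ** 2)

print("model inputs:", Z.shape[1], "components instead of", p, "features")
print("implied coefficients on the original features:", beta_implied.round(3))
print("R^2 on the training data:", round(float(r2), 3))
```

The model only sees `k` inputs, which is the dimensionality reduction. Every original feature still contributes through the components, which is why this is not feature selection in the lasso sense.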

For more details, see "An Introduction to Statistical Learning" (available for free online). Section 6.2.2 covers the lasso and Section 10.2.1 covers PCA (section numbers as in the first edition).
