Does PCA help to include all the variables even if there is high collinearity among them?

I have a dataset with high collinearity among the variables. When I built a linear regression model, I could not include more than five variables (I eliminated a feature whenever its VIF exceeded 5). But I need to keep all the variables in the model and find their relative importance. Is there any way around this? I was thinking about running PCA and building the model on the principal components. Would that help?
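For reference, here is a minimal sketch of the VIF screening step described above, on synthetic data (the column names, the data, and the threshold of 5 are illustrative):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    X = pd.DataFrame({
        "x1": x1,
        "x2": x1 + rng.normal(scale=0.1, size=200),  # nearly collinear with x1
        "x3": rng.normal(size=200),                  # independent feature
    })

    # Add an intercept so each VIF regression includes a constant term;
    # VIF > 5 is a common (but arbitrary) rule of thumb for collinearity.
    Xc = sm.add_constant(X)
    vif = pd.Series(
        [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
        index=X.columns,
    )
    print(vif)  # x1 and x2 should show VIFs well above 5, x3 near 1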

Topic collinearity pca linear-regression

Category Data Science


When using PCA, you should not try to interpret the individual original features anymore. Each principal component is a linear combination of all of your variables, so a component generally cannot be attributed to any single original feature.
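As an illustration, here is a minimal sketch of regressing on principal components (principal component regression) with synthetic data; the data and the component count are illustrative, not a recommendation:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    X[:, 1] = X[:, 0] + rng.normal(scale=0.05, size=200)  # collinear pair
    y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=200)

    # Standardize, project onto orthogonal components, then regress.
    # The fitted coefficients belong to the components, not to the raw
    # features, which is why per-feature interpretation is lost.
    model = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
    model.fit(X, y)
    print(model.named_steps["linearregression"].coef_)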

When you want to work on feature importance, you can use random forests or decision trees instead. You can do something similar with neural networks by shuffling the values of one feature, re-evaluating (or re-training) the model, and comparing the drop in performance (permutation importance).
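A minimal sketch of that shuffling idea, assuming held-out data and using scikit-learn's permutation_importance (which shuffles one feature at a time and measures the score drop without re-training; re-training after the shuffle is a stricter variant of the same idea):

    import numpy as np
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.3, size=300)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                         random_state=0).fit(X_tr, y_tr)

    # Shuffle each feature n_repeats times and record the mean score drop.
    result = permutation_importance(model, X_te, y_te, n_repeats=20,
                                    random_state=0)
    print(result.importances_mean)  # features 0 and 3 should dominate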


PCA will generate "new" (transformed) features which are orthogonal (uncorrelated). However, since the original features are transformed, you can hardly say much about the importance of the original features based on PCA.
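To make this concrete, here is a small sketch (on synthetic data) inspecting the PCA loadings: every component typically mixes several original features, so a component's weight in a downstream model cannot be credited to any one feature:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=200)  # collinear pair

    pca = PCA().fit(StandardScaler().fit_transform(X))

    # Rows are components, columns are original features; the non-zero
    # entries show how each component blends the original variables.
    print(np.round(pca.components_, 2))
    print(pca.explained_variance_ratio_)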

One obvious alternative would be to use a random forest (RF) to determine feature importance. With tree-based models (like RF or tree-based boosting) you do not need to worry about collinearity in the feature space.
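A minimal sketch of impurity-based importance from a random forest, on synthetic data. One caveat worth knowing: with strongly correlated features the trees can pick either one at a split, so the importance tends to be shared between them rather than assigned to a single feature:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=300)  # collinear pair
    y = 3 * X[:, 0] + X[:, 4] + rng.normal(scale=0.3, size=300)

    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
    print(rf.feature_importances_)  # credit is shared between x0 and x1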
