Difference between scaling just X or X and Y in PCA / principal component regression

Question

Sally

March 30, 2022, 20:43

Before doing principal component regression, it is important to scale the data. But which data exactly? Is it enough to scale just X, or do I have to scale the whole data set, containing both X and Y (regressors and regressand)? The advantage of scaling just X is that I do not have to back-transform Y. But is this valid? What is the difference between scaling just X and scaling the whole data set?

Topic inference pca regression dataset statistics

Category Data Science

PicaR · Accepted Answer · March 30, 2022, 20:43

PCA identifies the directions of greatest variance in X. Because the variance of a variable is measured in its own units squared, variables on a larger scale will dominate the components unless all variables are first standardized to zero mean and unit standard deviation. In principal component regression, the PCA is computed on X alone, so Y never enters the component calculation. Scaling Y therefore only rescales the regression coefficients by a constant and, once the predictions are back-transformed, gives the same fit. So it is not necessary to scale Y; simply standardize the columns of X so that they are all on the same scale.
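To make this concrete, here is a minimal NumPy sketch, not a definitive implementation. The synthetic data, the helper names `standardize` and `pcr_fit_predict`, and the choice of two components are all assumptions made for illustration. It fits a principal component regression once with only X standardized and once with both X and Y standardized, then compares the predictions in the original units of Y. It also checks which variable dominates the first component when X is left unscaled.

```python
import numpy as np

def standardize(a):
    """Center and scale column-wise; also return the mean and std used."""
    mean = a.mean(axis=0)
    std = a.std(axis=0, ddof=1)
    return (a - mean) / std, mean, std

def pcr_fit_predict(X, y, n_components):
    """Hypothetical helper: PCA on standardized X, then least squares on the scores."""
    Xs, _, _ = standardize(X)
    # Principal directions are the right singular vectors of standardized X.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = Xs @ Vt[:n_components].T
    # Regress y on the scores, with an intercept column.
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef

rng = np.random.default_rng(0)
n = 200
# Assumed synthetic data: three predictors on very different scales.
X = rng.normal(size=(n, 3)) * np.array([1.0, 100.0, 0.01])
y = 2.0 * X[:, 0] + 0.03 * X[:, 1] + 50.0 * X[:, 2] + rng.normal(size=n)

# Variant A: scale X only, keep y in its original units.
pred_a = pcr_fit_predict(X, y, n_components=2)

# Variant B: scale X and y, then back-transform the predictions.
ys, y_mean, y_std = standardize(y)
pred_b = pcr_fit_predict(X, ys, n_components=2) * y_std + y_mean

same_predictions = np.allclose(pred_a, pred_b)
print("Same predictions:", same_predictions)

# Without scaling X, the first component follows the largest-scale variable.
Xc = X - X.mean(axis=0)
_, _, Vt_raw = np.linalg.svd(Xc, full_matrices=False)
dominant = int(np.argmax(np.abs(Vt_raw[0])))
print("Dominant column in unscaled first component:", dominant)
```

The two variants agree because least squares with an intercept is unaffected by shifting and rescaling the response, apart from the units of the coefficients. Scaling Y is then a matter of convenience, for example to make coefficients comparable across different responses.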
