Difference between scaling just X or X and Y in PCA / principal component regression

Before doing principal component regression, it is important to scale the data. But which data exactly? Is it enough to scale just X, or do I have to scale the whole data set, containing both X and Y (regressor and regressand)? The advantage of scaling just X is that I do not have to back-transform Y. But is this valid? What is the difference between scaling just X and scaling the whole data set?

Topic inference pca regression dataset statistics

Category Data Science


PCA identifies the directions of greatest variance. Since the variance of a variable is measured in its own units squared, variables on a larger scale will dominate the components unless all variables are first standardized to zero mean and unit standard deviation. In principal component regression, the components are computed from X alone; Y never enters the PCA. Therefore it is not necessary to scale Y: simply standardize the columns of X so that they are all on a comparable scale. Scaling Y as well is valid, but it only changes the units of the fitted coefficients; as long as the regression includes an intercept, the predictions are the same once Y is back-transformed.
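To make the comparison concrete, here is a minimal sketch using NumPy on synthetic data. The helper names (`standardize`, `pcr_fit_predict`), the data, and the choice of two components are assumptions made up for illustration, not part of the question. It fits a principal component regression once with only X standardized and once with both X and Y standardized, then checks how far apart the predictions are after back-transforming Y.

```python
import numpy as np

def standardize(a):
    """Center to mean 0 and scale to standard deviation 1 (column-wise for 2-D input)."""
    mean = a.mean(axis=0)
    std = a.std(axis=0, ddof=1)
    return (a - mean) / std, mean, std

def pcr_fit_predict(X, y, n_components):
    """Illustrative PCR: standardize X, project onto leading PCs, fit OLS with intercept."""
    Xs, _, _ = standardize(X)
    # Rows of vt are the principal directions of the standardized X.
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    scores = Xs @ vt[:n_components].T
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef  # in-sample predictions

# Synthetic data (assumption): three predictors on very different scales.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3)) * np.array([1.0, 50.0, 1000.0])
y = 2.0 * X[:, 0] + 0.1 * X[:, 1] + 0.003 * X[:, 2] + rng.normal(size=n)

# Option A: scale X only; predictions are already in the units of y.
pred_a = pcr_fit_predict(X, y, n_components=2)

# Option B: scale X and y, then back-transform the predictions.
ys, y_mean, y_std = standardize(y)
pred_b = pcr_fit_predict(X, ys, n_components=2) * y_std + y_mean

max_diff = float(np.max(np.abs(pred_a - pred_b)))
print(max_diff)  # expected to be at floating-point noise level
```

In this sketch the two options should agree up to rounding error, because the components depend only on X and an ordinary least-squares fit with an intercept is unaffected by a linear rescaling of Y apart from the units of its coefficients.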
