How do standardization and normalization impact the coefficients of linear models?
One benefit of creating a linear model is that you can look at the coefficients the model learns and interpret them. For example, you can see which features have the most predictive power and which do not.
How, if at all, does feature interpretability change if we normalize all our features (scale each to the 0-1 range) versus standardizing them (subtracting the mean and dividing by the standard deviation) before fitting the model?
I have read elsewhere that you 'lose feature interpretability if you normalize your features' but could not find an explanation as to why. If that is true, could you please explain?
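To make the two transforms concrete, here is a minimal sketch (the toy values are mine, not from the Gapminder data) of what MinMaxScaler and StandardScaler compute on a single feature:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# A single numeric feature as a column vector (toy values, just for illustration).
x = np.array([[1.0], [2.0], [4.0], [8.0]])

# Normalization (min-max scaling): x' = (x - min) / (max - min), range becomes [0, 1].
x_norm_manual = (x - x.min()) / (x.max() - x.min())
x_norm_sklearn = MinMaxScaler().fit_transform(x)

# Standardization (z-scoring): z = (x - mean) / std, giving mean 0 and unit variance.
x_std_manual = (x - x.mean()) / x.std()
x_std_sklearn = StandardScaler().fit_transform(x)

print(np.allclose(x_norm_manual, x_norm_sklearn))  # True
print(np.allclose(x_std_manual, x_std_sklearn))    # True
```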
Here are two screenshots of the coefficients for two multiple linear regression models I built. Both use Gapminder 2008 data, i.e. statistics about each country, to predict its fertility rate.
In the first, I scaled features using StandardScaler. In the second, I used MinMaxScaler. The Region_ features are categorical and were one-hot encoded and not scaled.
Not only did the coefficients change under the different scalings, but their ordering (of importance?) did too! Why is this the case? What does it mean?
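For anyone who wants to poke at this without the screenshots, here is a minimal sketch that reproduces the effect on synthetic data (the feature names and values are made up, not the Gapminder features): the relationship between features and target is fixed, yet the two scalers produce different coefficient magnitudes and a different ordering.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for the Gapminder features (the real data is not included here).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "feat_wide": rng.normal(size=n),           # roughly normal: std ~1, range ~6
    "feat_bounded": rng.uniform(0, 1, size=n)  # bounded: std ~0.29, range ~1
})
# Noise-free target, so the coefficients in raw units are exactly 1.0 and 4.0.
y = 1.0 * X["feat_wide"] + 4.0 * X["feat_bounded"]

for name, scaler in [("StandardScaler", StandardScaler()),
                     ("MinMaxScaler", MinMaxScaler())]:
    coefs = LinearRegression().fit(scaler.fit_transform(X), y).coef_
    ranking = list(X.columns[np.argsort(-np.abs(coefs))])
    print(f"{name:14s} coefs={coefs.round(3)} ranking={ranking}")
```

Because the target is noise-free, the underlying relationship is identical in both runs, so any difference in the printed coefficients and their ranking comes purely from the choice of scaler.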
Tags: interpretation, lasso, ridge-regression, linear-regression, feature-scaling
Category: Data Science