Possible harm in standardizing one-hot encoded features
While there may not be any added value in standardizing one-hot encoded features prior to applying linear models, is there any harm in doing so (i.e., does it affect model performance)?
Standardizing here means applying (x - mean) / std so that the feature has mean 0 and standard deviation 1.
I prefer applying standardization to my entire training dataset after one-hot encoding, rather than only to the numerical features, since it significantly simplifies my pipeline.
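A minimal sketch of what I mean, assuming a pandas DataFrame with a hypothetical categorical column "color" and numeric column "age" (the data and column names are made up for illustration; `sparse_output` requires scikit-learn >= 1.2, older versions use `sparse=` instead):

    import pandas as pd
    from sklearn.compose import make_column_transformer
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Hypothetical toy data
    X = pd.DataFrame({
        "color": ["red", "red", "blue", "blue", "blue", "red", "red"],
        "age": [23, 31, 35, 19, 42, 28, 50],
    })
    y = [1.0, 2.0, 0.5, 0.7, 0.3, 1.5, 2.2]

    # One-hot encode the categorical column, pass numeric columns through...
    encode = make_column_transformer(
        (OneHotEncoder(sparse_output=False), ["color"]),
        remainder="passthrough",
    )

    # ...then standardize the *entire* encoded matrix, dummies included,
    # instead of scaling only the numeric columns separately.
    model = make_pipeline(encode, StandardScaler(), LinearRegression())
    model.fit(X, y)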
For example, suppose I have a binary feature and the vector provided to the model is [1, 1, 0, 0, 0, 1, 1].
If standardization is applied to this binary feature prior to fitting the model (subtract the mean ≈ 0.57 and divide by the std ≈ 0.49), the vector becomes
[0.8660254, 0.8660254, -1.15470054, -1.15470054, -1.15470054, 0.8660254, 0.8660254]
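These numbers can be reproduced with a couple of lines of NumPy; note that the population standard deviation (ddof=0) is used here, which is also what scikit-learn's StandardScaler uses:

    import numpy as np

    x = np.array([1, 1, 0, 0, 0, 1, 1], dtype=float)
    mean, std = x.mean(), x.std()  # ~0.571 and ~0.495 (population std)
    print((x - mean) / std)
    # [ 0.8660254  0.8660254 -1.1547005 -1.1547005 -1.1547005  0.8660254  0.8660254]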
Topic: collinearity, pipelines, one-hot-encoding, linear-regression
Category: Data Science