Why does log-transforming the target have a huge impact on MSE value?

Question

Caterina

May 4, 2022 18:56

I am fitting a linear regression on the Boston Housing data set, and applying $\log(y)$ to the target has a huge impact on the MSE. Without the transformation the MSE is 34.94, while with $y$ log-transformed it is 0.05.

Topic transformation rmse mse feature-scaling

Category Data Science

Adam · Accepted Answer · May 2, 2022 22:56

The MSE is sensitive to the scale of the outcome. To see this, recall its definition: $$ MSE = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2 $$ Let's suppose your outcome ranges over $[1,99]$ with mean $50$ (say, each integer from 1 to 99 occurring once), and let's pretend your model is just a "naive" estimator that predicts $\hat{y}_i = 50$ for every observation. The MSE is then approximately 816.67.

Now if you log-transform the outcome, it ranges over $[0,4.595]$ with mean 3.63. Again we use a simple model whose estimates are just the sample mean. The MSE is then about 0.85.
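The two figures can be reproduced with a short sketch. It assumes the outcome takes each integer value from 1 to 99 exactly once, which is what the numbers in this answer appear to correspond to; the helper `mse` and the variable names are illustrative, not from any particular library.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Assumption: the outcome takes each integer value 1, 2, ..., 99 exactly once.
y = list(range(1, 100))
log_y = [math.log(v) for v in y]  # natural log of the outcome

# Naive model: predict the sample mean for every observation.
mean_y = sum(y) / len(y)
mean_log_y = sum(log_y) / len(log_y)

mse_raw = mse(y, [mean_y] * len(y))
mse_log = mse(log_y, [mean_log_y] * len(log_y))

print(f"raw scale: mean = {mean_y:.2f}, MSE = {mse_raw:.2f}")
print(f"log scale: mean = {mean_log_y:.2f}, MSE = {mse_log:.3f}")
```

The same naive predictor is used in both cases, so the drop in MSE comes only from measuring errors in log units rather than in the original units.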

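Note that the fit of the model is no better; the only thing that has changed is the scale on which the MSE is measured. An MSE computed on $\log(y)$ is in squared log units, so it cannot be compared directly with an MSE computed on $y$.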
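One way to check this is to put both naive models on the same scale by mapping the log-scale prediction back to the original units with `exp` before computing the MSE. The following is a minimal sketch under the same assumption as the example in this answer (each integer from 1 to 99 once); the names are illustrative.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Assumption: the outcome takes each integer value 1, 2, ..., 99 exactly once.
y = list(range(1, 100))
n = len(y)

# Naive model fitted on the raw scale: predict the mean of y.
pred_raw = sum(y) / n

# Naive model fitted on the log scale: predict the mean of log(y),
# then map that prediction back to the original units with exp.
mean_log_y = sum(math.log(v) for v in y) / n
pred_back = math.exp(mean_log_y)

# Both errors are now measured in the original units of y.
mse_raw = mse(y, [pred_raw] * n)
mse_back = mse(y, [pred_back] * n)

print(f"raw-scale model:             prediction = {pred_raw:.2f}, MSE = {mse_raw:.2f}")
print(f"log model, back-transformed: prediction = {pred_back:.2f}, MSE = {mse_back:.2f}")
```

Measured in the same units, the log-scale naive model does not come out ahead: the sample mean minimizes squared error among constant predictions, so any other constant, including the back-transformed one, does at least as poorly.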
