Evaluation of regression models with different evaluations (MSE, variance, VAF etc.)

When comparing several regression models in terms of quality, it seems like most have agreed on the MSE. There are also papers comparing variance and variance accounted for (VAF). However, there seems to be a controversial opinion about the variance (R^2). Should it nevertheless be compared in a scientific paper?

$$ VAF_i = \bigg[ 1-\frac{\text{var}\big(y_i - \hat y_i\big)}{\text{var}\big(y_i\big)} \bigg] \times 100\% $$

And what does VAF say? Is the VAF still a good measure of regression models?

Topic variance regression evaluation machine-learning

Category Data Science


$R^2$ has a nice interpretation as the proportion of variance explained, except when it does not in nonlinear models like many machine learning staples are. However, $R^2$ is, in some sense, equivalent to square loss, so if you are comfortable evaluating your model using a measure of square loss like $MSE = \frac{1}{n}\sum_i(y_i-\hat y_i)^2$ or $RMSE = \sqrt{MSE}$, you should be comfortable using $R^2$.

A common criticism of $R^2$ is that it can be driven arbitrarily high, all the way up to $1$. It is false that $R^2$ can be driven up to $1$ in every case, since two observations can have the same features but a different $y$ value, but $R^2$ certainly can be driven quite high just by including more and more features, whether they are related to the outcome or not.

However, the same is true of $MSE$. Just look at the equation.

$$ R^2 = 1 - \dfrac{\sum_i \big(y_i - \hat y_i\big)^2} {\sum_i \big(y_i - \bar y\big)^2}\\ = 1 - \dfrac{n \times MSE} {\sum_i \big(y_i - \bar y\big)^2}\\ = 1 - \dfrac{n \times \big(RMSE\big)^2} {\sum_i \big(y_i - \bar y\big)^2} $$

Consequently, any criticism of $R^2$ also has to apply to $MSE$ and $RMSE$.

However, it is completely valid to evaluate $R^2$, $MSE$, and $RMSE$ with out-of-sample data. Then the criticism that you can drive performance high by including unrelated features does not apply; your out-of-sample performance will take a hit when you include too many unrelated features. The out-of-sample equations for $MSE$ and $RMSE$ are exactly the same as for in-sample, but $R^2$ is tweaked.

$$ R^2_{out} = 1 - \dfrac{\sum_i \big(y_i - \hat y_i\big)^2} {\sum_i \big(y_i - \bar y_{in}\big)^2} $$

You use the in-sample mean to stay with the spirit of evaluating your performance, compared to the naïve model that always guesses the observed mean.

Even in the OLS linear regression case, however, $R^2_{out}$ lacks the "proportion of variance explained" interpretation. Let's see an example where that $Other$ term in the linked post on Cross Validated is not equal to zero.

set.seed(2021)
N <- 100
other <- function(y, preds, ybar){
  
  return(
  
    sum(
      (y - preds)
      *
      (preds - ybar)
    )  
  )
}
x_all <- runif(N)
y_all <- 7*x_all + rnorm(N)
x_in <- x_all[1:80]
x_out <- x_all[81:100]
y_in <- y_all[1:80]
y_out <- y_all[81:100]
L <- lm(y_in ~ x_in)
preds_in <- predict(L, data.frame(x_in = x_in))
preds_out <- predict(L, data.frame(x_in = x_out))
other_in <- other(y_in, preds_in, mean(y_in))
other_out <- other(y_out, preds_out, mean(y_in))
other_in
other_out

I get an in-sample $Other = 1.3\times 10^{-13}$, which is close enough to $0$ for arithmetic done on a computer, but out-of-sample $Other = -3.3$.


Variability accounted for (VAF) is not meaningful in regression.

VAF can be driven higher by the distribution of the independent variables which does tell you nothing about how good a regression model is.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.