Chi-square as an evaluation metric for nonlinear machine learning regression models

I am using machine learning models to predict an ordinal variable (values: 1, 2, 3, 4, and 5) from 7 different features. I posed this as a regression problem, so the final outputs of a model are continuous. An evaluation box plot looks like this: [image not shown]

I experiment with both linear models (linear regression, linear SVMs) and nonlinear models (SVMs with RBF kernels, random forests, gradient boosting machines). The models are trained using cross-validation (~1600 samples), and 25% of the dataset is held out for testing (~540 samples). I am using R-squared and Root Mean Square Error (RMSE) to evaluate the models on the test samples. I am interested in finding an evaluation measure to compare linear models to nonlinear ones.

This is done for scientific research. It was pointed out that R-squared might not be an appropriate measure for nonlinear models, and that the Chi-Square test would be a better measure of goodness of fit.

The problem is, I am not sure what the best way to do this is. When I search for Chi-square goodness of fit, I only find examples where the Chi-square test is used to check whether some categorical samples fit a theoretical expectation, such as here. So here are my considerations/questions:

  1. One way I can think of is to categorize the predicted (continuous) values into bins and compare the predicted distribution to the ground-truth distribution using the Chi-Square test. But that doesn't make much sense: e.g., suppose a machine learning model perfectly predicts ground-truth values 2, 3, and 4, but predicts every 5 as 1 and every 1 as 5. If classes 1 and 5 have similar counts, the Chi-Square test I propose here would fail to reject the null hypothesis, even though the model is mispredicting 2 out of 5 values.

  2. As referred to in a tutorial from USC, I could use formula (1) to compute the Chi-Square value, where the experimentally measured quantities (x_i) are my ground-truth values and the hypothesized values (μ_i) are my predicted values. My question is, what is the variance? If we treat each value 1, 2, 3, 4, and 5 as a distinct category, then the variance of the ground truth within each category is equal to zero. Also, how does one compute the degrees of freedom (N − r)?

  3. Related to my statement that I am interested in finding an evaluation measure to compare linear models to nonlinear ones: is the Chi-Square test the best (or even a good) choice? In the machine learning competitions for regression tasks I have seen so far, either MSE or RMSE is used for evaluation.
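The failure mode in consideration 1 can be demonstrated with a small sketch (pure Python; the labels are made up so that classes 1 and 5 have equal counts). A model that swaps every 1 with a 5 produces exactly the same bin counts as the ground truth, so the Pearson chi-square statistic over the bins is zero even though the model is only 60% accurate:

```python
from collections import Counter

# hypothetical ground-truth labels, and a model that predicts
# 2, 3, 4 perfectly but swaps every 1 with a 5 and vice versa
truth = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
preds = [5 if t == 1 else 1 if t == 5 else t for t in truth]

obs = Counter(preds)   # observed bin counts from predictions
exp = Counter(truth)   # expected bin counts from ground truth

# Pearson chi-square statistic over the five bins
chi2 = sum((obs[k] - exp[k]) ** 2 / exp[k] for k in exp)
print(chi2)  # 0.0 -- the test cannot see the 1<->5 swap
```

The statistic only compares the marginal distributions of the bins; it is blind to whether individual predictions are paired with the right ground-truth values.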



You can use chi-square in two ways: (1) for each model separately, you can compare the actual numbers of correct and incorrect predictions against the expected numbers; (2) you can compare multiple models together under the null hypothesis that there is no difference between them, by expanding the contingency table to one row per model, with columns for the correct and incorrect counts, actual vs. expected. There is also a ranked chi-square test that applies to ordered ranks, as you are suggesting, but the concern would be a small cell count for some cells.
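The second variant can be sketched in a few lines of pure Python. The correct/incorrect counts below are made up for illustration; for a 2×2 table the statistic has df = 1, with a 5% critical value of about 3.84:

```python
# Hypothetical correct/incorrect counts for two models on the
# same ~540-sample test set:
#                  correct  incorrect
# linear model         400        140
# nonlinear model      430        110
table = [[400, 140], [430, 110]]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
total = sum(row_totals)

# Pearson chi-square: expected cell count = row_total * col_total / total
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)
print(round(chi2, 3))  # 4.684 > 3.84, so the models differ at the 5% level
```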


You should frame your problem as ordinal regression. The model then predicts the target directly as one of the five integer values.

As a result, the evaluation would not be best done with Root Mean Square Error (RMSE). A Chi-Square test could instead be applied between the expected and predicted counts for each of the five value levels.

If you then want to add other model types, find their ordinal analogs (ordinal SVM or ordinal decision tree). The same count-based Chi-Square test can be applied to find the best model.
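That count-based test can be sketched as a standard goodness-of-fit computation (all counts below are hypothetical; with five levels, df = 5 − 1 = 4 and the 5% critical value is about 9.49). Note that, as the question points out, this only compares marginal counts per level, not individual predictions:

```python
# Hypothetical counts of each ordinal level (1..5) in a ~540-sample test set
expected = [60, 120, 180, 120, 60]   # ground-truth level counts
observed = [52, 131, 170, 128, 59]   # level counts predicted by an ordinal model

# Pearson goodness-of-fit statistic, df = 5 - 1 = 4
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))  # 3.181 < 9.49, so the fit is not rejected at the 5% level
```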


Use your test data to compare the predictive performance of each model.

In R you could do this like:

linear.predictions <- predict(linear.model, newdata = test.data)
nonlinear.predictions <- predict(nonlinear.model, newdata = test.data)

# Absolute percent difference per test sample, so over- and
# under-predictions don't cancel out in the mean
linear.percent.difference <- abs(test.data$TARGET_VARIABLE -
                                 linear.predictions) /
                             test.data$TARGET_VARIABLE

nonlinear.percent.difference <- abs(test.data$TARGET_VARIABLE -
                                    nonlinear.predictions) /
                                test.data$TARGET_VARIABLE

# Mean absolute percentage error: lower is better
linear.grade <- mean(linear.percent.difference)
nonlinear.grade <- mean(nonlinear.percent.difference)

This is a pretty simple way to do it, but it is one that works for me and is easy to understand, especially if your audience's eyes are going to glaze over as soon as you say "Chi-square..." Get creative!
