Why is there a very large difference between cross-validation scores?

I have a very simple regression model and I am running cross-validation on it. With cv=10, the highest score I got is 60.3 and the lowest is -9.7, which is useless. The average is about 30.

Number of rows in the data set: 658

Topic score data regression cross-validation machine-learning

Category Data Science


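Your $R^2$ scores suggest that a linear model does not describe your data well: a negative $R^2$ on a fold means the model predicts worse there than a constant equal to the mean of that fold's target values. On top of this, the scores vary a lot from fold to fold. With 658 rows and 10 folds, each score is computed on only about 66 rows, so individual folds are noisy. You could try the following: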

  • If the linear model is supposed to describe the data, check for outliers. They might be responsible for the large variation across the CV folds.
  • Try reducing the number of features if there are many. The model might be fitting noise.
  • Introducing regularization (lasso or ridge regression) might make the model more robust. This should decrease the variability of the CV scores. The training $R^2$ will typically get worse, while the cross-validated $R^2$ may or may not improve.
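To illustrate the first suggestion, here is a minimal sketch that uses only the Python standard library. It runs 10-fold cross-validation on synthetic data, once without and once with a handful of outliers, and prints the per-fold spread of $R^2$. Everything in it is an assumption made for illustration: the data are generated, not the asker's, and the names `fit_line`, `r2_score`, `kfold_r2` and `make_data` are hypothetical helpers. With scikit-learn you would typically call `cross_val_score` instead of writing the loop by hand.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r2_score(ys, preds):
    """R^2 = 1 - SS_res / SS_tot. It is negative whenever the predictions
    are worse than a constant equal to the mean of ys."""
    my = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def kfold_r2(xs, ys, k=10, seed=0):
    """Shuffle the row indices, split them into k folds, and return the
    held-out R^2 of each fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(k):
        test_idx = idx[f::k]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [a + b * xs[i] for i in test_idx]
        scores.append(r2_score([ys[i] for i in test_idx], preds))
    return scores


def make_data(n=658, n_outliers=0, seed=1):
    """Synthetic (assumed) data: a noisy line, optionally with a few
    targets shifted far away to act as outliers."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 5) for x in xs]
    for i in rng.sample(range(n), n_outliers):
        ys[i] += rng.choice([-1, 1]) * 150
    return xs, ys


for n_out in (0, 5):
    xs, ys = make_data(n_outliers=n_out)
    scores = kfold_r2(xs, ys, k=10)
    print(
        f"outliers={n_out}: min={min(scores):.2f} max={max(scores):.2f} "
        f"mean={statistics.fmean(scores):.2f} std={statistics.stdev(scores):.2f}"
    )
```

Looking at the standard deviation of the fold scores next to their mean, rather than at the best and worst fold alone, gives a fairer picture of how unstable the estimate is. If the spread shrinks noticeably once suspicious rows are removed or down-weighted, outliers are a likely cause.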
