Interpreting ROC curves across k-fold cross-validation

I have used a MARS model (multivariate adaptive regression splines) and evaluated it with k-fold cross-validation, obtaining the following graph:

How should this model be interpreted? I understand that in fold 6 the model obtains a better AUC, but why? What is the interpretation of this? Thanks to all.
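For reference, a minimal sketch of how per-fold ROC curves like the one described are typically produced, using scikit-learn's `StratifiedKFold`. The MARS model from the question is not reproduced here; a logistic regression is used purely as a stand-in scorer, and `X`, `y` are assumed to be a NumPy feature matrix and binary label vector:

```python
# Sketch: one ROC curve (and AUC) per cross-validation fold.
# Assumptions: X, y are NumPy arrays; LogisticRegression stands in for the
# MARS model, whose code is not shown in the question.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold

def plot_cv_roc(X, y, n_splits=10, random_state=0):
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    aucs = []
    for i, (train_idx, test_idx) in enumerate(cv.split(X, y), start=1):
        model = LogisticRegression(max_iter=1000)  # stand-in for MARS
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        fpr, tpr, _ = roc_curve(y[test_idx], scores)
        auc = roc_auc_score(y[test_idx], scores)
        aucs.append(auc)
        plt.plot(fpr, tpr, alpha=0.6, label=f"fold {i} (AUC = {auc:.2f})")
    plt.plot([0, 1], [0, 1], "k--", label="chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
    return np.array(aucs)
```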

Topic: interpretation, roc, cross-validation, machine-learning

Category: Data Science


$k$-fold cross-validation simply repeats the same training and evaluation process on different parts of the data. Therefore any difference between folds can only be due to chance: the results differ only because different instances happen to be selected for each fold.
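A quick way to see that fold-to-fold differences come from how the data happens to be split: rerun the same cross-validation with a different shuffle and the "best" fold changes. The snippet below is a self-contained sketch using synthetic data and a logistic regression as a stand-in model (neither is from the question):

```python
# Rerunning identical 10-fold CV with two different shuffles: the per-fold
# AUCs, and which fold is "best", change with the random split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

for seed in (0, 1):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=cv, scoring="roc_auc")
    print(f"seed {seed}: per-fold AUC = {np.round(aucs, 3)}, "
          f"best fold = {int(aucs.argmax()) + 1}")
```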

In theory, if the dataset is large and representative enough, the performance should be almost identical across different folds. Thus large differences tend to indicate that the data is not representative enough and/or that the trained model overfits (i.e. it's too complex for the data, learning details that occur by chance instead of general patterns).
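One way to make "large differences across folds" concrete is to summarise the spread of the per-fold AUCs (for example the array returned by `plot_cv_roc` or `cross_val_score` in the sketches above) instead of eyeballing the curves. The 0.05 cut-off below is only an illustrative rule of thumb, not a standard threshold:

```python
# Summarise per-fold AUCs; a large standard deviation suggests the folds
# are not representative or the model is overfitting.
import numpy as np

def summarise_fold_aucs(aucs, spread_warn=0.05):  # 0.05 is an arbitrary example cut-off
    aucs = np.asarray(aucs)
    mean, std = aucs.mean(), aucs.std()
    print(f"AUC = {mean:.3f} +/- {std:.3f} "
          f"(min {aucs.min():.3f}, max {aucs.max():.3f})")
    if std > spread_warn:
        print("Large fold-to-fold spread: consider a simpler model, "
              "more data, or repeated cross-validation to check stability.")
    return mean, std
```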

Imho this graph is somewhat subjective to interpret: I would say there are fairly large variations across folds, so it could be a case of overfitting. But the different curves stay roughly in the same area, so I think it's not too bad. It's a borderline case from my point of view.
