cross validation on whole data set or training data?

My cross-validation score is always lower than my training score, and I am running cross-validation on just the training data. Is that normal? K-fold = 5.

Tags: score, cross-validation, machine-learning

Category: Data Science


Yes, and it's called overfitting. Your model is beginning to memorize the training set rather than learning patterns that generalize to a validation or test set. If you're asking why this happens, I'd refer you to another answer I wrote explaining the phenomenon in more detail.
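To see the effect concretely, here is a minimal sketch with hypothetical toy data (two overlapping Gaussian classes, not the asker's actual dataset) and a 1-nearest-neighbour classifier, which memorizes the training set perfectly. The training score is 100% by construction, while the 5-fold cross-validation score, measured on held-out points, is noticeably lower:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two noisy, overlapping Gaussian classes.
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(1.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def predict_1nn(X_train, y_train, X_query):
    """1-nearest-neighbour prediction: label of the closest training point."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]

# Training score: the model has memorized every point, so this is 1.00.
train_acc = np.mean(predict_1nn(X, y, X) == y)

# 5-fold cross-validation on the same data: each fold is scored on
# points the model has never seen, so the score drops.
k = 5
idx = rng.permutation(len(y))
folds = np.array_split(idx, k)
cv_scores = []
for i in range(k):
    val = folds[i]
    trn = np.concatenate([folds[j] for j in range(k) if j != i])
    preds = predict_1nn(X[trn], y[trn], X[val])
    cv_scores.append(np.mean(preds == y[val]))

print(f"training accuracy: {train_acc:.2f}")  # 1.00 by construction
print(f"mean 5-fold CV accuracy: {np.mean(cv_scores):.2f}")
```

The gap between the two numbers is exactly the pattern described in the question: the model scores higher on data it has already seen than on held-out folds.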

One interesting follow-up question is: why is the performance on the cross-validation folds worse than on the test set?

This is harder to answer without all the details. One possible explanation is that, since the full training set is larger than the training portion of each fold, the final model was trained on more data and therefore trained better; another is simply that the test set examples happened to be easier.
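The first explanation can be illustrated with a small learning-curve sketch. This again uses hypothetical toy data and a nearest-centroid classifier (an assumption for illustration, not the asker's model): held-out accuracy, averaged over many random draws, tends to improve as the training set grows, which is why each CV fold, trained on only (k-1)/k of the data, can lag behind a model trained on the full training set.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: two overlapping Gaussian classes.
X = np.vstack([rng.normal(0.0, 1.0, (500, 2)),
               rng.normal(1.0, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)

def nearest_centroid_score(n_train, n_trials=200):
    """Mean held-out accuracy of a nearest-centroid classifier
    trained on n_train randomly drawn points."""
    accs = []
    for _ in range(n_trials):
        idx = rng.permutation(len(y))
        trn, tst = idx[:n_train], idx[n_train:n_train + 200]
        # Estimate one centroid per class from the training draw.
        c0 = X[trn][y[trn] == 0].mean(axis=0)
        c1 = X[trn][y[trn] == 1].mean(axis=0)
        # Predict the class whose centroid is nearer.
        d0 = np.linalg.norm(X[tst] - c0, axis=1)
        d1 = np.linalg.norm(X[tst] - c1, axis=1)
        preds = (d1 < d0).astype(int)
        accs.append(np.mean(preds == y[tst]))
    return float(np.mean(accs))

# With few training points the centroids are noisy estimates;
# with many points they are close to the true class means.
small = nearest_centroid_score(20)
large = nearest_centroid_score(800)
print(f"held-out accuracy, 20 training points:  {small:.3f}")
print(f"held-out accuracy, 800 training points: {large:.3f}")
```

The size effect here is small because the model is simple; how much it matters in practice depends on the model and on how steep its learning curve still is at the size of one fold.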
