Ideal difference between training accuracy and testing accuracy

In a data classification problem (with supervised learning), what should be the ideal difference between the training set accuracy and the testing set accuracy? What is the ideal range? Is a difference of 5% between the training and testing accuracy okay, or does it signify overfitting?

Topic: training-data, supervised-learning, accuracy, classification

Category: Data Science


A difference of 5% is fine. Try using cross-validation and compare the mean accuracies across folds.

Empirically good settings for k-fold cross-validation are k = 10 with stratification on the target attribute; see the sketch below.

Also, check whether your dataset is balanced, i.e., whether the classes have roughly similar counts.
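
A minimal sketch of both checks, assuming scikit-learn is available; the dataset X, y and the logistic-regression model here are placeholders, not your actual data or estimator:

    # Minimal sketch: check class balance, then run stratified 10-fold CV
    # and look at the mean accuracy across folds. X, y, and the model are
    # placeholders for illustration only.
    import numpy as np
    from collections import Counter
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X = np.random.rand(500, 10)             # placeholder features
    y = np.random.randint(0, 2, size=500)   # placeholder binary labels

    # 1) Is the dataset balanced?
    print("Class counts:", Counter(y))

    # 2) Stratified 10-fold cross-validation with mean test accuracy.
    model = LogisticRegression(max_iter=1000)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"Mean CV accuracy: {scores.mean():.3f} (std {scores.std():.3f})")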


Theoretically speaking, in a perfect scenario, both the training and test data accurately represent the distribution of your problem. Therefore, in an ideal case, there should be no significant difference between training and testing accuracy. This becomes increasingly true as you have more data.

A difference of 5% is perfectly fine. In practice, it is common for training accuracy to be slightly better than test accuracy. That said, the difference alone may not be the best indicator; what you should look at is how the two move together. As long as training and testing accuracy improve together at a similar rate, you are in the clear, regardless of how far apart they are. You can investigate this by training and evaluating on increasingly bigger subsets of the data, as in the sketch below. Ideally, training and testing accuracy should both improve as you add data. If test accuracy starts decreasing while training accuracy keeps improving, you have overfitting.
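
A rough sketch of that check using scikit-learn's learning_curve helper; again, the estimator and the randomly generated X and y are stand-ins for your own model and data:

    # Rough sketch of the "train on increasingly bigger subsets" check using
    # scikit-learn's learning_curve. The estimator and the random X, y are
    # placeholders for illustration only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, learning_curve

    X = np.random.rand(500, 10)
    y = np.random.randint(0, 2, size=500)

    sizes, train_scores, test_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
        scoring="accuracy",
    )

    # Training and test accuracy should improve together; if test accuracy
    # starts dropping while training accuracy keeps rising, that divergence
    # is the overfitting signal described above.
    for n, tr, te in zip(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
        print(f"n={n:4d}  train={tr:.3f}  test={te:.3f}")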
