Why does performance vary among the validation set, the public test set, and the private test set?

When practicing with classic Kaggle competitions, such as Titanic and House Prices, I followed the traditional process I learned from textbooks:

  1. split the training data into a training set and a validation set (either a 7:3 split or cross-validation)
  2. fit the model on the training set
  3. evaluate the model's performance on the validation set
  4. combine the training and validation sets and re-train the model with the hyperparameters that did well on the validation set
  5. predict on the test set
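
For concreteness, here is a minimal scikit-learn sketch of those five steps; the synthetic dataset, the model choice, and the `X_test` name are placeholders, not part of the original question:

    # Sketch of the five-step workflow, assuming a generic classification task.
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, random_state=0)  # stand-in for real data

    # 1. split into training and validation sets (7:3)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    # 2. fit on the training set
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)

    # 3. evaluate on the validation set
    print("validation accuracy:", model.score(X_val, y_val))

    # 4. re-train on all labeled data with the same hyperparameters
    final_model = RandomForestClassifier(n_estimators=200, random_state=0)
    final_model.fit(X, y)

    # 5. predict on the (unlabeled) test set; X_test is hypothetical here
    # predictions = final_model.predict(X_test)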

Something I cannot understand is: why does a model that performs well on the validation set sometimes perform poorly on the public test set? And sometimes the public test set score even differs from the private test set score.

If the validation set cannot prevent over- or under-fitting in real prediction, then isn't it useless?

What's more, how can we tell whether a model is good even if it performs well on the private test set? Maybe it would perform badly on some "private-private" test set afterwards.

This really frustrates and confuses me... Maybe I have the wrong concept of performance evaluation, or maybe there is a more reasonable way to evaluate a model?

Something I cannot understand is: why does a model that performs well on the validation set sometimes perform poorly on the public test set? And sometimes the public test set score even differs from the private test set score.

Validation sets are used to tune your hyperparameters, so that you can then assess the performance of your model on never-before-seen data (the test set). So comparing performance on the validation set with performance on the (public/private) test sets is not a fair comparison. Also, in the case of Kaggle competitions, the public test set is usually smaller (and probably distributed a bit differently) than the private one; otherwise you could have tuned a good model against the public test set and expected a good result on the private one as well.
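
A small sketch of the first point (the data, model, and parameter grid are hypothetical): the validation score that guided your tuning tends to be optimistic relative to untouched test data, because it was used to pick the winner:

    # Tune against cross-validated splits, then compare the best
    # validation score with the score on held-out test data.
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, random_state=0)
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}, cv=5)
    search.fit(X_trainval, y_trainval)

    print("best cross-validated (validation) score:", search.best_score_)
    print("score on held-out test data:            ", search.score(X_test, y_test))
    # The validation score selected the hyperparameters, so it is usually
    # the higher of the two -- the same gap you see between your local
    # validation score and the Kaggle leaderboard.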

If the validation set cannot prevent over- or under-fitting in real prediction, then isn't it useless?

Again, validation sets are used for hyperparameter tuning. If your model does not perform well (however you define that), use a better model or study your data more closely. But do not use any information gained from the test set to tune the model!
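
If you want a less optimistic estimate of how a tuned model generalizes without ever touching the test set, one common option (a sketch of nested cross-validation, with hypothetical data and model) is to wrap the tuning itself inside an outer cross-validation loop:

    # Inner loop: hyperparameter tuning; outer loop: performance estimate.
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, random_state=0)

    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10, 100]}, cv=3)
    outer_scores = cross_val_score(inner, X, y, cv=5)
    print("estimated generalization accuracy:", outer_scores.mean())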

What's more, how can we tell whether a model is good even if it performs well on the private test set? Maybe it would perform badly on some "private-private" test set afterwards.

Public/private test sets are mainly used in competitions, and their main purpose is to prevent cheating. But keep in mind that you were given limited resources, in this case data. Do your best with it. If you are confident in the model you trained and it still performed poorly on the final dataset, maybe the data given to you was not representative enough.
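
To see how much a small public split can disagree with the private one from sampling noise alone, here is a toy simulation (all numbers are made up, not taken from any real competition):

    # Pretend our model is 80% accurate on a 10,000-row test set, and the
    # "public" leaderboard only sees random slices of 500 rows.
    import numpy as np

    rng = np.random.default_rng(0)
    n_test = 10000
    correct = rng.random(n_test) < 0.80   # per-row right/wrong indicator

    private_score = correct.mean()
    public_scores = [rng.choice(correct, size=500, replace=False).mean()
                     for _ in range(5)]
    print("private score:", private_score)
    print("possible public scores:", public_scores)
    # With only 500 public rows, the score swings by a couple of
    # percentage points purely from which rows landed in the public split.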
