Why does performance vary among the validation set, public test set, and private test set?
When practicing with classical Kaggle competitions, such as Titanic, House Prices, and so on, I followed the traditional process that I learned from textbooks:
- split the training data into a training set and a validation set (either 70/30 or via cross-validation)
- fit the model on the training set
- evaluate the model's performance on the validation set
- combine the training and validation sets and re-train the model with the same hyperparameters that performed well on the validation set
- predict on the test set
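To make the process concrete, here is a minimal sketch of the steps above in Python with scikit-learn, using a synthetic dataset as a stand-in (the dataset, model choice, and held-out "test set" are all illustrative assumptions, not part of any real competition):

```python
# Sketch of the train/validation/test workflow described above.
# A held-out slice plays the role of Kaggle's hidden test data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)

# Hold out a "test set" to stand in for the competition's hidden test data.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 1: split the remaining data 70/30 into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.3, random_state=0)

# Steps 2-3: fit on the training set, evaluate on the validation set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
val_score = accuracy_score(y_val, model.predict(X_val))

# Step 4: re-train on training + validation with the same hyperparameters.
final_model = LogisticRegression(max_iter=1000).fit(X_trainval, y_trainval)

# Step 5: predict on the test set; only this score would be revealed
# by the leaderboard.
test_score = accuracy_score(y_test, final_model.predict(X_test))
print(f"validation: {val_score:.3f}  test: {test_score:.3f}")
```

The two printed scores generally differ somewhat, which is exactly the gap I am asking about.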
What I cannot understand is this: why can a model perform well on the validation set but poorly on the public test set? And sometimes the public test set score even differs from the private test set score.
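To illustrate the kind of variation I mean, here is a small hypothetical experiment (synthetic data, made-up model) showing that the same model scored on different random 70/30 splits yields noticeably different validation scores, so any single validation score is a noisy estimate:

```python
# Demo: validation accuracy of the same model varies across random splits.
# Dataset and model are illustrative, not from any real competition.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)

scores = []
for seed in range(10):
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores.append(clf.score(X_va, y_va))

# The spread across seeds shows how much a score depends on the split.
print(f"mean {np.mean(scores):.3f}, std {np.std(scores):.3f}")
```

Since the public and private test sets are just two more such samples, some disagreement between them seems unavoidable; I would like to understand whether that fully explains what I am seeing.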
If the validation set cannot catch over- or under-fitting on real predictions, then isn't it useless?
What's more, how can we tell whether a model is good even if it performs well on the private test set? Maybe it would perform badly on some further "private-private" test set afterward.
This really frustrates and confuses me... maybe I have the wrong idea about performance evaluation, or maybe there is a more reasonable way to evaluate a model?
Topic kaggle evaluation
Category Data Science