Difference between model score on test part and Kaggle public score

I tested my CatBoost model on a held-out part of the data and got a score of 0.92, but the Kaggle public score was 0.9. I then found new hyperparameters via random search; the new model scored 0.925 locally, but the Kaggle score fell to 0.88.

What should I do to validate the model correctly?

Topic catboost validation score kaggle cross-validation

Category Data Science


In general, you should expect lower scores on the test set than on the validation set, because you used the validation data to tune your model. For a correctly trained model, though, the gap between validation and test scores should be small, as in your 0.92 vs 0.9. To be more confident in your local estimate, use cross-validation: a score averaged over several folds depends less on one lucky or unlucky split.
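The cross-validation mentioned above could look roughly like the sketch below. It uses scikit-learn's `cross_val_score` on synthetic data; the data, the accuracy metric, and the `GradientBoostingClassifier` stand-in are all assumptions made for illustration, since the question does not say which metric or dataset is involved. `catboost.CatBoostClassifier` exposes the same `fit`/`predict` interface and should be usable in its place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption); replace with the competition's training set.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stand-in estimator (assumption); a CatBoostClassifier could be passed instead.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Five stratified folds: every row is used for validation exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# The mean is the local estimate; the spread shows how much a single split can mislead.
print("fold scores:", scores.round(3))
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

If the standard deviation across folds is comparable to the 0.005 gain found by the random search, that gain is probably noise rather than a real improvement.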

Also, your model apparently overfitted after hyperparameter optimization: the local score rose to 0.925 while the Kaggle score fell to 0.88. You can use regularization or early stopping to prevent that.
