Why is my own model, trained on the full data, better than the best_estimator_ I get from GridSearchCV with refit=True?
I am using an XGBoost model to classify some data. I have cv splits (train, val) and a separate test set that I never use until the end.
I used GridSearchCV to determine the best hyperparameters, feeding it my CV splits (5 folds) and setting refit=True so that, once it has found the best hyperparameters, it retrains on the full training data (all 5 folds rather than just 4/5) and returns best_estimator_. I then evaluate this best model on my test set at the very end.
I then compare the results of this model with a model that I train separately myself using those same best hyperparameters, and my own model gets better results. Why is that?
Does GridSearchCV still use cross-validation when it retrains on the full data with the best hyperparameters? Or is GridSearchCV doing something extra that hurts the model?
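For reference, here is a stripped-down sketch of what I'm doing; the synthetic data, parameter grid, and variable names below are just placeholders for my real setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from xgboost import XGBClassifier

# Stand-ins for my real data and held-out test set
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in for my 5 predefined (train, val) splits
cv_splits = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X_train))

# Placeholder grid; my real grid is larger
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1],
    "n_estimators": [100, 300],
}

search = GridSearchCV(
    estimator=XGBClassifier(eval_metric="logloss"),
    param_grid=param_grid,
    cv=cv_splits,
    refit=True,  # retrain on all of X_train with the best hyperparameters
)
search.fit(X_train, y_train)
score_grid = search.best_estimator_.score(X_test, y_test)

# My own model, trained separately with the same best hyperparameters
own_model = XGBClassifier(eval_metric="logloss", **search.best_params_)
own_model.fit(X_train, y_train)
score_own = own_model.score(X_test, y_test)

print(score_grid, score_own)  # score_own comes out higher for me
```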
Topic hyperparameter-tuning data-science-model gridsearchcv xgboost
Category Data Science