LightGBM eval_set - what to do when I fit the final model (there's no test data left)
I'm using LightGBM's eval_set feature when fitting my model. This enables early stopping, which determines the number of estimators actually used.
import lightgbm as lgb
from lightgbm import LGBMClassifier

callbacks = [lgb.early_stopping(80, verbose=False), lgb.log_evaluation(period=0)]
fit_params = {"callbacks": callbacks, "eval_metric": "auc", "eval_set": [(x_train, y_train), (x_test, y_test)], "eval_names": ["train", "valid"]}
lg = LGBMClassifier(n_estimators=5000, verbose=-1, objective="binary", scale_pos_weight=train_weight, metric="auc")  # or metric="binary_logloss"
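Then I fit with something like:

lg.fit(x_train, y_train, **fit_params)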
This works great when doing cross-validation, where early stopping gets triggered. But when I have finally selected a model and want to train it on the full data set, I have no test data left to trigger early stopping.
What's the accepted practice here? Can I use the holdout data?
Or shall I keep another set of data purely for the eval_set?
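For example, I could imagine carving a small slice off the full data just for early stopping, something like this sketch (x_full and y_full stand for the combined data; lg and callbacks are defined as above):

from sklearn.model_selection import train_test_split

# Hold back a small slice of the full data purely for early stopping
x_fit, x_eval, y_fit, y_eval = train_test_split(
    x_full, y_full, test_size=0.1, stratify=y_full, random_state=42
)
lg.fit(x_fit, y_fit, eval_set=[(x_eval, y_eval)], eval_metric="auc", callbacks=callbacks)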
EDIT:
Come to think of it, is there data leakage if, during cross-validation, I pass my test data to eval_set? Am I doing this all wrong?
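For context, my cross-validation loop looks roughly like this (a sketch; it assumes x and y are the full training data as NumPy arrays):

import lightgbm as lgb
from lightgbm import LGBMClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in skf.split(x, y):
    x_train, x_test = x[train_idx], x[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    model = LGBMClassifier(n_estimators=5000, objective="binary", verbose=-1)
    # The fold's test split doubles as the early-stopping eval_set;
    # this is the part I suspect might leak information.
    model.fit(
        x_train, y_train,
        eval_set=[(x_train, y_train), (x_test, y_test)],
        eval_names=["train", "valid"],
        eval_metric="auc",
        callbacks=[lgb.early_stopping(80, verbose=False), lgb.log_evaluation(period=0)],
    )
    scores.append(roc_auc_score(y_test, model.predict_proba(x_test)[:, 1]))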