What is the best way to use Early Stopping in an ensemble (stacking) model?
I have a training and a test dataset. I would like to use the output of Model A in an ensemble model. However, I would like to use early stopping.
Usually, I would fit Model A on each K-fold split (of the training set) and predict the out-of-fold (OOF) rows to build the meta-model dataset. I would then repeat the same methodology, but for hyperparameter tuning of the 2nd layer of the stacking model, trained on the meta-model dataset.
Finally, I retrain the first-layer models on the whole training set, train the 2nd layer of the stacking model on the full meta-model dataset, and predict/evaluate on the test set.
How would this final step work if I am using early stopping? Should I just create the meta-model dataset without CV and without predicting the OOF rows? Or should I forgo early stopping and instead tune for the right hyperparameters/number of training iterations?
Topic ensemble ensemble-modeling cross-validation machine-learning
Category Data Science