Grid search XGBoost for an ensemble: do I include the first-level prediction matrix of the base learners in the train set?

I'm not quite sure how I should go about tuning xgboost before I use it as a meta-learner in ensemble learning.

Should I include the prediction matrix (i.e., a DataFrame containing columns of prediction results from the various base learners) alongside the original features, or should I just include the original features?

I have tried both methods, tuning only 'n_estimators' with the F1 score as the cross-validation metric (learning rate = 0.1).
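For reference, my tuning setup looks roughly like the sketch below (synthetic data stands in for my actual features / prediction matrix):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Placeholder data: in Method 1, X_meta = original features + base-learner
# prediction columns; in Method 2, X_meta = original features only.
X_meta, y = make_classification(n_samples=1000, n_features=20, random_state=0)

grid = GridSearchCV(
    XGBClassifier(learning_rate=0.1),
    param_grid={"n_estimators": [1, 5, 10, 50, 100, 200]},
    scoring="f1",
    cv=5,
)
grid.fit(X_meta, y)
print(grid.best_params_, grid.best_score_)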

Method 1: With pred matrix + original features:

n_estimators = 1 (this means only one tree is included in the model; is this abnormal?)
F1 Score (Train): 0.907975 (suggests overfitting)

Method 2: With original features only:

n_estimators = 1
F1 Score (Train): 0.39

I am getting rather different results for both methods, which makes sense as the feature importance plot for Method 1 shows that one of the first-level predictions is the most important.

I think that the first-level predictions from the base learners should be included in the grid search. Any thoughts?

Topic ensemble xgboost scikit-learn python

Category Data Science


Yes, you can definitely use the first-level base model predictions as inputs to the meta-learner. It can improve models and has been used a lot on competitive platforms.

This technique is known as stacking, and it is prone to overfitting. If you want to stack your models, I would suggest looking at out-of-sample cross-validation scores to judge performance.

You should follow the approach described in the link below; it helps limit overfitting (overfitting can still happen, but the chances are reduced with a good design):

https://developer.ibm.com/articles/stack-machine-learning-models-get-better-results/
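A minimal sketch of that out-of-fold design (the base models and data here are placeholders, not the exact recipe from the article):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
base_models = [RandomForestClassifier(random_state=0),
               LogisticRegression(max_iter=1000)]

# Build the first-level prediction matrix from out-of-fold predictions only,
# so the meta-learner never sees a prediction made on data the base model
# was trained on.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
oof_preds = np.zeros((len(X), len(base_models)))
for i, model in enumerate(base_models):
    for train_idx, valid_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        oof_preds[valid_idx, i] = model.predict_proba(X[valid_idx])[:, 1]

# Meta-learner trained on the out-of-fold predictions (optionally plus X).
meta = XGBClassifier(learning_rate=0.1, n_estimators=100)
meta.fit(np.hstack([X, oof_preds]), y)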


You should tune the meta-estimator using whatever data you want it to eventually predict with. This should definitely include the base model predictions (else you aren't actually ensembling), and may or may not include (some of) the original features.

One important note though: you should not be training the meta-estimator using "predictions" of the base models on their own training data; those are more accurately called estimations rather than predictions, because the base models already had access to the truth. A common approach is to train the meta-estimator on out-of-fold predictions from a cross-validation training of the base models.
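As a minimal sketch (assuming scikit-learn-compatible base models): StackingClassifier handles this for you, fitting the base models with cross-validation and training the final estimator on their out-of-fold predictions; passthrough=True also feeds it the original features.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=XGBClassifier(learning_rate=0.1),
    cv=5,
    passthrough=True,  # base predictions + original features go to the meta-learner
)

# Evaluate the whole stacked model out-of-sample.
print(cross_val_score(stack, X, y, scoring="f1", cv=5).mean())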

If the base models are quite good, then it's reasonable that the xgboost model might only use one tree; it just has to tweak the already-good predictions from the base models. But, consider dropping the learning rate or otherwise increasing regularization, to see if more trees can perform better.
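For example, a wider grid along those lines (the parameter values here are illustrative, not recommendations, and X_meta is a placeholder for your meta-features):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X_meta, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # placeholder

param_grid = {
    "n_estimators": [50, 100, 300],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [2, 3, 4],
    "reg_lambda": [1.0, 5.0, 10.0],  # L2 regularization on leaf weights
}

grid = GridSearchCV(
    XGBClassifier(),
    param_grid=param_grid,
    scoring="f1",
    cv=5,
)
grid.fit(X_meta, y)
print(grid.best_params_)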
