Grid search for XGBoost as a meta-learner: do I include the first-level prediction matrix of the base learners in the training set?
I'm not quite sure how to go about tuning XGBoost before using it as a meta-learner in stacked ensemble learning.
Should I include the prediction matrix (i.e. a DataFrame whose columns are the prediction results from the various base learners) alongside the original features, or should I just include the original features?
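To make the setup concrete, here is roughly how I build the prediction matrix. The data and base learners below are placeholders standing in for my actual first level; I use out-of-fold predictions so the meta-learner never sees a base learner's prediction on a row that learner was trained on:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Placeholder data standing in for my real training set
X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=42)

# Placeholder base learners standing in for my actual first level
base_learners = [LogisticRegression(max_iter=1000),
                 RandomForestClassifier(random_state=42)]

# One column of out-of-fold predicted probabilities per base learner
pred_matrix = np.column_stack([
    cross_val_predict(clf, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for clf in base_learners
])

X_method1 = np.hstack([pred_matrix, X_train])  # prediction matrix + original features
X_method2 = X_train                            # original features only
```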
I have tried both methods, tuning only 'n_estimators' with the F1 score as the cross-validation metric (learning rate = 0.1).
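The tuning itself looks roughly like this (continuing from the sketch above; the grid values are illustrative, not my exact grid):

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {"n_estimators": [1, 10, 50, 100, 200]}  # illustrative grid
meta_learner = XGBClassifier(learning_rate=0.1, random_state=42)

# F1 as the cross-validation metric, as described above
search = GridSearchCV(meta_learner, param_grid, scoring="f1", cv=5)
search.fit(X_method1, y_train)  # swap in X_method2 for Method 2
print(search.best_params_, search.best_score_)
```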
Method 1: With prediction matrix + original features:
n_estimators = 1 (this means only one tree is included in the model; is this abnormal?)
F1 Score (Train): 0.907975 (which suggests overfitting)
Method 2: With original features only:
n_estimators = 1
F1 Score (Train): 0.39
I am getting rather different results for the two methods, which makes sense, as the feature importance plot for Method 1 shows that one of the first-level predictions is the most important feature.
I think that the first-level predictions from the base learners should be included in the grid search. Any thoughts?
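For what it's worth, scikit-learn's StackingClassifier seems to encode exactly this choice in its passthrough flag (True roughly matches my Method 1; False would feed the meta-learner the first-level predictions only), so another option I'm considering is tuning the XGBoost meta-learner inside it:

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# X_train, y_train as in the first sketch above
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=42))],
    final_estimator=XGBClassifier(learning_rate=0.1, random_state=42),
    passthrough=True,  # True: predictions + original features; False: predictions only
    cv=5,
)
stack.fit(X_train, y_train)
```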
Topic ensemble xgboost scikit-learn python
Category Data Science