Feature selection before modeling with boosted trees
I have read in some papers that the subset of features chosen for a boosted-tree algorithm can make a big difference to performance, so I have been trying RFE, Boruta, variable clustering, correlation filtering, WoE/IV, and Chi-square tests.
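For reference, here is roughly how I ran one of them, Boruta (a minimal sketch; `df` and the `target` column are placeholder names):

```r
library(Boruta)

# df: data frame holding the predictors plus a binary factor column `target`
# (both names are placeholders for illustration)
set.seed(42)
boruta_res <- Boruta(target ~ ., data = df, doTrace = 0)

# Keep only attributes confirmed as important; tentative ones are dropped
selected <- getSelectedAttributes(boruta_res, withTentative = FALSE)
print(selected)
```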
Say I have a classification problem with over 40 variables. After a long, long time of testing, the best results were:
- all variables for LightGBM (except for one highly collinear variable)
- removing correlated variables for XGBoost (around 8 correlated ones; see the sketch after this list)
- removing variables based on an elastic-net model for CatBoost (around 7 of them)
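The correlation filter I used before XGBoost looked roughly like this with caret (a sketch; the 0.9 cutoff and the `df`/`target` names are placeholders):

```r
library(caret)

# Numeric predictors only; `df` and `target` are placeholder names
num_cols <- setdiff(names(df)[sapply(df, is.numeric)], "target")
cor_mat  <- cor(df[, num_cols], use = "pairwise.complete.obs")

# Indices of columns caret recommends dropping at |r| > 0.9
drop_idx <- findCorrelation(cor_mat, cutoff = 0.9)
df_xgb   <- df[, setdiff(names(df), num_cols[drop_idx])]
```

And the elastic-net screen I used before CatBoost was along these lines with glmnet (again a sketch, with alpha = 0.5 as an arbitrary L1/L2 mix):

```r
library(glmnet)

# Model matrix without the intercept column; `target` is a binary factor
x <- model.matrix(target ~ ., data = df)[, -1]
y <- df$target

set.seed(42)
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)

# Keep predictors with nonzero coefficients at the 1-SE lambda
coefs    <- coef(cv_fit, s = "lambda.1se")
nonzero  <- rownames(coefs)[which(as.matrix(coefs) != 0)]
selected <- setdiff(nonzero, "(Intercept)")
```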
My question: what is the proper way to choose candidate variables for a boosted-tree model (especially for LightGBM)?
I'm using R, in case anyone has package suggestions.
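For context, this is the kind of LightGBM baseline I can already fit in R, from which I read off gain-based importances as a rough screen (a sketch; `x` and `y` are placeholder training data, and the parameters are arbitrary):

```r
library(lightgbm)

# x: numeric predictor matrix, y: 0/1 labels (placeholders)
dtrain <- lgb.Dataset(data = as.matrix(x), label = y)
params <- list(objective = "binary", metric = "auc", learning_rate = 0.05)
model  <- lgb.train(params = params, data = dtrain, nrounds = 200)

# Gain/Cover/Frequency importance per feature, sorted by Gain
imp <- lgb.importance(model)
head(imp, 20)
```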
Tags: catboost, lightgbm, xgboost, feature-selection, r
Category: Data Science