Feature selection before modeling with boosting trees

I have read in some papers that the subset of features chosen for a boosting-tree algorithm can make a big difference in performance.

So I've been trying RFE, Boruta, variable clustering, correlation filtering, WOE/IV, and chi-square tests.
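For reference, here's roughly how I ran Boruta (a minimal sketch; `df` and its `target` column stand in for my actual data):

```r
# Minimal Boruta sketch; `df` and its factor `target` column are placeholders.
library(Boruta)

set.seed(42)
boruta_res <- Boruta(target ~ ., data = df, maxRuns = 100)

# Features confirmed important against the shadow-feature baseline
selected <- getSelectedAttributes(boruta_res, withTentative = FALSE)
print(selected)
```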

Say I have a classification problem with over 40 variables. After a long, long time of testing, the best results were:

  • all variables for LightGBM (except for one variable with high collinearity)

  • for XGBoost, I removed around 8 highly correlated variables (see the correlation-filter sketch after this list)

  • for CatBoost, I removed around 7 variables based on an elastic net model
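For the correlation filtering, I did something like this caret-based sketch (the `df` data frame and the 0.9 cutoff are illustrative, not my exact setup):

```r
# Correlation filter sketch (caret); `df` and the cutoff are placeholders.
library(caret)

preds    <- setdiff(names(df), "target")          # exclude the response
num_cols <- preds[sapply(df[preds], is.numeric)]  # numeric predictors only
cor_mat  <- cor(df[, num_cols], use = "pairwise.complete.obs")

# Names of columns whose pairwise |correlation| exceeds the cutoff
to_drop   <- findCorrelation(cor_mat, cutoff = 0.9, names = TRUE)
df_pruned <- df[, setdiff(names(df), to_drop)]
```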

My question is: what's the proper way to choose the candidate variables for modeling a boosting tree (especially for LightGBM)?

I'm using R, so any package suggestions are welcome.
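For context, this is roughly how I'm inspecting LightGBM feature importance in R (a minimal sketch; the `df`/`target` names and the parameters are placeholders, not my real configuration):

```r
# LightGBM importance sketch; `df` with a numeric 0/1 `target` is hypothetical.
library(lightgbm)

X      <- as.matrix(df[, setdiff(names(df), "target")])
dtrain <- lgb.Dataset(data = X, label = df$target)

params <- list(objective = "binary", metric = "auc", learning_rate = 0.05)
model  <- lgb.train(params, dtrain, nrounds = 200, verbose = -1)

# Gain-based importance; low-gain features are my pruning candidates
imp <- lgb.importance(model)
print(imp)
```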

Tags: catboost, lightgbm, xgboost, feature-selection, r

Category: Data Science
