AdaBoost implementation and tuning for a high-dimensional feature space in R
I am trying to apply the AdaBoost.M1 algorithm (with trees as base learners) to a data set with a large feature space (~20,000 features) and ~100 samples in R. A variety of packages exist for this purpose: adabag, ada and gbm. gbm() (from the gbm package) appears to be my only available option, as a stack overflow occurs in the others, and though it works, it is very time-consuming.
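For concreteness, this is roughly the call I am using; the simulated df below is just a stand-in for my actual data (shrunk to 2,000 features so the example runs quickly):

```r
library(gbm)

# Stand-in for my data: 100 samples, many features,
# response coded 0/1 as distribution = "adaboost" requires
set.seed(1)
n <- 100; p <- 2000
df <- data.frame(y = rbinom(n, 1, 0.5), matrix(rnorm(n * p), n, p))

fit <- gbm(
  y ~ .,
  data              = df,
  distribution      = "adaboost",  # exponential loss, i.e. AdaBoost
  n.trees           = 1000,
  interaction.depth = 1,           # stumps
  shrinkage         = 0.01
)
```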
Questions:
- Is there any way to overcome the stack overflow problem in the other packages, or to make gbm() run faster? I have tried converting the data.frame into a matrix, without success.
- When performing AdaBoost in gbm() (with distribution set to "adaboost"), An Introduction to Statistical Learning (James et al.) mentions the following parameters as needing tuning (see the sketch after the list):
- The total number of trees to fit.
- The shrinkage parameter denoted lambda.
- The number of splits in each tree, controlling the complexity of the boosted ensemble.
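As far as I can tell, these correspond to gbm()'s n.trees, shrinkage and interaction.depth arguments, and the number of trees can then be chosen with gbm's built-in cross-validation. Continuing with the simulated df from above:

```r
set.seed(1)
fit_cv <- gbm(
  y ~ .,
  data              = df,
  distribution      = "adaboost",
  n.trees           = 2000,   # fit more trees than needed
  shrinkage         = 0.01,   # the lambda above
  interaction.depth = 1,
  cv.folds          = 5       # tracks 5-fold CV error per tree
)

# let cross-validation pick the optimal number of trees
best_iter <- gbm.perf(fit_cv, method = "cv")
```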
As the algorithm is very time-consuming to run in R, I need to find literature on suitable ranges for these tuning parameters for this kind of high-dimensional data, before performing cross-validation over those ranges to estimate the test error rate.
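Concretely, I imagine something along these lines, where the grid values are placeholders rather than literature-backed choices:

```r
# Hypothetical tuning grid; the values are illustrative, not recommendations
grid <- expand.grid(shrinkage = c(0.001, 0.01, 0.1),
                    depth     = c(1, 2, 4))

cv_err <- apply(grid, 1, function(g) {
  fit <- gbm(y ~ ., data = df, distribution = "adaboost",
             n.trees = 2000, shrinkage = g["shrinkage"],
             interaction.depth = g["depth"], cv.folds = 5)
  min(fit$cv.error)   # best CV error along the tree sequence
})

grid[which.min(cv_err), ]   # lowest-CV-error combination
```

Each of these fits is expensive on ~20,000 features, which is why I would like to narrow the ranges first.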
Any suggestions?
Topic adaboost boosting gbm r machine-learning
Category Data Science