AdaBoost implementation and tuning for a high-dimensional feature space in R
I am trying to apply the AdaBoost.M1 algorithm (with trees as base learners) to a data set with a large feature space (~20,000 features) and ~100 samples in R. A variety of packages exist for this purpose: adabag, ada and gbm. gbm() (from the gbm package) appears to be my only available option, as a stack overflow occurs in the others, and though it works, it is very time-consuming.
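For concreteness, this is roughly the call I am using; the simulated df below is just a stand-in for my actual data (shrunk to 2,000 features so the example runs quickly):

```r
library(gbm)

# Stand-in for my data: 100 samples, many features,
# response coded 0/1 as distribution = "adaboost" requires
set.seed(1)
n <- 100; p <- 2000
df <- data.frame(y = rbinom(n, 1, 0.5), matrix(rnorm(n * p), n, p))

fit <- gbm(
  y ~ .,
  data              = df,
  distribution      = "adaboost",  # exponential loss, i.e. AdaBoost
  n.trees           = 1000,
  interaction.depth = 1,           # stumps
  shrinkage         = 0.01
)
```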
Questions:
- Is there any way to overcome the stack overflow problem in the other packages, or to make gbm() run faster? I have tried converting the data.frame into a matrix, without success.
- When performing AdaBoost in gbm() (with distribution set to "adaboost"), An Introduction to Statistical Learning (James et al.) mentions the following parameters as needing tuning (see the sketch after the list):
- The total number of trees to fit.
- The shrinkage parameter denoted lambda.
- The number of splits in each tree, controlling the complexity of the boosted ensemble.
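As far as I can tell, these correspond to gbm()'s n.trees, shrinkage and interaction.depth arguments, and the number of trees can then be chosen with gbm's built-in cross-validation. Continuing with the simulated df from above:

```r
set.seed(1)
fit_cv <- gbm(
  y ~ .,
  data              = df,
  distribution      = "adaboost",
  n.trees           = 2000,   # fit more trees than needed
  shrinkage         = 0.01,   # the lambda above
  interaction.depth = 1,
  cv.folds          = 5       # tracks 5-fold CV error per tree
)

# let cross-validation pick the optimal number of trees
best_iter <- gbm.perf(fit_cv, method = "cv")
```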
As the algorithm is very time-consuming to run in R, I need to find literature on suitable ranges for these tuning parameters for this kind of high-dimensional data, before performing cross-validation over those ranges to estimate the test error rate.
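Concretely, I imagine something along these lines, where the grid values are placeholders rather than literature-backed choices:

```r
# Hypothetical tuning grid; the values are illustrative, not recommendations
grid <- expand.grid(shrinkage = c(0.001, 0.01, 0.1),
                    depth     = c(1, 2, 4))

cv_err <- apply(grid, 1, function(g) {
  fit <- gbm(y ~ ., data = df, distribution = "adaboost",
             n.trees = 2000, shrinkage = g["shrinkage"],
             interaction.depth = g["depth"], cv.folds = 5)
  min(fit$cv.error)   # best CV error along the tree sequence
})

grid[which.min(cv_err), ]   # lowest-CV-error combination
```

Each of these fits is expensive on ~20,000 features, which is why I would like to narrow the ranges first.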
Any suggestions?
Topic adaboost boosting gbm r machine-learning
Category Data Science