AdaBoost.R2 learning rate in scikit-learn
AdaBoost.R2 (regression) is presented in Drucker's paper "Improving Regressors using Boosting Techniques", which is freely available via Google Scholar.
The AdaBoost regression implementation in scikit-learn uses this algorithm (the paper is cited in the references of the AdaBoostRegressor class).
The thing is, one step is fundamentally different from Drucker's original version: the implementation introduces a new parameter called the 'learning rate' for the AdaBoost algorithm. I will use $\eta$ to denote this parameter.
Looking at the code, I can see that the step computing the model importance:
$\beta=\frac{\hat{L}}{1-\hat{L}}$
becomes:
$\alpha=\eta \ln\left(\frac{1}{\beta}\right)$
where $\alpha$ is the importance used in the implementation.
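To make these two quantities concrete, here is a minimal sketch of how I understand this step (the function name `model_importance` is my own, not taken from the library source):

```python
import numpy as np

def model_importance(avg_loss, eta):
    """Sketch of the importance step as I read it: avg_loss is the
    weighted average loss L-hat (assumed < 0.5), eta the learning rate."""
    beta = avg_loss / (1.0 - avg_loss)   # beta = L-hat / (1 - L-hat)
    alpha = eta * np.log(1.0 / beta)     # alpha = eta * ln(1 / beta)
    return beta, alpha

beta, alpha = model_importance(0.2, eta=0.5)
```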
The weight update has also been changed, from:
$w_{i+1}=w_{i}\beta^{[1-L_{i}]}$
to:
$w_{i+1}=w_{i}\beta^{\eta[1-L_{i}]}$
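In code form, my reading of this modified update is the following sketch (again my own paraphrase, not the library source); `losses` holds the per-sample losses $L_i$ scaled to $[0, 1]$:

```python
import numpy as np

def update_weights(w, losses, beta, eta):
    """Reweight samples after one round: w_i <- w_i * beta^(eta * (1 - L_i)).
    Since beta < 1 when the average loss is below 0.5, well-predicted
    samples (low L_i) are down-weighted more strongly."""
    w = w * beta ** (eta * (1.0 - losses))
    return w / w.sum()  # renormalise so the weights remain a distribution
```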
This parameter controls the amount by which the weights are changed at each iteration, as well as the model importance. The effect is that if you set a learning rate smaller than 1, more models are built before the average loss reaches 0.5 (which stops the algorithm).
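For reference, the parameter is exposed as `learning_rate` on AdaBoostRegressor, and one way to observe the effect I describe is to compare how many estimators actually get fitted, since boosting may stop early:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

X, y = make_regression(n_samples=200, noise=10.0, random_state=0)

for lr in (1.0, 0.5, 0.1):
    model = AdaBoostRegressor(n_estimators=100, learning_rate=lr,
                              random_state=0).fit(X, y)
    # estimators_ can be shorter than n_estimators if boosting stopped early
    print(f"learning_rate={lr}: {len(model.estimators_)} estimators fitted")
```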
However, I don't understand how this works mathematically, and I would be grateful if someone could explain it in more detail. In particular, I would love to know whether setting a learning rate of 1 is equivalent to the original version of the algorithm.
Topic: adaboost, boosting, mathematics
Category: Data Science