Is the way to combine weak learners in AdaBoost for regression arbitrary?
I'm reading about how variants of boosting combine weak learners into a final prediction. The case I'm considering is regression.
In the paper *Improving Regressors using Boosting Techniques*, the final prediction is the weighted median:
For a particular input $x_{i}$, each of the $T$ machines makes a prediction $h_{t}$, $t=1, \ldots, T$. Obtain the cumulative prediction $h_{f}$ using the $T$ predictors: $$h_{f}=\inf\left\{y \in Y: \sum_{t: h_{t} \leq y} \log \left(1 / \beta_{t}\right) \geq \frac{1}{2} \sum_{t} \log \left(1 / \beta_{t}\right)\right\}$$ This is the weighted median. Equivalently, each machine $h_{t}$ has a prediction $y_{i}^{(t)}$ on the $i$'th pattern, and the predictions are relabeled (sorted) such that for pattern $i$ we have: $$ y_{i}^{(1)} \leq y_{i}^{(2)} \leq \cdots \leq y_{i}^{(T)} $$ (retaining the association of each $\beta_{t}$ with its $y_{i}^{(t)}$). Then sum the $\log \left(1 / \beta_{t}\right)$ until we reach the smallest $t$ for which the inequality is satisfied. The prediction from that machine $t$ is taken as the ensemble prediction. If the $\beta_{t}$ were all equal, this would be the median.
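To make the rule concrete, here is a minimal NumPy sketch of that weighted-median combination for a single input (the function and variable names are my own, not from the paper):

```python
import numpy as np

def weighted_median_prediction(predictions, betas):
    """Combine T weak-learner predictions for one input via the
    weighted median described above (AdaBoost.R2-style).

    predictions : array-like of shape (T,), the y_i^{(t)} for one pattern
    betas       : array-like of shape (T,), the beta_t of each machine
    """
    weights = np.log(1.0 / np.asarray(betas))        # log(1 / beta_t)
    order = np.argsort(predictions)                  # sort predictions ascending
    sorted_preds = np.asarray(predictions)[order]
    sorted_weights = weights[order]                  # keep each beta_t with its y^{(t)}
    cumulative = np.cumsum(sorted_weights)
    # smallest t whose cumulative weight reaches half the total weight
    t = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return sorted_preds[t]

# Equal betas: reduces to the plain median
print(weighted_median_prediction([2.0, 5.0, 3.0], [0.5, 0.5, 0.5]))  # -> 3.0
# Unequal betas: the low-error machine (beta = 0.1) dominates
print(weighted_median_prediction([2.0, 5.0, 3.0], [0.9, 0.1, 0.5]))  # -> 5.0
```

As the paper notes, when all $\beta_{t}$ are equal this is the ordinary median; a low-error machine (small $\beta_{t}$, hence large $\log(1/\beta_{t})$) pulls the ensemble prediction toward its own output.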
In *An Introduction to Statistical Learning: with Applications in R*, by contrast, the final prediction is a weighted average.
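For reference, the additive rule ISLR describes for boosting regression trees has, if I am reading it correctly, the form $$\hat{f}(x)=\sum_{b=1}^{B} \lambda \hat{f}^{b}(x),$$ i.e., a shrunken sum of the $B$ fitted trees with a common learning rate $\lambda$, rather than a weighted median.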
As such, I would like to ask whether the choice of aggregation rule is mathematically derived, or whether it is used simply because the researchers feel it is reasonable.
Thank you so much!