Updating weights in AdaBoost

I'm studying the AdaBoost algorithm. This algorithm updates the weights after training.

This is the table shown when they explain the weights in AdaBoost.

I'm confused about what this "weight" means. Is it:

  1. weight for each node?
  2. weight for each model?
  3. table weight column (more confusing)?


AdaBoost is a binary classifier (it can be extended to more classes fairly easily, but the formulas are a bit different). AdaBoost builds classification trees in an additive way.

Weights are assigned to each instance/observation in the training data set, so $w_i$ is the weight of observation $i$. Initially, all weights are equal: each is $\frac{1}{M}$, where $M$ is the number of observations.
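A minimal sketch of that initialization (the value `M = 5` is just an illustration):

```python
import numpy as np

M = 5                      # number of training observations (illustrative)
w = np.full(M, 1.0 / M)    # every observation starts with weight 1/M
print(w)                   # [0.2 0.2 0.2 0.2 0.2]
```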

The trees built within AdaBoost use those per-observation weights to compute everything in the nodes (split tests, predictions, scores). The main difference from a normal classification tree is that instead of counts you have to use weights. A small example: suppose that in some node you have 3 observations labeled positive and 2 labeled negative. Normally you would compute the entropy, for example, using the ratios $3/5$ and $2/5$ as probabilities. In a tree used within AdaBoost you would instead use the weight of each observation, so with weights $w_1, w_2, w_3$ for the positive observations and $w_4, w_5$ for the negative ones, the entropy would use $(w_1+w_2+w_3)/\sum_i w_i$ and $(w_4+w_5)/\sum_i w_i$.
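A minimal sketch of that weighted entropy computation (the function name `weighted_entropy` and the example weight values are my own, not from any AdaBoost paper):

```python
import numpy as np

def weighted_entropy(weights, labels):
    """Entropy of a node where each observation contributes its weight
    instead of a count of 1."""
    total = weights.sum()
    entropy = 0.0
    for cls in np.unique(labels):
        p = weights[labels == cls].sum() / total
        entropy -= p * np.log2(p)
    return entropy

# The example from the text: 3 positives, 2 negatives.
labels = np.array([1, 1, 1, 0, 0])
uniform = np.full(5, 0.2)                      # equal weights -> same as counts
print(weighted_entropy(uniform, labels))       # uses ratios 3/5 and 2/5: ~0.971
skewed = np.array([0.1, 0.1, 0.1, 0.4, 0.3])   # negatives re-weighted up
print(weighted_entropy(skewed, labels))        # uses ratios 0.3 and 0.7: ~0.881
```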

Why use weights instead of counts (counts are, by the way, equivalent to weights when every weight is $1$)? Because after each tree is built, you increase the weights of the observations that were predicted badly and decrease the weights of those that were predicted well. The idea is that, because AdaBoost is an additive model, once something has been learned you want to draw attention to what still needs to be learned.
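One standard form of that update is the AdaBoost.M1 rule described in The Elements of Statistical Learning; here is a sketch of it on invented labels and predictions:

```python
import numpy as np

# Illustrative state after training one tree: true labels y, predictions pred.
y    = np.array([ 1,  1, 1, -1, -1])
pred = np.array([ 1, -1, 1, -1, -1])   # the second observation is misclassified
w    = np.full(5, 0.2)

err   = w[pred != y].sum() / w.sum()   # weighted error of this tree
alpha = np.log((1.0 - err) / err)      # the tree's contribution factor
w = w * np.exp(alpha * (pred != y))    # misclassified weights grow
w = w / w.sum()                        # renormalize so the weights sum to 1
print(w)  # [0.125 0.5 0.125 0.125 0.125] -- attention shifts to the mistake
```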

I mentioned that each tree is added to the whole learner. However, not every tree brings the same contribution; trees are not equal in performance. Because of that, you want to weigh how much influence a given tree will have on the final result. Hence there is a computed factor $\alpha_m$ which models the contribution of tree $m$.
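To see how $\alpha_m$ rewards accurate trees, here is a small sketch using the AdaBoost.M1 form $\alpha_m = \log\frac{1 - \mathrm{err}_m}{\mathrm{err}_m}$ (the error values are made up):

```python
import numpy as np

# alpha_m as a function of a tree's weighted error rate.
for err in [0.10, 0.30, 0.49]:
    alpha = np.log((1.0 - err) / err)
    print(f"err={err:.2f} -> alpha={alpha:.3f}")
# err=0.10 -> alpha=2.197   accurate tree, large say in the final vote
# err=0.30 -> alpha=0.847
# err=0.49 -> alpha=0.040   near-random tree, almost no contribution
```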

There are no weights on tree nodes; there cannot be.

The papers that establish AdaBoost and AdaBoost.SAMME (the multiclass version) do not specify any weights on columns. If you want that, you could do some feature engineering beforehand, I suppose, but weights on columns do not fit into the math of AdaBoost without some fairly complicated adjustments.

The table you posted has observations on the rows and the features and target variable on the columns. The last column contains the initial weight of each observation.

[Later edit]

When I studied this algorithm in order to implement it, I read all the papers on it that were available (I have them somewhere, but they are printed and not at hand). However, I remember that the book by Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, has a really good description, and you can find the book online since the authors made it available. I advise you to spend some time there.


weight for each node? $w_i$ (though it is really a weight per data point/observation, not per tree node)

weight for each model? $\alpha_i$

table weight column (more confusing)? The initial weight is set to $\frac{1}{M}$, where $M$ is the number of data points. I assume data2_1 is data1_1 after being updated with the new weight for feature1, but it's a bit obscure due to the lack of a description, title, etc. for the table.

The weight of each data point is updated after a classifier is trained on the data and its model weight ($\alpha_i$) and score are calculated.

$T(x_i)$ is the final classifier, but it seemingly lacks a sign in front of the expression: it should be the sign of the sum of each classifier multiplied by its weight $\alpha_i$, i.e. $T(x) = \operatorname{sign}\left(\sum_i \alpha_i T_i(x)\right)$.
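A minimal sketch of that final vote (the three trees and their $\alpha$ values are invented for illustration):

```python
import numpy as np

# Each row holds one tree's +/-1 predictions on four points; alphas are
# the per-model weights.
tree_preds = np.array([[ 1,  1, -1, -1],
                       [ 1, -1, -1,  1],
                       [-1,  1, -1, -1]])
alphas = np.array([1.2, 0.6, 0.3])

# T(x) = sign( sum_i alpha_i * T_i(x) )
scores = alphas @ tree_preds
final  = np.sign(scores)
print(scores)  # [ 1.5  0.9 -2.1 -0.9]
print(final)   # [ 1.  1. -1. -1.]
```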
