AdaBoost - Show that adjusting the weights brings the error of the current iteration to 0.5

I'm trying to solve the following problem but I've gotten sort of stuck.

So for AdaBoost, the weighted error is $err_t = \frac{\sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) \neq t^{(i)})}{\sum_{i=1}^{N}w_i}$

and $\alpha_t = \frac{1}{2}\ln\!\left(\frac{1-err_t}{err_t}\right)$.

The weights for the next iteration are $w_i' = w_i \exp(-\alpha_t t^{(i)} h_t(x^{(i)}))$, and this assumes both $t^{(i)}$ and $h_t(x^{(i)})$ take values in $\{-1, +1\}$.

I have to show that the error with respect to the new weights $w_i'$ is $\frac{1}{2}$, i.e., $err_t' = \frac{\sum_{i=1}^{N}w_i' \Pi (h_t(x^{(i)}) \neq t^{(i)})}{\sum_{i=1}^{N}w_i'} = \frac{1}{2}$.

That is, we take the weak learner from iteration $t$ and evaluate it with respect to the new weights, which will be used to train the $(t+1)$-st weak learner.

I simplified the update to $w_i' = w_i \sqrt{\frac{err_t}{1-err_t}}$ if example $i$ was correctly classified and $w_i' = w_i \sqrt{\frac{1-err_t}{err_t}}$ if it was misclassified. I then plugged this into the equation $err_t' = \frac{1}{2}$ and simplified it down to $$\frac{err_t}{1-err_t} \cdot \frac{\sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) = t^{(i)})}{\sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) \neq t^{(i)})} = 1,$$ but at this point I ran into a dead end, so I'm wondering how one might show the original claim.
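(As a sanity check, I also verified the claim numerically in Python with some made-up weights, labels and predictions; it does come out to exactly 0.5, so it's only the algebra I'm stuck on. The variable names below are just my own.)

```python
import numpy as np

# Made-up labels, predictions and weights (names are mine, not from the problem).
t = np.array([ 1, -1,  1,  1, -1])           # true labels in {-1, +1}
h = np.array([ 1,  1, -1,  1, -1])           # weak learner's predictions
w = np.array([0.3, 0.1, 0.2, 0.25, 0.15])    # current weights w_i

miss = (h != t).astype(float)                # indicator Pi(h_t(x) != t)
err = np.sum(w * miss) / np.sum(w)           # err_t
alpha = 0.5 * np.log((1 - err) / err)        # alpha_t

w_new = w * np.exp(-alpha * t * h)           # updated weights w_i'
err_new = np.sum(w_new * miss) / np.sum(w_new)
print(err_new)                               # 0.5 (up to floating point)
```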

Thanks for any help!



For simplicity, let's define some variables as follows:

$W_C := \sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) = t^{(i)})$

$W_I := \sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) \neq t^{(i)})$

Therefore, $err_t = W_I/(W_C+W_I)$

$a := \sqrt{\frac{err_t}{1-err_t}} = \sqrt{\frac{W_I}{W_C}}$

Now, since correctly classified points get multiplied by $e^{-\alpha_t} = a$ and misclassified points by $e^{\alpha_t} = 1/a$, the new weight totals are

$W'_C := \sum_{i=1}^{N}w'_i \Pi (h_t(x^{(i)}) = t^{(i)}) = aW_C$

$W'_I := \sum_{i=1}^{N}w'_i \Pi (h_t(x^{(i)}) \neq t^{(i)}) = (1/a)W_I$

Now, as the final step:

$err'_t = \frac{W'_I}{W'_I + W'_C} = \frac{(1/a)W_I}{(1/a)W_I + aW_C} \overset{\times a}{=} \frac{W_I}{W_I + a^2W_C} = \frac{W_I}{W_I + \frac{W_I}{W_C}W_C} = \frac{W_I}{2W_I} = \frac{1}{2}$
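If you want to double-check this algebra mechanically, here is a small SymPy sketch (the variable names simply mirror the definitions above):

```python
import sympy as sp

# Symbolic sanity check of the algebra above.
# W_C, W_I = total weight of correctly / incorrectly classified points.
W_C, W_I = sp.symbols('W_C W_I', positive=True)

a = sp.sqrt(W_I / W_C)            # a = sqrt(err_t / (1 - err_t))
W_C_new = a * W_C                 # W'_C
W_I_new = W_I / a                 # W'_I

err_new = W_I_new / (W_I_new + W_C_new)
print(sp.simplify(err_new))       # expected output: 1/2
```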


You're nearly there. The quantity $err_t/(1-err_t)$ is exactly what you need it to be. It might be easier to see if you think about $$\sum_{i=1}^N w_i \Pi(h_t(x^{(i)})=t^{(i)})$$ as $$\sum_{i:\ x^{(i)}\text{ is correctly classified}} w_i$$ (just using the indicator function to reduce the summation range).
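Concretely, straight from the definition of $err_t$,

$$\frac{err_t}{1-err_t} = \frac{\sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) \neq t^{(i)})}{\sum_{i=1}^{N}w_i \Pi (h_t(x^{(i)}) = t^{(i)})},$$

so when you substitute this into the expression you ended with, the two ratios cancel and the left-hand side is identically $1$. Since that expression was obtained from $err_t' = \frac{1}{2}$ by equivalent manipulations, this is exactly the statement you wanted to prove.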
