How is a single classifier fitted on AdaBoost?

The AdaBoost algorithm is:

My trouble is understanding how the classifier $G_m(x)$ is trained. What does it mean for a classifier to be trained using weights $w_i$? Is it to fit the classifier on $\{w_i,y_i\}_{i=1}^{N}$?

Topic: adaboost, boosting, classification

Category: Data Science


The weights refer to the weights assigned to the training samples. AdaBoost increases the weight of each misclassified observation based on the update shown above. When it trains the next base learner, these weights make the model give more importance to the previously misclassified observations.
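As a concrete illustration of what "training using weights" means in practice, here is a minimal sketch assuming scikit-learn: most base learners accept a `sample_weight` argument in `fit`, and AdaBoost simply passes the current weights $w_i$ to the base learner at each round (the data below is made up for illustration).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# AdaBoost starts with uniform weights w_i = 1/N at the first round.
w = np.full(len(y), 1.0 / len(y))

# "Training G_m(x) using weights w_i" means the base learner's fitting
# criterion (here, the impurity used to choose the split) counts each
# sample i with weight w_i instead of weight 1.
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y, sample_weight=w)
```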


AdaBoost is a model (ensemble) that starts with high bias but low variance, in contrast with bagging ensembles, which are models with high variance but low bias (see figure 1).

[figure 1: bias and variance of boosting vs. bagging ensembles]

Although the original paper uses decision tree stumps, you could in principle use any other classifier, more precisely any unstable (weak) classifier.
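For instance, with scikit-learn you could swap the stump for a slightly deeper (still unstable) tree; a rough sketch is below. Note that the constructor argument is called `estimator` in recent scikit-learn releases and `base_estimator` in older ones.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Default-style AdaBoost: depth-1 trees (stumps) as the base learner.
ada_stumps = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), n_estimators=100)

# Same boosting procedure, but with a different unstable base learner.
ada_trees = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3), n_estimators=100)
```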

The fitting process in AdaBoost amounts to fitting a depth-1 decision tree on a weighted set of data. Given such a set and the weights associated with each observation, fitting a stump involves finding the "best" variable $x$ and threshold $s$, where the best variable and split threshold are defined as the pair that minimizes some measure of node impurity, such as the (weighted) Gini index.

So, given a set of candidate variables to split on and a set of training data, there will be a unique solution (a single variable and threshold) that is the best depth-1 decision tree for the current boosting stage. The set of variables $\{1, 2, \ldots\}$ from which we can pick our split variable may be either the entire set of features we have, or a (random) subset. Many implementations of decision tree classifiers allow the fitting algorithm to randomly pick a subset of variables at each split.
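To make this search explicit, here is a rough sketch of how such a stump could be found by brute force; the helper names `weighted_gini` and `fit_stump` are invented for illustration and do not come from any library.

```python
import numpy as np

def weighted_gini(y, w):
    """Weighted Gini impurity of a node holding labels y with weights w."""
    total = w.sum()
    if total == 0:
        return 0.0
    gini = 1.0
    for c in np.unique(y):
        p = w[y == c].sum() / total
        gini -= p ** 2
    return gini

def fit_stump(X, y, w):
    """Search every variable j and threshold s for the split that minimizes
    the weighted impurity of the two child nodes."""
    best_j, best_s, best_impurity = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left = X[:, j] <= s
            right = ~left
            impurity = (w[left].sum() * weighted_gini(y[left], w[left])
                        + w[right].sum() * weighted_gini(y[right], w[right])) / w.sum()
            if impurity < best_impurity:
                best_j, best_s, best_impurity = j, s, impurity
    return best_j, best_s
```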

The reason that we weigh the misclassified points more is that those are the ones we want to correct. A particular variable may not be the best variable to split on given equally weighted data, but it may become the best variable to split on once the weights become unevenly distributed. It could be good at correcting the mistakes of a previous learner.
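One common way to write that re-weighting step is sketched below, with labels coded in $\{-1, +1\}$ (the function and variable names are illustrative):

```python
import numpy as np

def adaboost_reweight(w, y, pred):
    """One AdaBoost weight update: y and pred are arrays in {-1, +1},
    w are the current sample weights (summing to 1)."""
    err = w[pred != y].sum()                 # weighted error of the stump G_m
    alpha = 0.5 * np.log((1 - err) / err)    # stage weight of G_m
    # Misclassified points are multiplied by exp(alpha) > 1, so the next
    # stump may find a different "best" variable, one that corrects them.
    w = w * np.exp(-alpha * y * pred)
    return w / w.sum(), alpha
```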

It is also worth mentioning that there is no guarantee that every one of your features will appear in the final result. Several features may be repeated across stumps, while others may be ignored altogether; it may indeed be the case that a variable appears in more than one stump during the AdaBoost procedure.

Finally, you may find this resource useful:

  1. https://www.youtube.com/watch?v=thR9ncsyMBE&list=PL05umP7R6ij35ShKLDqccJSDntugY4FQT&index=8
