Formula to calculate confidence value in Adaboost

I am coding an AdaBoost classifier using the two-class variant of the SAMME algorithm. Here is the code.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def I(flag):
    # Indicator function: 1 if the condition holds, else 0
    return 1 if flag else 0
def sign(x):
    # Sign function; maps 0 to +1 so predictions stay in {-1, +1}
    return abs(x)/x if x!=0 else 1

AdaBoost Class

class AdaBoost:

    def __init__(self,n_estimators=50):
        self.n_estimators = n_estimators
        self.models = [None]*n_estimators

    def fit(self,X,y):

        X = np.float64(X)
        N = len(y)
        w = np.array([1/N for i in range(N)])

        for m in range(self.n_estimators):

            Gm = DecisionTreeClassifier(max_depth=1)\
                        .fit(X,y,sample_weight=w).predict

            # Weighted misclassification error of the m-th stump
            errM = sum([w[i]*I(y[i]!=Gm(X[i].reshape(1,-1))) \
                        for i in range(N)])/sum(w)

            '''Confidence Value'''
            #BetaM = (1/2)*(np.log((1-errM)/errM))
            BetaM = np.log((1-errM)/errM)

            # Scale up the weights of the misclassified samples only
            w = [w[i]*np.exp(BetaM*I(y[i]!=Gm(X[i].reshape(1,-1))))\
                     for i in range(N)]


            self.models[m] = (BetaM,Gm)

    def predict(self,X):

        y = 0
        for m in range(self.n_estimators):
            BetaM,Gm = self.models[m]
            y += BetaM*Gm(X)
        signA = np.vectorize(sign)
        y = np.where(signA(y)==-1,-1,1)
        return y
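As a quick sanity check, the class above can be compared against sklearn's AdaBoostClassifier on toy data. This is only an illustrative sketch: the dataset, n_estimators, and random_state below are my own choices, not from the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy two-class problem with labels remapped to {-1, +1},
# matching the label convention the AdaBoost class above expects
X, y = make_classification(n_samples=200, random_state=0)
y = np.where(y == 0, -1, 1)

# sklearn reference ensemble (depth-1 trees by default)
ref = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(ref.score(X, y))  # training accuracy of the reference model
```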

As far as I know, the formula for the confidence value is

$$\beta_m = c \cdot \log\frac{1-\operatorname{err}_m}{\operatorname{err}_m}$$

From what I have read, the actual minimum occurs at c = 1/2, but for any value of c the classifier should produce the same result. However, when I run the class, the outputs for c = 1 and c = 1/2 come out different. Moreover, if I do not multiply by anything, i.e. c = 1, my classifier performs better and produces results identical to the sklearn implementation of AdaBoostClassifier.

So why does multiplying by 1/2 give worse results?



Actually, this equation is not quite right:

$$\beta_m = c \cdot \log\frac{1-\operatorname{err}_m}{\operatorname{err}_m}$$

The actual minimum occurs at

$$\beta_m = \frac{1}{2}\log\frac{1-\operatorname{err}_m}{\operatorname{err}_m}$$

In the derivation, this $\beta_m$ is the solution to the equation

$$\frac{\partial}{\partial \beta}\sum_{i=1}^{N} w_i \, e^{-\beta\, y_i G_m(x_i)} = 0$$

Notice that the resulting weight update is $w_i \leftarrow w_i\, e^{-\beta_m y_i G_m(x_i)}$: the weights of wrong predictions are scaled up by $e^{\beta_m}$, and the weights of correct predictions are scaled down by $e^{-\beta_m}$.

But in the code I have written (and likewise in the sklearn implementation), only the wrong predictions are scaled up. So to keep the relative scaling correct, the wrong predictions have to be scaled by $e^{2\beta_m}$ instead, i.e.

$$w_i \leftarrow w_i \, e^{\,2\beta_m \, I(y_i \neq G_m(x_i))}$$
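This equivalence up to normalization is easy to verify numerically. In the sketch below, the weights, the mask of wrong predictions, and the value of beta are made-up illustrative numbers:

```python
import numpy as np

# Hypothetical sample weights and a mask of wrong predictions
w = np.array([0.1, 0.2, 0.3, 0.4])
wrong = np.array([True, False, False, True])
beta = 0.8  # some positive confidence value

# Two-sided update from the derivation: wrong *= e^beta, correct *= e^-beta
w_two = w * np.where(wrong, np.exp(beta), np.exp(-beta))
# One-sided update: only wrong predictions scaled, by e^(2*beta)
w_one = w * np.where(wrong, np.exp(2 * beta), 1.0)

# The one-sided vector is exactly e^beta times the two-sided one,
# so after normalization the two weight distributions are identical
w_two /= w_two.sum()
w_one /= w_one.sum()
print(np.allclose(w_two, w_one))  # True
```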

And since the hypothesis function has the form

$$G(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \beta_m G_m(x)\right)$$

Any positive scalar multiple of $\beta_m$ will work fine, as all the predictions are scaled by the same amount. So for convenience in coding, $\beta_m$ is written simply as

$$\beta_m = \log\frac{1-\operatorname{err}_m}{\operatorname{err}_m}$$
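This scale-invariance of the final vote can also be checked numerically. The per-estimator votes and confidence values below are arbitrary illustrative numbers:

```python
import numpy as np

# Hypothetical votes G_m(x) in {-1, +1}: 3 estimators (rows) x 3 samples (cols)
votes = np.array([[ 1, -1,  1],
                  [-1, -1,  1],
                  [ 1,  1, -1]])
beta_full = np.array([1.2, 0.5, 0.9])  # BetaM = log((1-err)/err)
beta_half = 0.5 * beta_full            # BetaM = (1/2)*log((1-err)/err)

# The weighted vote is halved, but its sign is unchanged
pred_full = np.sign(beta_full @ votes)
pred_half = np.sign(beta_half @ votes)
print(np.array_equal(pred_full, pred_half))  # True
```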

Note: you can perfectly well use BetaM = 0.5 * np.log((1-errM)/errM), but then you also have to scale down the weights of the correct predictions, by multiplying them with np.exp(-BetaM). If you do that, the code gives correct results.
