Formula to calculate confidence value in Adaboost

I am coding an AdaBoost classifier using the two-class variant of the SAMME algorithm. Here is the code.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def I(flag):
    # Indicator function: 1 if the condition holds, else 0
    return 1 if flag else 0
def sign(x):
    # Sign function; maps 0 to +1 so predictions stay in {-1, +1}
    return abs(x)/x if x!=0 else 1

AdaBoost Class

class AdaBoost:

    def __init__(self,n_estimators=50):
        self.n_estimators = n_estimators
        self.models = [None]*n_estimators

    def fit(self,X,y):

        X = np.float64(X)
        N = len(y)
        w = np.array([1/N for i in range(N)])

        for m in range(self.n_estimators):

            Gm = DecisionTreeClassifier(max_depth=1)\
                        .fit(X,y,sample_weight=w).predict

            # Weighted misclassification error of the m-th stump
            errM = sum([w[i]*I(y[i]!=Gm(X[i].reshape(1,-1))) \
                        for i in range(N)])/sum(w)

            '''Confidence Value'''
            #BetaM = (1/2)*(np.log((1-errM)/errM))
            BetaM = np.log((1-errM)/errM)

            # Scale up the weights of the misclassified samples only
            w = [w[i]*np.exp(BetaM*I(y[i]!=Gm(X[i].reshape(1,-1))))\
                     for i in range(N)]


            self.models[m] = (BetaM,Gm)

    def predict(self,X):

        y = 0
        for m in range(self.n_estimators):
            BetaM,Gm = self.models[m]
            y += BetaM*Gm(X)
        signA = np.vectorize(sign)
        y = np.where(signA(y)==-1,-1,1)
        return y
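As a quick sanity check, the class above can be compared against sklearn's AdaBoostClassifier on toy data. This is only an illustrative sketch: the dataset, n_estimators, and random_state below are my own choices, not from the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy two-class problem with labels remapped to {-1, +1},
# matching the label convention the AdaBoost class above expects
X, y = make_classification(n_samples=200, random_state=0)
y = np.where(y == 0, -1, 1)

# sklearn reference ensemble (depth-1 trees by default)
ref = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(ref.score(X, y))  # training accuracy of the reference model
```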

As far as I know, the formula for the confidence value is

$$\beta_m = c \cdot \log\frac{1-\operatorname{err}_m}{\operatorname{err}_m}$$

From what I have read, the actual minimum occurs at c = 1/2, but for any value of c the classifier should produce the same result. However, when I run the class, the outputs for c = 1 and c = 1/2 come out different. Moreover, if I do not multiply by anything, i.e. c = 1, my classifier performs better and produces results identical to the sklearn implementation of AdaBoostClassifier.

So why does multiplying by 1/2 give worse results?



Actually, this equation is not quite right:

$$\beta_m = c \cdot \log\frac{1-\operatorname{err}_m}{\operatorname{err}_m}$$

The actual minimum occurs at

$$\beta_m = \frac{1}{2}\log\frac{1-\operatorname{err}_m}{\operatorname{err}_m}$$

In the derivation, this $\beta_m$ is the solution to the equation

$$\frac{\partial}{\partial \beta}\sum_{i=1}^{N} w_i \, e^{-\beta\, y_i G_m(x_i)} = 0$$

Notice that the resulting weight update is $w_i \leftarrow w_i\, e^{-\beta_m y_i G_m(x_i)}$: the weights of wrong predictions are scaled up by $e^{\beta_m}$, and the weights of correct predictions are scaled down by $e^{-\beta_m}$.

But in the code I have written (and likewise in the sklearn implementation), only the wrong predictions are scaled up. So to keep the relative scaling correct, the wrong predictions have to be scaled by $e^{2\beta_m}$ instead, i.e.

$$w_i \leftarrow w_i \, e^{\,2\beta_m \, I(y_i \neq G_m(x_i))}$$
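This equivalence up to normalization is easy to verify numerically. In the sketch below, the weights, the mask of wrong predictions, and the value of beta are made-up illustrative numbers:

```python
import numpy as np

# Hypothetical sample weights and a mask of wrong predictions
w = np.array([0.1, 0.2, 0.3, 0.4])
wrong = np.array([True, False, False, True])
beta = 0.8  # some positive confidence value

# Two-sided update from the derivation: wrong *= e^beta, correct *= e^-beta
w_two = w * np.where(wrong, np.exp(beta), np.exp(-beta))
# One-sided update: only wrong predictions scaled, by e^(2*beta)
w_one = w * np.where(wrong, np.exp(2 * beta), 1.0)

# The one-sided vector is exactly e^beta times the two-sided one,
# so after normalization the two weight distributions are identical
w_two /= w_two.sum()
w_one /= w_one.sum()
print(np.allclose(w_two, w_one))  # True
```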

And since the hypothesis function has the form

$$G(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \beta_m G_m(x)\right)$$

Any positive scalar multiple of $\beta_m$ will work fine, as all the predictions are scaled by the same amount. So for convenience in coding, $\beta_m$ is written simply as

$$\beta_m = \log\frac{1-\operatorname{err}_m}{\operatorname{err}_m}$$
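This scale-invariance of the final vote can also be checked numerically. The per-estimator votes and confidence values below are arbitrary illustrative numbers:

```python
import numpy as np

# Hypothetical votes G_m(x) in {-1, +1}: 3 estimators (rows) x 3 samples (cols)
votes = np.array([[ 1, -1,  1],
                  [-1, -1,  1],
                  [ 1,  1, -1]])
beta_full = np.array([1.2, 0.5, 0.9])  # BetaM = log((1-err)/err)
beta_half = 0.5 * beta_full            # BetaM = (1/2)*log((1-err)/err)

# The weighted vote is halved, but its sign is unchanged
pred_full = np.sign(beta_full @ votes)
pred_half = np.sign(beta_half @ votes)
print(np.array_equal(pred_full, pred_half))  # True
```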

Note: you can perfectly well use BetaM = 0.5 * np.log((1-errM)/errM), but then you also have to scale down the weights of the correct predictions, by multiplying them with np.exp(-BetaM). If you do that, the code gives correct results.
