Formula to calculate the confidence value in AdaBoost
I am implementing an AdaBoost classifier using the two-class variant of the SAMME algorithm. Here is the code.
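import numpy as np
from sklearn.tree import DecisionTreeClassifier

def I(flag):
    # Indicator function: 1 if the condition holds, else 0.
    return 1 if flag else 0

def sign(x):
    # Sign of x, with sign(0) defined as +1.
    return abs(x)/x if x!=0 else 1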
AdaBoost Class
class AdaBoost:
    def __init__(self,n_estimators=50):
        self.n_estimators = n_estimators
        self.models = [None]*n_estimators

    def fit(self,X,y):
        # Labels y are assumed to be in {-1, +1}, matching predict below.
        X = np.float64(X)
        N = len(y)
        w = np.array([1/N for i in range(N)])
        for m in range(self.n_estimators):
            # Weak learner: a decision stump trained on the current weights.
            Gm = DecisionTreeClassifier(max_depth=1)\
                .fit(X,y,sample_weight=w).predict
            # Weighted misclassification rate of the stump.
            errM = sum([w[i]*I(y[i]!=Gm(X[i].reshape(1,-1))) \
                        for i in range(N)])/sum(w)
            # Confidence value
            #BetaM = (1/2)*(np.log((1-errM)/errM))
            BetaM = np.log((1-errM)/errM)
            # Increase the weights of the misclassified samples.
            w = [w[i]*np.exp(BetaM*I(y[i]!=Gm(X[i].reshape(1,-1))))\
                 for i in range(N)]
            self.models[m] = (BetaM,Gm)

    def predict(self,X):
        y = 0
        for m in range(self.n_estimators):
            BetaM,Gm = self.models[m]
            y += BetaM*Gm(X)
        signA = np.vectorize(sign)
        y = np.where(signA(y)==-1,-1,1)
        return y
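Looking at the `predict` step on its own: scaling every `BetaM` by the same positive constant only rescales the summed score, so the sign should not change. Below is a minimal stdlib-only sketch of that check. The confidence values and stump outputs are made up, and `weighted_vote`, `betas`, and `stump_outputs` are illustrative names that are not part of the class above.

```python
def sign(x):
    # Same convention as the helper above: zero maps to +1.
    return -1 if x < 0 else 1


def weighted_vote(betas, stump_outputs, c=1.0):
    """Sign of sum_m (c * beta_m) * G_m(x) for each sample.

    stump_outputs[m][i] is the +/-1 prediction of weak learner m on sample i.
    """
    n_samples = len(stump_outputs[0])
    scores = [
        sum(c * beta * outputs[i] for beta, outputs in zip(betas, stump_outputs))
        for i in range(n_samples)
    ]
    return [sign(s) for s in scores]


# Hypothetical confidence values and stump predictions for four samples.
betas = [1.2, 0.7, 0.4]
stump_outputs = [
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [-1, 1, -1, 1],
]

votes_full = weighted_vote(betas, stump_outputs, c=1.0)
votes_half = weighted_vote(betas, stump_outputs, c=0.5)
print(votes_full, votes_half, votes_full == votes_half)
```

Both calls return the same labels in this toy case, which suggests that any difference between c = 1 and c = 1/2 would have to originate in `fit` rather than in `predict`.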
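As far as I know, the formula for the confidence value is

$$\beta_m = c \, \log\frac{1 - \mathrm{err}_m}{\mathrm{err}_m}$$

where $\beta_m$ is `BetaM` and $\mathrm{err}_m$ is `errM` in the code above.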
From what I have read, the true minimum occurs at c = 1/2, but the classifier should produce the same result for any value of c. However, when I run my class, the outputs for c = 1 and c = 1/2 differ. Moreover, when I do not multiply by anything (i.e. c = 1), my classifier performs better and produces results identical to the sklearn implementation of AdaBoostClassifier.
So why does multiplying by 1/2 give worse results?
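One way to probe this is to compare the weight updates of a single boosting round under both scalings. The sketch below uses made-up weights, labels, and stump outputs (none of these numbers come from real data). It rests on the assumption that the factor 1/2 is normally paired with the exponential-loss update `exp(-beta * y * G(x))`, while the indicator update `exp(alpha * I(y != G(x)))` is paired with c = 1.

```python
import math


def normalize(weights):
    # Rescale the weights so that they sum to 1.
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical round: five samples with uniform weights.
w = [0.2] * 5
y = [1, 1, -1, -1, 1]
g = [1, -1, -1, -1, 1]   # the stump misclassifies only the second sample
miss = [int(yi != gi) for yi, gi in zip(y, g)]

err = sum(wi * mi for wi, mi in zip(w, miss)) / sum(w)
alpha = math.log((1 - err) / err)   # c = 1
beta = 0.5 * alpha                  # c = 1/2

# (a) indicator update with c = 1, as in the fit method above
w_a = normalize([wi * math.exp(alpha * mi) for wi, mi in zip(w, miss)])
# (b) exponential-loss update with c = 1/2: exp(-beta * y * G(x))
w_b = normalize([wi * math.exp(-beta * yi * gi) for wi, yi, gi in zip(w, y, g)])
# (c) indicator update reused unchanged with c = 1/2
w_c = normalize([wi * math.exp(beta * mi) for wi, mi in zip(w, miss)])

print("c = 1,   indicator update:", w_a)
print("c = 1/2, exp-loss update: ", w_b)
print("c = 1/2, indicator update:", w_c)
```

In this toy round, (a) and (b) give the same normalized weights, while (c) boosts the misclassified sample less. If the same holds in general, switching to c = 1/2 without also changing the weight update would train each later stump on different weights, which may account for the differing outputs.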
Topic adaboost scikit-learn classification python
Category Data Science