Scikit-learn make_scorer custom metric problem for multiclass classification

I was doing a churn analysis using:

randomcv = RandomizedSearchCV(estimator=clf,param_distributions = params_grid,
                          cv=kfoldcv,n_iter=100, n_jobs=-1, scoring='roc_auc')

and everything was fine, but then I tried it with a custom scoring function, like this:

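def gain_fn(y_true, y_prob):
    # Gain 40000 per true positive and lose 1000 per false positive among samples with y_prob >= 0.02
    tp = np.where((y_prob >= 0.02) & (y_true == 1), 40000, 0)
    fp = np.where((y_prob >= 0.02) & (y_true == 0), -1000, 0)
    return np.sum([tp, fp])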

scorer_fn = make_scorer(gain_fn, greater_is_better = True, needs_proba=True)

randomcv = RandomizedSearchCV(estimator=clf,param_distributions = params_grid,
                          cv=kfoldcv,n_iter=100, n_jobs=-1, scoring=scorer_fn)

but inside gain_fn I need to do the calculation with the y_prob of one specific class (the target has 3 possible values). Any suggestions?



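make_scorer has a parameter needs_proba, which is False by default. Setting it to True makes RandomizedSearchCV pass probabilities (the output of clf.predict_proba(...)) to your scoring function instead of class labels (the output of clf.predict(...)). For a target with more than two classes, y_prob is then a 2-D array of shape (n_samples, n_classes) whose columns follow the order of clf.classes_, so gain_fn can pick the column of the class it needs, for example y_prob[:, 1]. The scorer itself is created as in your question: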

scorer_fn = make_scorer(gain_fn, greater_is_better = True, needs_proba=True)
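
To make the column selection concrete, here is a minimal sketch rather than a definitive implementation. It rests on several assumptions that are not in the question: the labels are the integers 0, 1, 2 (so column k of predict_proba belongs to class k), every flagged sample of any other class counts as a false positive, and target_class and threshold are hypothetical parameter names. LogisticRegression and the synthetic data are stand-ins for clf and the churn data. The version check is there because scikit-learn 1.4 deprecated needs_proba in favor of response_method="predict_proba".

```python
import inspect

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer


def gain_fn(y_true, y_prob, target_class=1, threshold=0.02):
    """Gain computed from the predicted probability of one class.

    Assumes labels are the integers 0..n_classes-1, so that column k of
    y_prob (ordered like clf.classes_) belongs to class k.
    """
    y_true = np.asarray(y_true)
    p_target = np.asarray(y_prob)[:, target_class]  # one column of predict_proba
    flagged = p_target >= threshold
    tp = np.where(flagged & (y_true == target_class), 40000, 0)
    # Assumption: a flagged sample of any other class is a false positive.
    fp = np.where(flagged & (y_true != target_class), -1000, 0)
    return int(np.sum(tp + fp))


# Hand-checkable toy input: only rows 0 and 1 have p(class 1) >= 0.02.
toy_true = np.array([1, 0, 2, 1])
toy_prob = np.array([
    [0.900, 0.050, 0.050],
    [0.500, 0.300, 0.200],
    [0.200, 0.010, 0.790],
    [0.990, 0.005, 0.005],
])
toy_gain = gain_fn(toy_true, toy_prob)  # one true positive, one false positive

# make_scorer spells the "use predict_proba" option differently across versions.
if "response_method" in inspect.signature(make_scorer).parameters:
    proba_option = {"response_method": "predict_proba"}  # scikit-learn >= 1.4
else:
    proba_option = {"needs_proba": True}  # older releases

# Extra keyword arguments (target_class here) are forwarded to gain_fn.
scorer_fn = make_scorer(gain_fn, greater_is_better=True, target_class=1,
                        **proba_option)

# Stand-in 3-class data and classifier, only to show the scorer being called.
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
score = scorer_fn(clf, X, y)
```

The resulting scorer_fn can then be passed as scoring=scorer_fn to RandomizedSearchCV, exactly as in the question. If the labels are not 0, 1, 2, look up the column index of the wanted label in clf.classes_ instead of using the label itself as the index.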
