How to refit GridSearchCV on a multiclass problem

I'm trying to use GridSearchCV for my multiclass problem. To start, I wanted to test it with KNeighborsClassifier.

First, here's the code where I define the function which uses GridSearchCV:

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold

def grid_search(estimator, parameters, X, y):
    scoring = ['accuracy', 'precision', 'recall']
    kf = KFold(5)
    
    clf = GridSearchCV(estimator, parameters, cv=kf, scoring=scoring, refit='accuracy', n_jobs=-1)
    clf.fit(X, y)
    
    i = clf.best_index_
    best_precision = clf.cv_results_['mean_test_precision'][i]
    best_recall = clf.cv_results_['mean_test_recall'][i]
    
    print('Best score (accuracy): {}'.format(clf.best_score_))
    print('Mean precision: {}'.format(best_precision))
    print('Mean recall: {}'.format(best_recall))
    print('Best parameters: {}'.format(clf.best_params_))
    
    return clf.best_estimator_

And here's where I call it with a k-nearest neighbors classifier:

from sklearn.neighbors import KNeighborsClassifier

parameters = {'n_neighbors': [1, 2, 5, 10], 'weights': ['uniform', 'distance'], 'metric': ['manhattan', 'euclidean', 'chebyshev']}

knn = grid_search(KNeighborsClassifier(n_jobs=-1), parameters, X_train, y_train)

With the code above, I get the following ValueError:

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

My guess is that this is related to the refit parameter of GridSearchCV, which is currently set to accuracy and may not work because the problem is multiclass. I changed its value many times, trying True and other explicitly named metrics, but nothing fixed the problem. On some of those attempts, the error message changed to:

ValueError: For multi-metric scoring, the parameter refit must be set to a scorer key or a callable to refit an estimator with the best parameter setting on the whole data and make the best_* attributes available for that metric. If this is not needed, refit should be set to False explicitly.

Any advice?



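Never mind, I realized my mistake. The 'precision' and 'recall' scorers in the scoring list default to average='binary', so they don't work with multiclass targets. They should be replaced by 'precision_macro' and 'recall_macro' (or another averaged variant, such as the _micro or _weighted versions).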
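To see where the first error comes from, here is a minimal sketch that calls the underlying metric functions directly. The labels in y_true and y_pred are made up for illustration and are not from the original post.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical three-class labels, invented for this illustration.
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# The default average='binary' is rejected for multiclass targets.
raised = False
try:
    precision_score(y_true, y_pred)
except ValueError as err:
    raised = True
    print(err)

# An explicit averaging mode works; 'macro' is the unweighted mean
# of the per-class scores.
macro_precision = precision_score(y_true, y_pred, average='macro')
macro_recall = recall_score(y_true, y_pred, average='macro')
print(macro_precision, macro_recall)
```

The 'precision_macro' and 'recall_macro' scorer names correspond to these average='macro' calls.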

The lines that read those metrics from cv_results_ need the same change, since the keys follow the scorer names:

best_precision = clf.cv_results_['mean_test_precision_macro'][i]
best_recall = clf.cv_results_['mean_test_recall_macro'][i]
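Putting the fix together, a self-contained sketch of the corrected search might look like the following. It makes two assumptions that are not in the original post: the iris dataset stands in for X_train and y_train, and KFold uses shuffle=True with a fixed seed, because iris is sorted by class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris stands in for X_train / y_train, which the post does not show.
X, y = load_iris(return_X_y=True)

# Macro-averaged scorer names work with multiclass targets.
scoring = ['accuracy', 'precision_macro', 'recall_macro']

# Assumption: shuffling is added here because iris is sorted by class.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

parameters = {
    'n_neighbors': [1, 2, 5, 10],
    'weights': ['uniform', 'distance'],
    'metric': ['manhattan', 'euclidean', 'chebyshev'],
}

# With several scorers, refit must name the one used to pick the best model.
clf = GridSearchCV(KNeighborsClassifier(), parameters, cv=kf,
                   scoring=scoring, refit='accuracy')
clf.fit(X, y)

# cv_results_ keys follow the pattern 'mean_test_<scorer name>'.
i = clf.best_index_
best_precision = clf.cv_results_['mean_test_precision_macro'][i]
best_recall = clf.cv_results_['mean_test_recall_macro'][i]

print('Best score (accuracy): {}'.format(clf.best_score_))
print('Mean precision (macro): {}'.format(best_precision))
print('Mean recall (macro): {}'.format(best_recall))
print('Best parameters: {}'.format(clf.best_params_))
```

Here refit='accuracy' stays as it was; only the scorer names and the matching cv_results_ keys change.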
