How do you pick the best model based on both Accuracy and Recall in GridSearchCV when scoring is already set to a custom scorer?
This is a binary classification problem. I am using GridSearchCV from scikit-learn to find the best model; here are the lines I am using:
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

scoring = {'AUCe': 'roc_auc', 'Accuracy': 'accuracy', 'prec': 'precision', 'rec': 'recall', 'f1s': 'f1', 'spec': make_scorer(recall_score, pos_label=0)}
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=cv, scoring=scoring, refit='Accuracy')
All is fine, but my problem is that I want the model to be picked based on both the highest Accuracy and the highest Recall. I know that in order to sort the values and pick the best model based on a specific metric, I have to set refit='Accuracy'. I also read in the scikit-learn docs that refit can be set to a bool, str, or callable, and here is where my problem lies.
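For what it's worth, as far as I understand from the docs, a refit callable receives the cv_results_ dict and has to return the integer index of the best candidate. A minimal sketch with a single metric (the 'mean_test_Accuracy' key follows my scoring dict above) would look something like:

import numpy as np

def pick_best(cv_results_):
    # return the index of the candidate with the highest mean CV Accuracy
    return int(np.argmax(cv_results_['mean_test_Accuracy']))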
I have read that if you use a custom scorer for the scoring parameter, you lose the ability to use a custom refit; this is the issue: https://github.com/scikit-learn/scikit-learn/issues/17058
As you can see from my code above, I use multi-metric scoring. How can I set refit to be based on both Accuracy and Recall while still keeping the multi-metric scoring? Currently I print the cv_results_ tables as a pandas DataFrame, sort each group of different models by Accuracy and Recall, pick the highest one, and apply it to my hold-out test data (roughly the snippet below). This gets tedious when you first have to run GridSearchCV on several sets of similar models, then take the best model from each one, and then compare them all again.
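Here is roughly that manual selection step (the mean_test_* column names follow the keys of my scoring dict; your exact columns may differ):

import pandas as pd

results = pd.DataFrame(grid_search.cv_results_)
# sort candidates by mean CV Accuracy first, then by mean CV Recall
results = results.sort_values(by=['mean_test_Accuracy', 'mean_test_rec'], ascending=False)
best_params = results.iloc[0]['params']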
Topic gridsearchcv scikit-learn machine-learning
Category Data Science