GridSearchCV: suitable scoring metrics for imbalanced datasets

I am new to machine learning. This is my first machine learning project, and I am working on classification with an imbalanced dataset. The target variable also has multiple classes.

I would like to know which metric is most suitable for scoring performance in GridSearchCV.

I think:

  1. roc_auc is sometimes used for imbalanced datasets, but there are several variants: 'roc_auc', 'roc_auc_ovo', and 'roc_auc_ovr'. Which of these should I use?

  2. Alternatively, the area under the precision-recall curve is also used for imbalanced data, but I can't seem to find a corresponding scoring option for GridSearchCV. How do I use it in GridSearchCV?

Thank you

from sklearn.model_selection import train_test_split, GroupKFold, GridSearchCV
from sklearn.ensemble import RandomForestClassifier

X_train, X_test, y_train, y_test = train_test_split(X_total, Y_total, random_state=0, test_size=0.25)
kfold = GroupKFold(n_splits=3)  # note: GroupKFold needs a groups array passed to fit
grid_search = GridSearchCV(RandomForestClassifier(random_state=0), hyperF, cv=kfold, scoring=..., verbose=1, n_jobs=-1)  # scoring left blank: which metric goes here?

Tags: grid-search, class-imbalance



One possible solution is to use scikit-learn's average_precision_score, which summarizes the precision-recall curve and is closely related to the area under it.

Because average_precision_score is a plain metric function, it can be wrapped with make_scorer to serve as the scoring argument of GridSearchCV (for binary targets the built-in scoring string 'average_precision' already does this). For a multi-class target, the labels have to be binarized first, as in the sketch below.
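
Here is a minimal sketch of such a scorer, assuming three or more classes with integer labels 0..n_classes-1 (adjust the classes passed to label_binarize otherwise); hyperF and kfold stand in for the parameter grid and CV splitter from your question:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import label_binarize

def multiclass_pr_auc(y_true, y_proba):
    # One-vs-rest, macro-averaged average precision: binarize the true
    # labels so each class gets its own indicator column, then score it
    # against the matching predicted-probability column.
    classes = np.arange(y_proba.shape[1])  # assumes labels are 0..n_classes-1
    y_bin = label_binarize(y_true, classes=classes)
    return average_precision_score(y_bin, y_proba, average="macro")

# needs_proba=True makes GridSearchCV feed predict_proba output to the
# metric (scikit-learn >= 1.4 renames this to response_method="predict_proba").
pr_auc_scorer = make_scorer(multiclass_pr_auc, needs_proba=True)

grid_search = GridSearchCV(RandomForestClassifier(random_state=0), hyperF,
                           cv=kfold, scoring=pr_auc_scorer, verbose=1, n_jobs=-1)

GridSearchCV will then select hyperparameters by macro-averaged PR AUC.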
