Unbalanced data set - how to optimize hyperparams via grid search?

I would like to optimize the hyperparameters C and gamma of an SVC using grid search on an unbalanced data set. So far I have used class_weight='balanced' and selected the best hyperparameters based on the average of the F1 scores. However, the data set is very unbalanced: if I use GridSearchCV with cv=10, some minority classes are not represented in the validation data. I'm thinking of using SMOTE, but the problem I see is that I would have to set k_neighbors=1, because some minority classes have only 1-2 samples. Does anyone have a tip on how to optimize the hyperparameters in this case? Are there any alternatives?

Many thanks for any hints.



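For classifiers, scikit-learn's GridSearchCV uses StratifiedKFold when cv is given as an integer, so each fold keeps roughly the same class proportions as the full data set. Stratification cannot help a class that has fewer samples than there are folds, though: with cv=10, a class with only 1-2 samples will be missing from most validation folds, and scikit-learn warns when the least populated class is smaller than n_splits. Lowering cv toward the size of the smallest class reduces the problem, and GridSearchCV can then be used for the hyperparameter search as usual.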
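As a rough sketch of that setup, the snippet below runs a stratified grid search over C and gamma with class_weight='balanced' and macro-averaged F1 as the score. The toy data, the class sizes, the parameter grid and the cap of 5 folds are all assumptions made for illustration; they are not taken from the question.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: three classes with 60, 12 and 6 samples.
rng = np.random.default_rng(0)
counts = [60, 12, 6]
X = np.vstack([rng.normal(loc=3 * k, scale=1.0, size=(n, 2))
               for k, n in enumerate(counts)])
y = np.repeat(np.arange(len(counts)), counts)

# Use no more folds than the smallest class has samples, so every
# class can appear in every validation fold.
n_splits = min(5, int(np.bincount(y).min()))
cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipe = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
param_grid = {"svc__C": [0.1, 1.0, 10.0], "svc__gamma": [0.01, 0.1, 1.0]}

# Macro F1 weights every class equally, regardless of its size.
search = GridSearchCV(pipe, param_grid, scoring="f1_macro", cv=cv)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

A class with a single sample can never be in both the training and the validation part of a split, so no choice of cv gives a meaningful score for it.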

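Imbalanced-learn's SMOTE can also be used, but it does not fall back to the available samples on its own: every class being oversampled needs more than k_neighbors samples, otherwise fit_resample raises an error. With only 2 samples in a class, k_neighbors=1 is the most SMOTE can work with, and a class with a single sample cannot be interpolated at all.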
