Class weights for imbalanced data in multilabel problems

I am trying to train a CNN for a multi-label classification task (20 classes, each sample can have one or more labels), and the dataset is highly imbalanced. In the single-label case I would use the compute_class_weight function from sklearn to calculate class weights that help the optimizer account for the minority classes. For the multi-label case, however, I feel it is not working as intended: it takes the number of samples to be the total number of label occurrences across all classes, while the actual number of samples is smaller (since each sample can carry several labels). Is anyone familiar with a function, or even a formula, for calculating class weights in this case?

Thanks

Topic weighted-data class-imbalance

Category Data Science


I think what you are looking for in this case is "cost-sensitive classification" - you can search for that term on Google Scholar to find some papers. You'll likely have to define a "cost" for each type of misclassification, for example a higher cost for missing a rare label than for predicting it by mistake.
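As a rough illustration of what "defining costs" could look like for a multi-label network with one sigmoid output per label, here is a minimal sketch of a cost-weighted binary cross-entropy. The function name weighted_bce and the pos_cost / neg_cost lists are made up for this example; they are not part of sklearn or any other library, and the cost values are assumptions you would choose yourself.

```python
import math

def weighted_bce(y_true, y_prob, pos_cost, neg_cost, eps=1e-7):
    """Mean binary cross-entropy with a separate cost per label.

    y_true:   list of rows of 0/1 label indicators
    y_prob:   list of rows of predicted probabilities (same shape)
    pos_cost: cost of missing a positive, one value per label (assumed)
    neg_cost: cost of a false alarm, one value per label (assumed)
    """
    total = 0.0
    count = 0
    for t_row, p_row in zip(y_true, y_prob):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            # Clip probabilities so log() never sees 0 or 1.
            p = min(max(p, eps), 1.0 - eps)
            total += -(pos_cost[j] * t * math.log(p)
                       + neg_cost[j] * (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count

# Two labels, one sample; label 0 is treated as three times as costly to miss.
y_true = [[1, 0]]
y_prob = [[0.5, 0.5]]
plain = weighted_bce(y_true, y_prob, pos_cost=[1.0, 1.0], neg_cost=[1.0, 1.0])
costly = weighted_bce(y_true, y_prob, pos_cost=[3.0, 1.0], neg_cost=[1.0, 1.0])
```

With all costs set to 1 this reduces to the ordinary binary cross-entropy, so the costs only change how much each kind of mistake contributes to the loss.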

Alternatively, if you are doing one-vs-rest classification (treating each label as its own binary problem), you can reweight, upsample, or downsample differently for each class.
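One way to do the per-class reweighting is to apply the usual "balanced" heuristic, n_samples / (n_classes * count), to each label separately as a binary present/absent problem. Because the denominator then uses the true number of samples rather than the total number of label occurrences, it avoids the issue described in the question. The sketch below is only one possible formula; the function name per_label_balanced_weights and the toy label matrix are assumptions for illustration.

```python
def per_label_balanced_weights(Y):
    """Return one (weight_for_absent, weight_for_present) pair per label.

    Y is a list of rows, each row a list of 0/1 indicators, one per label.
    Each label is treated as an independent binary problem, so n_classes = 2.
    """
    n_samples = len(Y)
    n_labels = len(Y[0])
    weights = []
    for j in range(n_labels):
        n_pos = sum(row[j] for row in Y)
        n_neg = n_samples - n_pos
        # Guard against labels that are always present or always absent.
        w_pos = n_samples / (2 * n_pos) if n_pos else 0.0
        w_neg = n_samples / (2 * n_neg) if n_neg else 0.0
        weights.append((w_neg, w_pos))
    return weights

# Toy data: 4 samples, 3 labels; label 0 is common, labels 1 and 2 are rare.
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
weights = per_label_balanced_weights(Y)
```

The resulting pairs can then be used to scale the positive and negative terms of a per-label binary loss, so that rare labels contribute more when they are present.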

If your dataset is small, you could also try to obtain the weights by solving a constrained optimization problem whose objective makes the weighted label frequencies more balanced.
