Class weights for imbalanced data in multilabel problems

Question

Costas Papastamos

April 9, 2022, 04:03

I am trying to train a CNN for a multiclass, multilabel classification task (20 classes, each sample can have one or more labels), and the dataset is highly imbalanced. In single-label cases I would use the compute_class_weight function from sklearn to calculate class weights that help the optimizer account for the minority class. For the multilabel case, however, I feel it's not working as it's supposed to, because it takes the number of samples to be the total number of label occurrences across all classes, while the actual number of samples is smaller (since the data is multilabel). Is anyone familiar with a function, or even a formula, to calculate the class weights in this case?

Thanks

Topic weighted-data class-imbalance

Category Data Science

anymous.asker · Accepted Answer · October 12, 2018, 13:14

I think what you are looking for in this case is "cost-sensitive classification"; you can look it up on Google Scholar to find some papers. You'll likely have to define a "cost" for each type of misclassification, for example a separate cost for a false negative and for a false positive on each label.
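To make the idea of per-error costs concrete, here is a minimal sketch of a cost-weighted binary cross-entropy in NumPy. The function name cost_weighted_bce and the parameters fn_cost and fp_cost are hypothetical names chosen for illustration, not part of any library, and the cost values below are made up; how to choose them is exactly the part you would have to define for your problem.

```python
import numpy as np

def cost_weighted_bce(y_true, y_prob, fn_cost, fp_cost, eps=1e-7):
    """Binary cross-entropy with a separate cost per label for each error type.

    y_true, y_prob: arrays of shape (n_samples, n_labels)
    fn_cost, fp_cost: arrays of shape (n_labels,), hypothetical per-label costs
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so log() never sees exactly 0 or 1.
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    # Positive targets are scaled by the false-negative cost,
    # negative targets by the false-positive cost.
    loss = -(fn_cost * y_true * np.log(p)
             + fp_cost * (1.0 - y_true) * np.log(1.0 - p))
    return loss.mean()

y_true = np.array([[1, 0]])
y_prob = np.array([[0.5, 0.5]])

# With unit costs this reduces to the ordinary binary cross-entropy.
base = cost_weighted_bce(y_true, y_prob, np.ones(2), np.ones(2))

# Making a miss on label 0 three times as costly raises the loss.
weighted = cost_weighted_bce(y_true, y_prob, np.array([3.0, 1.0]), np.ones(2))
```

With unit costs the loss is the plain cross-entropy, so the costs act purely as a per-label, per-error-type rescaling.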

Alternatively, if you are doing one-vs-rest classification (treating each label as its own binary problem), you can reweight, upsample, or downsample differently for each class.
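As a rough sketch of the reweighting option, one possibility is to treat each label column as its own binary problem and apply the "balanced" heuristic n_samples / (2 * count) per column, which is the same formula sklearn's compute_class_weight uses for class_weight="balanced" with two classes. The denominator here is the real number of samples rather than the total number of label occurrences. The helper name per_label_weights is made up for this example, and the guard against empty counts is my own assumption about how to handle labels that never (or always) occur.

```python
import numpy as np

def per_label_weights(Y):
    """Balanced positive/negative weights for each label, computed independently.

    Y: binary indicator matrix of shape (n_samples, n_labels)
    Returns (neg_weight, pos_weight), each of shape (n_labels,).
    """
    Y = np.asarray(Y)
    n_samples = Y.shape[0]
    pos = Y.sum(axis=0).astype(float)   # samples carrying each label
    neg = n_samples - pos               # samples not carrying it
    # Assumption: clamp counts to 1 to avoid division by zero
    # for labels that never or always occur.
    pos_weight = n_samples / (2.0 * np.maximum(pos, 1.0))
    neg_weight = n_samples / (2.0 * np.maximum(neg, 1.0))
    return neg_weight, pos_weight

# Toy data: 4 samples, 3 labels; label 0 is frequent, labels 1 and 2 are rare.
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

neg_w, pos_w = per_label_weights(Y)
```

Rare labels get a large positive weight and frequent labels a small one. If your loss takes a single per-label positive weight (for instance the pos_weight argument of PyTorch's BCEWithLogitsLoss), the ratio pos_w / neg_w, which equals the negative count divided by the positive count, is a plausible value to try.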

If your dataset is small, you could also try treating the weights as the variables of a constrained optimization problem whose objective is to make the weighted class distribution more balanced.
