How to weight imbalanced soft labels?

The target is a probability distribution over N classes. I don't want the model to predict the class with the highest probability, but the 'actual' probability per class.

For example:

|     | Class 1 | Class 2 | Class 3 |                         |
|-----|---------|---------|---------|-------------------------|
| 1   |     0.9 |    0.05 |    0.05 |                         |
| 2   |     0.2 |     0.8 |     0.0 |                         |
| 3   |     0.3 |     0.3 |     0.4 |                         |
| 4   |     0.7 |     0.0 |     0.3 |                         |
| sum |     2.1 |    1.15 |    0.75 | correct this imbalance? |
| >0  |       4 |       3 |       3 | or this one?            |

Some classes have 'more' samples in the sense that their total probability mass (the sum of their soft-label probabilities) is higher than that of other classes. Do I have to balance this out with weights in the loss function? Or do I only correct for the imbalance in the >0 counts, as one normally would?
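For concreteness, here is a minimal NumPy sketch (assuming the soft labels from the table above sit in an array `Y`) that computes both candidate imbalance measures:

```python
import numpy as np

# Soft labels from the table above: rows are samples, columns are classes.
Y = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.80, 0.00],
    [0.3, 0.30, 0.40],
    [0.7, 0.00, 0.30],
])

prob_mass = Y.sum(axis=0)        # [2.1, 1.15, 0.75] -- total probability mass per class
nonzero   = (Y > 0).sum(axis=0)  # [4, 3, 3]         -- samples with any mass per class
```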

Tags: class-imbalance



If you have imbalanced classes (for example, 3 classes with 100 examples of class 1, 1,000 of class 2, and 5,000 of class 3), then yes, I would weight the loss function: I would use a weighted categorical cross-entropy.
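In the soft-label setting, a weighted categorical cross-entropy simply scales each class's term in the sum over classes. Here is a minimal PyTorch sketch, assuming the weights are derived from the per-class probability mass from the question's table (that choice is an illustration, not a prescription; the >0 counts would plug into the same place):

```python
import torch
import torch.nn.functional as F

def weighted_soft_cross_entropy(logits, targets, class_weights):
    """Cross-entropy against soft targets with per-class weights.

    logits:        (batch, n_classes) raw model outputs
    targets:       (batch, n_classes) soft labels, rows summing to 1
    class_weights: (n_classes,) per-class weights
    """
    log_probs = F.log_softmax(logits, dim=1)
    # Scale each class's contribution before summing over classes.
    loss = -(class_weights * targets * log_probs).sum(dim=1)
    return loss.mean()

# Inverse-frequency style weights from the probability-mass column sums.
prob_mass = torch.tensor([2.1, 1.15, 0.75])
weights = prob_mass.sum() / (len(prob_mass) * prob_mass)

targets = torch.tensor([
    [0.9, 0.05, 0.05],
    [0.2, 0.80, 0.00],
    [0.3, 0.30, 0.40],
    [0.7, 0.00, 0.30],
])
logits = torch.randn(4, 3)  # stand-in for model outputs
print(weighted_soft_cross_entropy(logits, targets, weights))
```

With uniform weights this reduces to the ordinary soft-label cross-entropy, so the weighting only rescales each class's contribution to the gradient.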

If you mean that some classes simply receive a higher predicted probability than others, then this is normal and expected behaviour. For example, in a 10-class problem like MNIST, if a given image has some rounded sections, it is much more likely to be a 3 or an 8 than a 1.
