How to measure multi-label multi-class accuracy

I have a model with multi-label, multi-class targets.

Example

Age  Height  Weight  Mark  Distance  Red  Yellow  Green  Blue  Black  White
14   160     62      78    103       0    1       1      1     1      0
56   177     90      99    363       1    1       0      0     0      0
32   179     79      83    737       0    0       0      0     1      0
17   180     94      75    360       1    0       1      1     1      1
43   186     102     51    525       0    0       0      0     0      0
55   168     74      48    644       1    1       0      1     1      0
18   182     93      58    127       1    0       1      0     1      1

The target values are the colours (Red, Yellow, Green, Blue, Black, White).

When I build my model and test different measures, I get an F1 score of 0.78 but a very low accuracy of 0.03.

Why is there such a big difference, and which measure should I use?

Topic f1score auc accuracy

Category Data Science


It's either multi-label or multiclass classification, not both. This case is multi-label classification: zero, one or several labels for every instance.

You didn't say how you obtain these scores, so it's difficult to know what's going wrong, but the two scores are not consistent with each other. My guess is that the accuracy is not calculated properly, possibly because the function is not called with the right arguments. Keep in mind that both F1-score and accuracy are defined for binary classification; by themselves they cannot account for multi-label.
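Since the question doesn't show the code, the following is only a guess at one common cause of such a gap. If accuracy is computed as exact-match (subset) accuracy, an instance counts as correct only when all six labels are predicted correctly, so the score can be far lower than a per-label score; scikit-learn's `accuracy_score`, for example, behaves this way on multi-label input. The sketch below uses the true labels from the question's table. The predictions and the helper names `subset_accuracy` and `label_accuracy` are invented for illustration.

```python
# True labels from the question's table.
# Columns: Red, Yellow, Green, Blue, Black, White
y_true = [
    [0, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 1],
]

# Hypothetical predictions (not from the question): every row except the
# last one has exactly one wrong label.
y_pred = [
    [0, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 1],
]


def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose whole label vector is predicted exactly."""
    exact = sum(t == p for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


def label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct."""
    correct = sum(
        ti == pi
        for t, p in zip(y_true, y_pred)
        for ti, pi in zip(t, p)
    )
    total = sum(len(t) for t in y_true)
    return correct / total


print("subset accuracy:   ", subset_accuracy(y_true, y_pred))
print("per-label accuracy:", label_accuracy(y_true, y_pred))
```

With these made-up predictions only one of the seven rows is fully correct, although 36 of the 42 individual label decisions are right, so the two notions of accuracy differ widely on the same output.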

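Technically, multi-label classification with $n$ labels is equivalent to $n$ independent binary classifiers, one for every label. Thus the first level of evaluation is for every independent binary classifier. These per-label scores can then be aggregated in different ways, for example by averaging them (macro-average) or by pooling the counts over all labels before computing the score (micro-average).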
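The two-level evaluation described above can be sketched as follows: compute F1 separately for each label, then aggregate. This is a minimal illustration, not a reference implementation. It uses only the Red, Yellow and Green columns of the question's table as true labels; the predictions and the helper names (`column`, `counts`, `f1`) are assumptions made for the example.

```python
labels = ["Red", "Yellow", "Green"]

# True values for the first three colour columns of the question's table.
y_true = [
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]

# Hypothetical predictions, invented for illustration.
y_pred = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]


def column(rows, j):
    """Extract the j-th label as a flat list (one binary problem)."""
    return [row[j] for row in rows]


def counts(true_col, pred_col):
    """True positives, false positives, false negatives for one label."""
    tp = sum(t == 1 and p == 1 for t, p in zip(true_col, pred_col))
    fp = sum(t == 0 and p == 1 for t, p in zip(true_col, pred_col))
    fn = sum(t == 1 and p == 0 for t, p in zip(true_col, pred_col))
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# First level: one binary evaluation per label.
per_label_counts = [
    counts(column(y_true, j), column(y_pred, j)) for j in range(len(labels))
]
per_label_f1 = [f1(*c) for c in per_label_counts]

# Second level, option 1: macro-average = plain mean of the per-label scores.
macro_f1 = sum(per_label_f1) / len(per_label_f1)

# Second level, option 2: micro-average = pool the counts, then compute F1 once.
total_tp = sum(c[0] for c in per_label_counts)
total_fp = sum(c[1] for c in per_label_counts)
total_fn = sum(c[2] for c in per_label_counts)
micro_f1 = f1(total_tp, total_fp, total_fn)

for name, score in zip(labels, per_label_f1):
    print(f"{name}: F1 = {score:.3f}")
print(f"macro F1 = {macro_f1:.3f}")
print(f"micro F1 = {micro_f1:.3f}")
```

Macro-averaging gives every label the same weight, whereas micro-averaging gives every individual decision the same weight, so frequent labels dominate. Which one suits the problem depends on whether rare colours matter as much as common ones.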
