macro average and weighted average meaning in classification_report

I use the "classification_report" from from sklearn.metrics import classification_report in order to evaluate the imbalanced binary classification

Classification Report :
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     28432
           1       0.02      0.02      0.02        49

    accuracy                           1.00     28481
   macro avg       0.51      0.51      0.51     28481
weighted avg       1.00      1.00      1.00     28481

I do not clearly understand the meaning of macro avg and weighted avg, and how we can decide which solution is best based on how close these values are to one.

I have read the documentation: macro average (the unweighted mean of the per-label scores), weighted average (the support-weighted mean of the per-label scores),

but I still have a problem understanding how good the result is based on how close these values are to 1. How can I explain it?

Topic: class-imbalance, accuracy, classification

Category: Data Science


Macro avg is the arithmetic mean of the per-class f1-scores, so both classes carry the same importance:

$$\text{macro avg} = \frac{f1_0 + f1_1}{2}$$

Weighted avg takes into account how many samples there are per class (the greater the support, the more important that class's f1-score):

$$\text{weighted avg} = \frac{f1_0 \cdot support_0 + f1_1 \cdot support_1}{support_0 + support_1}$$

The class with less support usually tends to have lower scores, because rarer classes are harder to catch. If the support of your 1 class had been very low (say 10), its f1-score could also have been very low (say 0.1). In that case the macro avg would come out very low, whereas the weighted avg would give much more importance to the score of the 0 class because of its greater support. Once you know the implications, it is up to you to decide which one to use.
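To make the two aggregations concrete, here is a minimal Python sketch that simply plugs the per-class f1-scores and supports from the report above into those two formulas (the numbers are copied from the question, not recomputed from data):

    # Per-class f1-scores and supports, copied from the report in the question
    f1_0, support_0 = 1.00, 28432   # class 0
    f1_1, support_1 = 0.02, 49      # class 1

    # Unweighted mean: both classes count equally
    macro_avg = (f1_0 + f1_1) / 2
    # Support-weighted mean: the majority class dominates
    weighted_avg = (f1_0 * support_0 + f1_1 * support_1) / (support_0 + support_1)

    print(f"macro avg f1:    {macro_avg:.2f}")     # 0.51
    print(f"weighted avg f1: {weighted_avg:.2f}")  # 1.00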


The macro avg is the plain (unweighted) mean of the per-class precision/recall/F1 scores. In your case, macro avg = (score of class 0 + score of class 1) / 2, hence your macro avg of 0.51. The weighted avg weights each class's score by its support; for recall this works out to the total number of true positives over all classes divided by the total number of samples. Example based on your model: assume TP of class 0 = 28400 (the model recognises 28400 of the 28432 class-0 samples given) and TP of class 1 = 1 (the model recognises 1 of the 49 class-1 samples given).

recall of class 0 = TP of class 0 / support of class 0 = 28400/28432 ≈ 1.00

recall of class 1 = TP of class 1 / support of class 1 = 1/49 ≈ 0.02

macro avg recall = (recall of class 0 + recall of class 1) / 2 = (1.00 + 0.02)/2 = 0.51

weighted avg recall weights each class by its support, which for recall collapses to the pooled true positives: weighted avg recall = (TP of class 0 + TP of class 1) / (support of class 0 + support of class 1) = (28400 + 1)/(28432 + 49) ≈ 1.00
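A quick way to check these numbers is to rebuild label arrays that match the assumed counts (these are the answer's hypothetical true-positive figures, not the asker's actual data) and let scikit-learn do the averaging:

    # Reconstruct labels matching the assumed counts: 28400 of 28432 class-0
    # samples are recognised, and 1 of 49 class-1 samples is recognised.
    # The missed samples are assigned to the opposite class.
    import numpy as np
    from sklearn.metrics import recall_score

    y_true = np.array([0] * 28432 + [1] * 49)
    y_pred = np.array([0] * 28400 + [1] * 32    # 32 class-0 samples missed
                      + [0] * 48 + [1] * 1)     # 48 class-1 samples missed

    print(recall_score(y_true, y_pred, average=None))        # per class: ~[1.00, 0.02]
    print(recall_score(y_true, y_pred, average="macro"))     # ~0.51
    print(recall_score(y_true, y_pred, average="weighted"))  # ~1.00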


  • Macro F1 calculates the F1 score separately for each class but does not weight the classes when aggregating them:

    $$\frac{F1_{class1}+F1_{class2}+\cdots+F1_{classN}}{N}$$

    which results in a bigger penalisation when your model does not perform well on the minority classes (which is exactly what you want when there is imbalance)

  • Weighted F1 calculates the F1 score for each class independently, but when adding them together it uses a weight that depends on the number of true labels of each class:

    $$F1_{class1} \cdot W_1+F1_{class2} \cdot W_2+\cdots+F1_{classN} \cdot W_N$$

    where $W_i$ is the fraction of samples whose true label is class $i$, therefore favouring the majority class (which is what you usually do not want)

Conclusion: your model performs poorly on the 1 class, which the macro F1 correctly reflects and the weighted F1 does not, hence the difference from 1. The toy sketch below illustrates the gap.
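As a hedged illustration of that penalisation (toy labels, not the asker's data, and assuming a recent scikit-learn version that supports the zero_division argument), consider a model that never predicts the minority class: macro F1 drops to roughly half, while weighted F1 still looks nearly perfect:

    # Toy example: the model always predicts the majority class 0.
    from sklearn.metrics import f1_score

    y_true = [0] * 95 + [1] * 5   # 95% class 0, 5% class 1
    y_pred = [0] * 100            # the minority class is never predicted

    # zero_division=0 silences the warning for class 1, which is never predicted
    print(f1_score(y_true, y_pred, average=None, zero_division=0))       # [~0.97, 0.00]
    print(f1_score(y_true, y_pred, average="macro", zero_division=0))    # ~0.49
    print(f1_score(y_true, y_pred, average="weighted", zero_division=0)) # ~0.93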


Your data set is unbalanced, since 28432 out of 28481 examples belong to class 0 (that is 99.8%). Therefore, your predictor almost always predicts any given sample as belonging to class 0 and thereby achieves very high precision and recall for class 0 and very low scores for class 1.

In the case of the weighted average, the performance metrics are weighted by the class proportions: $$score_{weighted\text{-}avg} = 0.998 \cdot score_{class\text{ }0} + 0.002 \cdot score_{class\text{ }1}$$ which turns out to be about 1 due to the class imbalance.

However, macro avg is not weighted and therefore $$score_{macro\text{-}avg} = 0.5 \cdot score_{class\text{ }0} + 0.5 \cdot score_{class\text{ }1}$$

Since your model essentially always predicts class 0, these macro-averaged scores turn out to be poor.

Going forward, I suggest reading about unbalanced classification problems; there are many approaches to tackle this. One important question is whether false predictions for the two classes lead to different costs (which, for example, is typically the case in medical applications, spam filters or financial transactions). If they do not, then a predictor that always guesses the majority class could even make sense, but that strongly depends on the area and way of application.
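One common starting point in scikit-learn, shown here only as a sketch on synthetic imbalanced data (the data set, model choice and parameters are assumptions, not specific advice for this problem), is to make minority-class errors more expensive via the class_weight argument:

    # Cost-sensitive training on synthetic imbalanced data: class_weight="balanced"
    # makes errors on the rare class 1 more expensive, which usually trades some
    # class-0 precision for better class-1 recall.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=20000, weights=[0.99, 0.01],
                               n_informative=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    for cw in (None, "balanced"):
        model = LogisticRegression(class_weight=cw, max_iter=1000)
        model.fit(X_train, y_train)
        print(f"class_weight={cw}")
        print(classification_report(y_test, model.predict(X_test), zero_division=0))

Comparing the two reports shows how the weighting shifts the per-class scores, and therefore the macro and weighted averages, on that synthetic data; whether the trade-off is worth it depends on the relative cost of the two error types.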
