Accuracy is lower than f1-score for imbalanced data

For a binary classification problem, I have a dataset with 55% negative labels and 45% positive labels.

The results of the classifier show that the accuracy is lower than the f1-score. Does that mean that the model is learning the negative instances much better than the positive ones?

Does that even make sense, to have accuracy less than the f1-score?

Topic f1score accuracy confusion-matrix classification

Category Data Science



Imagine this case: 60 of your actual labels are true and 100 are false. In my experience a model is more likely to predict everything as false than to produce the situation above, but if the model predicts everything as true, the F-score can be higher than the accuracy: here accuracy is 60/160 = 0.375, while F1 is 120/220 ≈ 0.545.


It's helpful to look at the formulas for accuracy and the F1 score: $$Accuracy = \frac{TP+TN}{TP+TN+FP+FN}$$ and $$F1 = \frac{2TP}{2TP + FP + FN}$$ You are in the situation where $Accuracy < F1$. As long as the model makes at least one error ($FP + FN > 0$), a simple algebraic manipulation reduces this to $TN < TP$. So your model produces more correct positive predictions than correct negative ones. Whether that is acceptable depends on other factors, but for your case (only slightly imbalanced), I guess it's fine.
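
For readers who want the manipulation spelled out, here is one way to do it. The shorthand $E = FP + FN$ is introduced here only for brevity, and the steps assume $E > 0$ and $TP > 0$, so both denominators are positive and cross-multiplying keeps the direction of the inequality:

$$\frac{TP+TN}{TP+TN+E} < \frac{2TP}{2TP+E}$$

$$(TP+TN)(2TP+E) < 2TP\,(TP+TN+E)$$

$$2TP^2 + 2\,TP\cdot TN + TP\cdot E + TN\cdot E < 2TP^2 + 2\,TP\cdot TN + 2\,TP\cdot E$$

$$TN\cdot E < TP\cdot E \iff TN < TP$$

If $E = 0$ the model makes no errors, and accuracy and F1 are both equal to 1.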


I'll try to answer this with a couple of examples:

Say we have 100 instances (55 negative, 45 positive). Let's say we predict 1/45 positives and 55/55 negatives correctly. Then our accuracy is 0.56 but our F1 score is 0.0435.

Now suppose we predict everything as positive: we get an accuracy of 0.45 and an F1 score of 0.6207.

Therefore, accuracy does not have to be greater than F1 score.
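
The two examples above can be checked with a few lines of plain Python. This is only a sketch: the helper names `accuracy` and `f1_score` and the dictionaries `ex1` and `ex2` are made up for illustration, and the confusion-matrix counts are the ones implied by the examples.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all instances that were predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, tn, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); tn is accepted but never used."""
    denom = 2 * tp + fp + fn
    # Assumption: report 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Example 1: 1 of 45 positives and 55 of 55 negatives predicted correctly.
ex1 = dict(tp=1, tn=55, fp=0, fn=44)
# Example 2: every instance predicted as positive.
ex2 = dict(tp=45, tn=0, fp=55, fn=0)

for name, counts in (("example 1", ex1), ("example 2", ex2)):
    print(name, round(accuracy(**counts), 4), round(f1_score(**counts), 4))
# example 1 0.56 0.0435
# example 2 0.45 0.6207
```

Note that `tn` never enters the F1 computation, which is why F1 can move independently of accuracy.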

Because the F1 score is the harmonic mean of precision and recall, building an intuition for it can be somewhat difficult. I think it is much easier to grasp the equivalent Dice coefficient.
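
To make the equivalence concrete, here is a small sketch that treats the actual positives and the predicted positives as two sets and compares the Dice coefficient, 2|A ∩ B| / (|A| + |B|), with F1 computed from precision and recall. The instance ids and the function names `dice` and `f1_from_sets` are hypothetical and chosen only for illustration.

```python
import math


def dice(a, b):
    """Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))


def f1_from_sets(actual_pos, predicted_pos):
    """F1 as the harmonic mean of precision and recall."""
    tp = len(actual_pos & predicted_pos)   # correctly flagged positives
    fp = len(predicted_pos - actual_pos)   # flagged, but actually negative
    fn = len(actual_pos - predicted_pos)   # positives the model missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: ids 0..44 are positive, the model flags ids 30..59.
actual_pos = set(range(45))
predicted_pos = set(range(30, 60))

print(math.isclose(dice(actual_pos, predicted_pos),
                   f1_from_sets(actual_pos, predicted_pos)))  # True
```

The overlap of the two sets is TP, and the set sizes are TP + FN and TP + FP, so the Dice formula becomes 2TP / (2TP + FP + FN), which is the F1 formula.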

As a side note, the F1 score is inherently skewed because it does not account for true negatives. It also depends on which class is designated "positive" and which "negative", so it is relatively arbitrary. That's why other metrics such as the Matthews correlation coefficient are often preferred.
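
The dependence on the choice of "positive" class can be seen by swapping the labels in a confusion matrix. The sketch below reuses the counts from the first example (1 of 45 positives and 55 of 55 negatives correct). The helper names are hypothetical, and returning 0 for an undefined MCC is a common convention assumed here, not something stated in the answer.

```python
import math


def f1_score(tp, tn, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); tn is accepted but never used."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def swap_labels(tp, tn, fp, fn):
    """Relabel the classes: TP <-> TN and FP <-> FN."""
    return dict(tp=tn, tn=tp, fp=fn, fn=fp)


counts = dict(tp=1, tn=55, fp=0, fn=44)
swapped = swap_labels(**counts)

print(round(f1_score(**counts), 4), round(f1_score(**swapped), 4))  # 0.0435 0.7143
print(round(mcc(**counts), 4), round(mcc(**swapped), 4))            # 0.1111 0.1111
```

The same predictions give a very different F1 depending on which class is called positive, while the MCC is unchanged by the swap.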
