Precision and recall clarification

I'm reading the book Fundamentals of Machine Learning for Predictive Data Analytics by Kelleher, et al. I've come across something that I think is an error, but I want to check to be sure. When explaining precision and recall, the authors write:

Email classification is a good application scenario in which the different information provided by precision and recall is useful. The precision value tells us how likely it is that a genuine ham email could be marked as spam and, presumably, deleted: 25% (1 − precision). Recall, on the other hand, tells us how likely it is that a spam email will be missed by the system and end up in our inbox: 33.333% (1 − recall).

Precision is defined as: $TP \over {TP + FP}$. Thus: $$1 - precision = 1 - {TP \over TP+FP} = {FP \over TP + FP} = P(\textrm{prediction incorrect}|\textrm{prediction positive})$$ So this should give us the probability that an email marked as ham (positive prediction) is actually spam. So precision and recall in the quote above should be switched?
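The identity above is easy to check numerically. A minimal sketch with made-up counts (not taken from the book):

```python
# Verify that 1 - precision = FP / (TP + FP).
# The counts below are arbitrary, chosen only for illustration.
tp, fp = 6, 2  # true positives, false positives

precision = tp / (tp + fp)            # 6/8 = 0.75
one_minus_precision = 1 - precision   # 0.25
fp_rate_among_positives = fp / (tp + fp)  # 2/8 = 0.25

print(one_minus_precision == fp_rate_among_positives)  # True
```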

It's very likely that the authors assume that the spam class is positive, whereas you intuitively associated the ham class with positive. Both options make sense in my opinion:

  • the former interpretation is based on the idea that the goal of the task is to detect the spam emails, seen as the class of interest.
  • the latter interpretation considers that the ham emails are the "good ones", the ones that we want, hence the "positive" class.

There's no error when one reads the paragraph with the authors' interpretation in mind. This confusion illustrates why one should always state clearly which class is treated as positive in a binary classification problem :)
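Taking spam as the positive class, the book's two statements fall out directly. A minimal sketch with a hypothetical confusion matrix (counts chosen to reproduce the quoted 25% and 33.333%; the book's actual matrix may differ):

```python
# Confusion matrix with SPAM as the positive class (the authors' assumption).
# Counts are hypothetical, picked to match the book's quoted percentages.
tp = 6  # spam correctly marked as spam
fp = 2  # genuine ham wrongly marked as spam
fn = 3  # spam missed by the filter, lands in the inbox
tn = 9  # ham correctly delivered

precision = tp / (tp + fp)  # 6/8 = 0.75
recall = tp / (tp + fn)     # 6/9 = 0.666...

# 1 - precision: P(genuine ham | marked spam) -> ham lost to the spam folder
print(1 - precision)  # 0.25, the book's 25%
# 1 - recall: P(spam missed) -> spam reaching the inbox
print(1 - recall)     # 0.333..., the book's 33.333%
```

With ham as the positive class instead, the same two formulas would describe the opposite errors, which is exactly the source of the confusion in the question.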
