Incorrect example of applying Bayes' theorem

I have been reading The Data Science Design Manual (by Steven S. Skiena) and came across an example of how Bayes' theorem can be applied that confused me and made me suspect it might be wrong. The example is the following:

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$ Suppose A is the event that person x is actually a terrorist, and B is the result of a feature-based classifier that decides if x looks like a terrorist. When trained/evaluated on a data set of 1,000 people, half of whom were terrorists, the classifier achieved an enviable accuracy of, say, 90%. The classifier now says that Skiena looks like a terrorist. What is the probability that Skiena really is a terrorist? The key insight here is that the prior probability of "x is a terrorist" is really, really low. If there are a hundred terrorists operating in the United States, then $P(A) = 100/300{,}000{,}000 = 3.33 \times 10^{-7}$. The probability of the terrorist detector saying yes is $P(B) = 0.5$, while the probability of the detector being right when it says yes is $P(B|A) = 0.9$. Multiplying this out gives a still very tiny probability that I am a bad guy, $$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{(0.9)(3.33 \times 10^{-7})}{0.5} = 6 \times 10^{-7} $$
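For concreteness, here is the book's arithmetic as a quick Python check; all the numbers are taken straight from the passage above:

```python
# Bayes' theorem with the book's numbers: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 100 / 300_000_000   # prior: 100 terrorists among 300M people
p_b_given_a = 0.9         # detector says yes given x is a terrorist
p_b = 0.5                 # the book's (questionable) value for P(B)

p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.2e}")  # ~6.00e-07, matching the book
```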

However, $P(B) = 0.5$ doesn't seem correct to me. $P(B)$ is supposed to be the probability of the terrorist detector saying yes for a person randomly selected from the United States population (e.g., Skiena). If I understand correctly, the $0.5$ used by the author is the fraction of terrorists in the classifier's evaluation data set, which is not the same thing, for several reasons:

  • This sample is not randomly selected to be representative of the population Skiena is drawn from; it was deliberately constructed to contain the aforementioned ratio of terrorists.
  • That ratio is not the fraction of people in the evaluation data set who look like terrorists (i.e. the probability the classifier would say yes for a random person in that sample), but the fraction of actual terrorists in it.

My understanding is that to calculate $P(B)$ properly, one would have to draw a random sample from the United States population (assuming that is where Skiena is drawn from), run the classifier on it, and compute the percentage of people the classifier says yes for. A quick simulation of this idea is sketched below.
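The book only states a 90% accuracy, so the false-positive rate $P(B|\neg A) = 0.1$ below is my assumption, not a figure from the text; under that assumption the law of total probability gives $P(B) = P(B|A)P(A) + P(B|\neg A)(1 - P(A)) \approx 0.1$, not $0.5$:

```python
import random

# Sketch of the empirical estimate proposed above (one number is assumed).
P_A = 100 / 300_000_000   # prior: 100 terrorists among 300M people
P_B_GIVEN_A = 0.9         # true-positive rate (from the book)
P_B_GIVEN_NOT_A = 0.1     # ASSUMED false-positive rate, not stated in the book

random.seed(0)
n = 1_000_000
yes = 0
for _ in range(n):
    is_terrorist = random.random() < P_A
    p_yes = P_B_GIVEN_A if is_terrorist else P_B_GIVEN_NOT_A
    yes += random.random() < p_yes  # True/False counts as 1/0

print(f"simulated P(B) = {yes / n:.3f}")              # ~0.100
p_b = P_B_GIVEN_A * P_A + P_B_GIVEN_NOT_A * (1 - P_A)
print(f"analytic  P(B) = {p_b:.3f}")                  # ~0.100, not 0.5
```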

Is my thinking correct or am I missing something?

Tags: bayesian, probability, data, statistics

Category: Data Science


P(B) - the output of the model for an observation - is not going to change in this case. The model is deterministic with respect to its inputs: Skiena can be scored multiple times and will get the same score each time.

You are questioning whether P(B) = 0.5 is actually a 50% chance. That is a good question, and the property it touches on is called calibration: is the output of the model well-calibrated? Many (most) of the models I have dealt with are not well-calibrated; they are simply rank-ordering. A 0.5 is higher than a 0.4, therefore the 0.5 is closer to the event (Y=1, riskier, more similar, etc.), but neither value need be a true probability. Some algorithms build models that are closer to well-calibrated, and some post-processing can be done for calibration. Your solution is really about determining whether the model is well-calibrated and, if not, how to adjust its output to be well-calibrated.
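For example, here is a minimal scikit-learn sketch of that post-processing step; the data set and model choice are placeholders, not anything from the question:

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data; in practice this would be the classifier's training set.
X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A raw model (often poorly calibrated) vs. the same model wrapped
# with isotonic calibration as a post-processing step.
raw = RandomForestClassifier(random_state=0).fit(X_train, y_train)
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(random_state=0), method="isotonic", cv=5
).fit(X_train, y_train)

# Compare mean predicted probability to observed frequency in bins;
# a well-calibrated model has these two roughly equal in every bin.
for name, model in [("raw", raw), ("calibrated", calibrated)]:
    frac_pos, mean_pred = calibration_curve(
        y_test, model.predict_proba(X_test)[:, 1], n_bins=10)
    print(name, list(zip(mean_pred.round(2), frac_pos.round(2))))
```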

It is a great question: what does the score actually mean? Is it rank-order only, or is it also well-calibrated?

In this example, it seems the author is implying the model is well-calibrated, hence the P(B) notation instead of "the model's output" or some similar wording. Occam's razor. So, reading the question, assume P(B) = 0.5 really is a 50% chance.

Different metrics target calibration versus rank ordering. Many people like AUROC, which measures rank ordering only; it gives no insight into calibration.
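A quick sketch of that difference on toy data: applying a strictly increasing distortion to the scores leaves AUROC unchanged but moves the Brier score, a common calibration-sensitive metric:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)                          # toy labels
scores = np.clip(y * 0.2 + rng.uniform(0, 0.8, size=1000), 0, 1)

# Any strictly increasing transform preserves the ranking of the scores...
distorted = scores ** 4

# ...so AUROC is identical, but the implied probabilities are very different.
print("AUROC original :", roc_auc_score(y, scores))
print("AUROC distorted:", roc_auc_score(y, distorted))     # same value
print("Brier original :", brier_score_loss(y, scores))
print("Brier distorted:", brier_score_loss(y, distorted))  # different value
```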
