How to calculate mAP for detection task for the PASCAL VOC Challenge?
How to calculate the mAP (mean Average Precision) for the detection task for the Pascal VOC leaderboards?
It says there, on page 11:
Average Precision (AP). For the VOC2007 challenge, the interpolated average precision (Salton and Mcgill 1986) was used to evaluate both classification and detection. For a given task and class, the precision/recall curve is computed from a method’s ranked output. Recall is defined as the proportion of all positive examples ranked above a given rank. Precision is the proportion of all examples above that rank which are from the positive class. The AP summarises the shape of the precision/recall curve, and is defined as the mean precision at a set of eleven equally spaced recall levels [0,0.1,...,1]:
AP = (1/11) ∑_{r ∈ {0, 0.1, ..., 1}} p_interp(r)
The precision at each recall level r is interpolated by taking the maximum precision measured for a method for which the corresponding recall exceeds r:
p_interp(r) = max_{r̃ : r̃ ≥ r} p(r̃)
where p(r̃) is the measured precision at recall r̃.
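If I read the formula correctly, it corresponds to something like this sketch in Python (my own helper name, assuming the precision/recall values of the ranked output are already available as arrays):

```python
import numpy as np

def voc_11_point_ap(recall, precision):
    """11-point interpolated AP (VOC2007 style).

    recall, precision: 1-D numpy arrays of equal length, one entry per
    position in the method's ranked output (recall is non-decreasing).
    """
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):          # r = 0, 0.1, ..., 1
        mask = recall >= r
        # p_interp(r) = max precision over all points with recall >= r,
        # or 0 if the method never reaches this recall level
        p_interp = precision[mask].max() if mask.any() else 0.0
        ap += p_interp / 11.0
    return ap
```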
About mAP
So does it mean that:
- We calculate Precision and Recall:
A) For many different IoU thresholds {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1} we calculate the True/False Positive/Negative values, where
True positive = number of detections with IoU > t, for each threshold t in {0, 0.1, ..., 1}
as said here (see the IoU sketch below this list), and then we calculate:
Precision = True positive / (True positive + False positive)
Recall = True positive / (True positive + False negative)
B) Or, for many different confidence thresholds of the detection algorithm, we calculate:
Precision = True positive / (True positive + False positive)
Recall = True positive / (True positive + False negative)
where
True positive = number of detections with IoU > 0.5
as said here
C) Or, for many different confidence thresholds of the detection algorithm, we calculate:
Precision = Intersect / Detected_box
Recall = Intersect / Object
As shown here?
- Then we calculate AP (average precision) as the average of 11 values of Precision at the points where Recall = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}, i.e.
AP = (1/11) ∑_{recall ∈ {0, 0.1, ..., 1}} Precision(Recall)
(More precisely, for each point, for example 0.3, we take the MAX of Precision over all Recall ≥ 0.3, instead of the value of Precision at exactly Recall = 0.3.)
- And when we calculate AP for only one object class across all images, then we get the AP (average precision) for this class, for example, only for air.
So AP is an integral (the area under the precision/recall curve).
And when we average the AP over all object classes on all images, we get the mAP (mean average precision) for the whole dataset.
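For reference, by the IoU used in options A and B above I mean something like this sketch (assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the official MATLAB devkit also adds +1 to widths and heights because it uses 1-based inclusive pixel coordinates):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```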
Questions:
- Is this right, and if it isn't, then how do we calculate mAP for the Pascal VOC Challenge?
- And which of the 3 formulas (A, B or C) is correct for calculating Precision and Recall in point 1?
Short answer:
- mAP = AVG(AP for each object class)
- AP = AVG(Precision for each of the 11 Recall values {recall = 0, 0.1, ..., 1})
- PR-curve = Precision and Recall (computed for each confidence threshold that appears among the predicted bounding boxes)
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- TP = number of detections with IoU > 0.5
- FP = number of detections with IoU <= 0.5, or of detections of an object that was already detected once
- FN = number of objects that are not detected, or detected only with IoU <= 0.5
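Putting the short answer together, here is a minimal per-class sketch (hypothetical helper names, reusing the iou and voc_11_point_ap functions sketched above); sorting by confidence is what sweeps the detection threshold:

```python
import numpy as np

def average_precision(detections, ground_truths, iou_thr=0.5):
    """VOC2007-style AP for one class over the whole dataset.

    detections:    list of (image_id, confidence, box) for this class
    ground_truths: dict image_id -> list of ground-truth boxes of this class
    """
    # Sorting by decreasing confidence sweeps the detection threshold
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    n_gt = sum(len(boxes) for boxes in ground_truths.values())

    tp = np.zeros(len(detections))
    fp = np.zeros(len(detections))
    for i, (img, _, box) in enumerate(detections):
        gt_boxes = ground_truths.get(img, [])
        ious = [iou(box, g) for g in gt_boxes]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] > iou_thr and not matched[img][best]:
            tp[i] = 1                 # first detection of this object with IoU > 0.5
            matched[img][best] = True
        else:
            fp[i] = 1                 # IoU <= 0.5, or a duplicate detection
    # Unmatched ground-truth objects are the false negatives:
    # FN = n_gt - sum(tp); they enter AP only through the recall denominator.

    precision = np.cumsum(tp) / np.maximum(np.cumsum(tp) + np.cumsum(fp), 1)
    recall = np.cumsum(tp) / max(n_gt, 1)
    return voc_11_point_ap(recall, precision)

# mAP is just the mean of the per-class APs:
# mAP = np.mean([average_precision(dets[c], gts[c]) for c in classes])
```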
Topic computer-vision object-recognition neural-network svm machine-learning
Category Data Science