Should mAP be calculated on the validation set or on the test set?

I have a YOLOv3 model for object detection on 9 classes. What is the difference between computing metrics (such as mAP) on a validation set and on a test set (unseen data)? What is usually done in the literature, and why?

Topic object-detection computer-vision deep-learning machine-learning

Category Data Science


I'm not sure about YOLO specifically, but chances are you want to tune some hyperparameters of your model.

As you may already know, optimizing based on metrics evaluated on the training data makes little sense. So let's assume you split your dataset into training and test data and train n models, each with a different combination of hyperparameters. You could then greedily select the best-performing hyperparameters based on the metric calculated on the test data and use them to train the final model.
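As a rough sketch of that greedy procedure (using a scikit-learn classifier as a stand-in for the detector, and accuracy as a stand-in for mAP, since the actual YOLOv3 training loop doesn't change the argument):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data; in the question's setting this would be images plus box annotations.
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

best_params, best_score = None, -1.0
for n_estimators in (10, 50, 200):                 # the hyperparameter grid
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)            # metric computed on test data
    if score > best_score:                         # ...and also used for selection
        best_params, best_score = n_estimators, score

# best_score is now an optimistic estimate of generalization,
# because the test data took part in model selection.
```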

But which metric would you report for the resulting model? The one evaluated on the test data? That would be overly optimistic: the chosen hyperparameters score well on that particular test data precisely because you optimized for it.

One solution is to split the data into training, validation, and test sets. You can then tune and select hyperparameters by looking at the metric evaluated on the validation data, while the final metric is reported on the still-unseen test data. Another solution is to use nested cross-validation.
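Continuing the same sketch, the fix is a second split: hyperparameters are selected on the validation data, and the test data is touched exactly once at the end (again with a scikit-learn classifier and accuracy standing in for the detector and mAP):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           random_state=0)

# 60/20/20 split: carve off the test set first, then the validation set.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25,
                                                  random_state=0)

best_params, best_val_score = None, -1.0
for n_estimators in (10, 50, 200):
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    score = model.score(X_val, y_val)      # selection uses validation data only
    if score > best_val_score:
        best_params, best_val_score = n_estimators, score

# Refit with the chosen hyperparameters and touch the test set exactly once.
final_model = RandomForestClassifier(n_estimators=best_params, random_state=0)
final_model.fit(X_train, y_train)
print("reported test metric:", final_model.score(X_test, y_test))
```

For an actual YOLOv3 model, the loop body would train the detector with each hyperparameter combination and compute mAP on the validation images, but the structure, and the rule that the test set is used only for the final reported number, stays the same.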
