How to benchmark your own model trained with YOLOv5

I trained an object-detection model on my own dataset of 1500 samples. Now I'm not quite sure how to benchmark the model. What is the procedure before putting it to use? Are the metrics in the output below reliable, or should I test the model on a separate dataset?

I want to be sure that my model is good enough before using it.

I got the following results:

wandb: Run history:
wandb:        metrics/mAP_0.5 ▁▅▆▆▇▇▇▇▇▇▇▇████████████████████████████
wandb:   metrics/mAP_0.5:0.95 ▁▄▅▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇██████████████████████
wandb:      metrics/precision ▁▃▅▅▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇█████████████████████
wandb:         metrics/recall ▁▅▅▆▆▇▆▆▇▇▇▇▇▇▇▇██▇█████████████████████
wandb:         train/box_loss █▅▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:         train/cls_loss █▆▅▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:         train/obj_loss █▅▄▄▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:           val/box_loss █▄▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:           val/cls_loss █▅▄▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:           val/obj_loss █▅▄▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:                  x/lr0 ▃██████▇▇▇▇▇▆▆▆▆▆▅▅▅▄▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb:                  x/lr1 ▃██████▇▇▇▇▇▆▆▆▆▆▅▅▅▄▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb:                  x/lr2 █▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:
wandb: Run summary:
wandb:        metrics/mAP_0.5 0.97796
wandb:   metrics/mAP_0.5:0.95 0.83993
wandb:      metrics/precision 0.96047
wandb:         metrics/recall 0.94034
wandb:         train/box_loss 0.01183
wandb:         train/cls_loss 0.0021
wandb:         train/obj_loss 0.00885
wandb:           val/box_loss 0.01047
wandb:           val/cls_loss 0.00102
wandb:           val/obj_loss 0.00545
wandb:                  x/lr0 0.001
wandb:                  x/lr1 0.001
wandb:                  x/lr2 0.001
wandb:
wandb: Synced 5 WB file(s), 335 media file(s), 1 artifact file(s) and 0 other file(s)
wandb: Synced graceful-surf-7: https://wandb.ai/nae2/train/reports/Untitled-Report--VmlldzoxMzYzMTE2?accessToken=j347bnjzpuwpfl1mah5mr5amf1gltuxdrcziokqebofghod84da2mwkihl13lp8z
wandb: Find logs at: .\wandb\run-20211218_081552-1x05vi2n\logs\debug.log

You can also see the complete report here: https://wandb.ai/nae2/train/reports/Untitled-Report--VmlldzoxMzYzMTE2?accessToken=j347bnjzpuwpfl1mah5mr5amf1gltuxdrcziokqebofghod84da2mwkihl13lp8z

Models can be benchmarked in absolute or relative terms. Absolute benchmarks are common in a business context: is the model good enough to accomplish a useful task? Relative benchmarks are used to compare different models and/or different hyperparameter settings.

One of the primary goals of machine learning is prediction. The best way to assess predictive performance is to see how the model performs on data it has never seen during training.
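
For YOLOv5, the usual way to do this is to reserve a test split that training never touches. A minimal sketch of a dataset YAML with such a split (all paths, the class count, and the class names below are placeholders for your own dataset):

    # data.yaml - hypothetical dataset config with a held-out test split
    path: ../datasets/mydata   # dataset root directory (assumed layout)
    train: images/train        # used to fit the model
    val: images/val            # checked during training; the wandb metrics above come from this split
    test: images/test          # never seen during training; reserved for final benchmarking
    nc: 1                      # number of classes in your dataset
    names: ['my_object']       # class names (placeholder)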

Start by defining the goal of the project. Then see how the model performs on held-out, labeled data related to that goal.
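
Once training has finished, YOLOv5's val.py can compute the same metrics as in your run summary (precision, recall, mAP@0.5, mAP@0.5:0.95) on that test split instead of the validation split. A sketch, assuming the default runs/train/exp output directory and the hypothetical data.yaml above:

    # evaluate the best checkpoint on the test split defined in data.yaml
    python val.py --weights runs/train/exp/weights/best.pt --data data.yaml --task test --img 640

If the test-split numbers stay close to the validation numbers in your run summary, the model generalizes beyond the data it was tuned on; a large drop would suggest the validation metrics above are optimistic.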
