YOLOv5 can't detect objects on a custom dataset

Context: I'm trying to use an object detection model (YOLOv5) to detect damage/defects on cars (dents, scratches, cracks). Right now the goal is to build a minimum viable prototype: a model able to detect these defects in static images. If successful, this might be used for real-time quality control in a car manufacturing plant.

Problem: For a pilot test, I used a very small image dataset (~100 images) from Kaggle and labeled the bounding boxes myself. After splitting these images into train, val, and test sets, I cloned the YOLOv5 GitHub repo and ran its train.py script with the following settings:

python train.py --img 640 --cfg yolov5s.yaml --hyp hyp.scratch-med.yaml --batch 16 --epochs 50 --data car_damage.yaml --weights yolov5s.pt --name yolo_car_dmg
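For illustration, a reproducible split of roughly 100 images could be sketched as below. The helper name `split_dataset`, the 70/20/10 fractions, the seed, and the file names are all assumptions, not the split actually used here; the idea is only that a seeded shuffle keeps the same images in each split across runs, so experiments on such a small dataset stay comparable.

```python
import random


def split_dataset(filenames, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle filenames reproducibly and split them into (train, val, test) lists.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    """
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    files = sorted(filenames)  # sort first so the result doesn't depend on input order
    random.Random(seed).shuffle(files)  # seeded RNG -> same split on every run
    n_train = round(len(files) * train_frac)
    n_val = round(len(files) * val_frac)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]  # whatever remains goes to the test split
    return train, val, test


if __name__ == "__main__":
    names = [f"car_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
    train, val, test = split_dataset(names)
    print(len(train), len(val), len(test))  # 70 20 10
```

Each image still needs its matching label `.txt` file copied alongside it, because YOLOv5 finds labels by replacing the `images` directory in each image path with `labels`.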

The result was disappointing: nothing was detected in the test set (or even in the training set itself) when I ran detect.py with a 0.25 confidence threshold. I figured lowering the confidence threshold to 0.001 would reveal at least a few bounding boxes (which it did), but none of them were close to the defect/damage on each car.

python detect.py --source ../car_dmg_dataset/subset_train/images/train/ --weights runs/train/yolo_car_dmg/weights/best.pt --conf 0.25 --name yolo_car_dmg
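To put a number on "none of them were close", detect.py can write its predictions to text files with its `--save-txt --save-conf` options: one line per box in the form `class x_center y_center width height conf`, normalized to [0, 1]. Hand-made YOLO training labels use the same format without the `conf` column. The sketch below is not part of YOLOv5, and its helper names are hypothetical: it converts both kinds of lines to pixel corners, keeps predictions at or above a confidence threshold (similar in spirit to detect.py's threshold), and reports the best IoU each labeled defect receives, ignoring class. For reference, an IoU of 0.5 is the matching cutoff behind the mAP@0.5 metric that train.py reports.

```python
def yolo_line_to_box(line, img_w, img_h):
    """Parse 'cls xc yc w h [conf]' (normalized to [0, 1]) into pixel (x1, y1, x2, y2, conf).

    Ground-truth label lines have no conf column, so they get conf = 1.0.
    """
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}: {line!r}")
    xc, yc, w, h = (float(v) for v in fields[1:5])
    conf = float(fields[5]) if len(fields) == 6 else 1.0
    return (
        (xc - w / 2) * img_w,
        (yc - h / 2) * img_h,
        (xc + w / 2) * img_w,
        (yc + h / 2) * img_h,
        conf,
    )


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, ...) boxes; extra fields are ignored."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_iou_per_label(pred_lines, gt_lines, img_w, img_h, conf_thres=0.25):
    """For each ground-truth box, return the highest IoU among predictions with conf >= conf_thres.

    Class-agnostic: the class column is parsed but not compared.
    """
    preds = [yolo_line_to_box(l, img_w, img_h) for l in pred_lines if l.strip()]
    preds = [p for p in preds if p[4] >= conf_thres]  # drop low-confidence predictions
    gts = [yolo_line_to_box(l, img_w, img_h) for l in gt_lines if l.strip()]
    return [max((iou(g, p) for p in preds), default=0.0) for g in gts]


if __name__ == "__main__":
    gt = ["0 0.5 0.5 0.25 0.5"]  # one hypothetical labeled dent
    pred = ["0 0.5 0.5 0.25 0.5 0.05", "0 0.125 0.125 0.25 0.25 0.9"]
    print(best_iou_per_label(pred, gt, 640, 480, conf_thres=0.25))   # [0.0]
    print(best_iou_per_label(pred, gt, 640, 480, conf_thres=0.001))  # [1.0]
```

Applied to the low-threshold run, consistently near-zero best IoUs would confirm numerically that the predictions land away from the labeled damage.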

Here is how I'm approaching the problem. The cause is either:

  1. Training data is lacking?
    • Need larger dataset/more images
    • Poor image quality (lighting, etc.)
    • Need more data preprocessing (but I believe YOLOv5 already has it built in)
  2. Trained model is lacking?
    • Didn't use pretrained layers / trained from scratch
    • Bad hyperparameters
    • YOLOv5 poor fit for the job (alternative models?)
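Before collecting more images, it may help to check how much labeled signal the current splits actually contain. The sketch below is hypothetical (the function names and the image-extension list are assumptions). It relies on the YOLOv5 convention that each image's label file sits at the same relative path under a `labels/` directory with a `.txt` extension, and that an image without a label file is trained on as background with no boxes. It counts images, missing or empty label files, boxes per class, and boxes whose normalized coordinates fall outside [0, 1].

```python
from collections import Counter
from pathlib import Path

IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed subset of image formats


def labels_dir_for(images_dir):
    """Map .../images/<split> to .../labels/<split>, replacing the last 'images' component."""
    parts = list(Path(images_dir).parts)
    if "images" not in parts:
        raise ValueError(f"no 'images' directory in {images_dir}")
    idx = len(parts) - 1 - parts[::-1].index("images")
    return Path(*parts[:idx], "labels", *parts[idx + 1:])


def audit_split(images_dir):
    """Return (stats, boxes_per_class) Counters for one YOLO-style split."""
    images_dir = Path(images_dir)
    labels_dir = labels_dir_for(images_dir)
    stats = Counter()
    boxes_per_class = Counter()
    for img in sorted(images_dir.iterdir()):
        if img.suffix.lower() not in IMG_EXTS:
            continue
        stats["images"] += 1
        label_file = labels_dir / f"{img.stem}.txt"
        if not label_file.exists():
            stats["missing_labels"] += 1  # YOLOv5 treats this image as background
            continue
        lines = [ln for ln in label_file.read_text().splitlines() if ln.strip()]
        if not lines:
            stats["empty_labels"] += 1
        for ln in lines:
            cls, *coords = ln.split()
            boxes_per_class[int(cls)] += 1
            # Expect exactly 4 normalized coords: x_center y_center width height
            if len(coords) != 4 or not all(0.0 <= float(c) <= 1.0 for c in coords):
                stats["bad_boxes"] += 1
    return stats, boxes_per_class
```

Running it on each split, for example `audit_split("../car_dmg_dataset/subset_train/images/train/")`, shows the number of labeled boxes per damage class, which may be a more telling measure of "how much data" than the raw image count.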

My question: which of these hypotheses should I focus on? I'm leaning towards needing a larger dataset; if so, how many training images would be appropriate? If not, what other options should I explore?

I also realize this use case is less common than detecting everyday objects: car dents and scratches aren't really objects; they don't have distinct outlines or stand out the way other objects do. But if more data is needed, I need to know how much, because labeling bounding boxes manually is very time-consuming. Any suggestions would really help, thanks!

Topic object-detection yolo cnn computer-vision deep-learning
