YOLOv5 can't detect objects on a custom dataset
Context: I'm trying to use an object detection model (YOLOv5) to detect damage and defects on cars (dents, scratches, cracks). For now, the goal is a minimum viable prototype: a model that can detect these defects in static images. If that works, it might be used for real-time quality control in a car manufacturing plant.
Problem: For a pilot test, I used a very small image dataset (~100 images) from Kaggle and labeled the bounding boxes myself. After splitting the images into train, val, and test sets, I cloned the YOLOv5 GitHub repo and ran its train.py script with the following settings:
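Because the boxes were labeled by hand, malformed labels are worth ruling out first. YOLOv5 expects one .txt file per image with one line per box, `class x_center y_center width height`, with coordinates normalized to [0, 1]. The sketch below checks label files against that format; the class count of 3 (dent, scratch, crack) is an assumption based on the defect types listed above.

```python
from pathlib import Path


def check_label_line(line: str, num_classes: int) -> list[str]:
    """Return the problems found in one YOLO-format label line (empty list = OK)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        cls = int(parts[0])
    except ValueError:
        return [f"class id {parts[0]!r} is not an integer"]
    problems = []
    if not 0 <= cls < num_classes:
        problems.append(f"class id {cls} outside [0, {num_classes - 1}]")
    try:
        x, y, w, h = (float(v) for v in parts[1:])
    except ValueError:
        return problems + ["coordinates are not numbers"]
    # Pixel coordinates instead of normalized ones are a common labeling-tool mistake.
    for name, value in (("x_center", x), ("y_center", y), ("width", w), ("height", h)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} not normalized to [0, 1]")
    if w <= 0 or h <= 0:
        problems.append("box has zero or negative size")
    return problems


def check_label_dir(label_dir: str, num_classes: int = 3) -> dict[str, list[str]]:
    """Check every .txt file in label_dir; map file name -> list of problems."""
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
        if not lines:
            # YOLOv5 treats an image with no boxes as a background image.
            report[path.name] = ["no boxes (image treated as background)"]
            continue
        problems = []
        for i, line in enumerate(lines, start=1):
            problems += [f"line {i}: {p}" for p in check_label_line(line, num_classes)]
        if problems:
            report[path.name] = problems
    return report
```

Running `check_label_dir` on each split's labels folder and printing the report would show whether any files need fixing before retraining.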
python train.py --img 640 --cfg yolov5s.yaml --hyp hyp.scratch-med.yaml --batch 16 --epochs 50 --data car_damage.yaml --weights yolov5s.pt --name yolo_car_dmg
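The `--data car_damage.yaml` argument points to a dataset config. As a hedged sketch, assuming the usual YOLOv5 layout (`images/{train,val,test}` next to matching `labels/{train,val,test}`), the helper below builds that file's text. The dataset root is guessed from the detect.py source path, and the class names are hypothetical.

```python
def make_data_yaml(root: str, class_names: list[str]) -> str:
    """Build YOLOv5 dataset-config text for a train/val/test split under root.

    Assumes root/images/<split> with matching root/labels/<split>; YOLOv5 locates
    each label file by replacing 'images' with 'labels' in the image path.
    A relative root is resolved by YOLOv5 relative to the repo directory.
    """
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        f"nc: {len(class_names)}",
        "names: [" + ", ".join(f"'{n}'" for n in class_names) + "]",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical root and class names; replace them with the real ones.
print(make_data_yaml("../car_dmg_dataset/subset_train", ["dent", "scratch", "crack"]))
```

Comparing the printed text with the existing car_damage.yaml is a quick way to catch a wrong path or class count.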
The results were disappointing: running detect.py with a 0.25 confidence threshold detected nothing on the test set, or even on the training set itself. I figured lowering the threshold to 0.001 would reveal at least a few bounding boxes, and it did, but none of them were close to the damage on any of the cars.
python detect.py --source ../car_dmg_dataset/subset_train/images/train/ --weights runs/train/yolo_car_dmg/weights/best.pt --conf 0.25 --name yolo_car_dmg
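To make "none of them were close" measurable, one option is to compute the intersection over union (IoU) between each prediction and the hand-labeled boxes. When detect.py is run with `--save-txt --save-conf`, it writes one line per box as `class x_center y_center width height confidence` (normalized), so prediction files can be compared against the label files directly. This is a minimal sketch under that assumption; class ids are ignored, so it only measures localization.

```python
def xywh_to_xyxy(box):
    """Convert (x_center, y_center, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_center, y_center, w, h)."""
    ax1, ay1, ax2, ay2 = xywh_to_xyxy(box_a)
    bx1, by1, bx2, by2 = xywh_to_xyxy(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def best_iou_per_label(pred_lines, gt_lines, conf_thres=0.001):
    """For each ground-truth box, return the highest IoU with any kept prediction.

    pred_lines: 'cls x y w h conf' strings (detect.py --save-txt --save-conf)
    gt_lines:   'cls x y w h' strings (the hand-made label file)
    """
    preds = []
    for line in pred_lines:
        _cls, x, y, w, h, conf = line.split()
        if float(conf) >= conf_thres:
            preds.append(tuple(map(float, (x, y, w, h))))
    scores = []
    for line in gt_lines:
        gt = tuple(map(float, line.split()[1:5]))
        scores.append(max((iou(gt, p) for p in preds), default=0.0))
    return scores
```

A common rule of thumb treats IoU of 0.5 or more as a match; if nearly every labeled defect scores close to 0 even on training images, that points more toward a label or config problem than toward too little data.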
My hypotheses about the cause:
- Training data is lacking?
  - Need a larger dataset/more images
  - Poor image quality (lighting, etc.)
  - Need more data preprocessing (though I believe YOLOv5 already has this built in)
- Trained model is lacking?
  - Didn't use pretrained layers/trained from scratch
  - Bad hyperparameters
  - YOLOv5 is a poor fit for the job (alternative models?)
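Some of these hypotheses can be checked from the labels alone. For example, it helps to know how many instances each class has and how large the labeled defects actually are once an image is resized for training. YOLOv5 scales each image so its longer side equals `--img` (640 here) while keeping the aspect ratio. The sketch below assumes YOLO-format label lines and a known original image size; the 32 px "small" cutoff is an arbitrary choice, not a YOLOv5 setting.

```python
from collections import Counter


def label_stats(label_lines, img_w, img_h, train_size=640, small_px=32):
    """Count instances per class, and how many boxes have a short side below
    small_px after resizing the image so its longer side equals train_size."""
    scale = train_size / max(img_w, img_h)
    per_class = Counter()
    small = Counter()
    for line in label_lines:
        cls, _x, _y, w, h = line.split()
        per_class[cls] += 1
        # Normalized size -> original pixels -> pixels at training resolution.
        w_px = float(w) * img_w * scale
        h_px = float(h) * img_h * scale
        if min(w_px, h_px) < small_px:
            small[cls] += 1
    return per_class, small
```

Summing these counters over all label files gives a rough picture of class balance and of how many defects are only a few pixels wide at 640.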
My question: which of these hypotheses should I focus on? I'm leaning towards needing a larger dataset; if so, how many training images would be appropriate? If not, what other options should I explore?
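One way to estimate "how many images" empirically is a learning curve: train on nested fractions of the existing labeled set (for example 25%, 50%, and 100%) with the same settings and compare validation mAP. If mAP is still climbing steeply at 100%, more labels are likely to help. With only ~100 images the curve will be noisy, so treat it as a rough signal. This minimal sketch covers the subset step; the fractions and file names are hypothetical. Each subset could be written to a text file of image paths, which YOLOv5 accepts for the `train:` entry of a dataset YAML.

```python
import random


def nested_subsets(image_paths, fractions=(0.25, 0.5, 1.0), seed=0):
    """Return {fraction: subset}; each smaller subset is contained in every larger one."""
    shuffled = sorted(image_paths)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    subsets = {}
    for frac in sorted(fractions):
        k = max(1, round(frac * len(shuffled)))
        subsets[frac] = shuffled[:k]  # prefixes of one shuffle are nested by construction
    return subsets


# Hypothetical file names for illustration.
paths = [f"images/train/img_{i:03d}.jpg" for i in range(80)]
for frac, subset in nested_subsets(paths).items():
    print(frac, len(subset))
```

Keeping the subsets nested means each larger run only adds images, so differences in mAP reflect data quantity rather than a different random sample.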
I also realize this use case is less common than detecting everyday objects: car dents and scratches aren't really objects, since they don't have distinct outlines or stand out the way other objects do. But if more data is needed, I need to know how much, because labeling bounding boxes manually is very time-consuming. Any suggestions would really help, thanks!
Topic object-detection yolo cnn computer-vision deep-learning
Category Data Science