Object Detection without annotations and labels
Problem Statement:
I am given two sets of images. None of the images in either set has annotations or labels.
First set: images of grocery store shelves (captured in the grocery stores).
Second set: close-up images of the products kept on those store shelves.
What I am trying to achieve:
Given a product image from the second set, I want to locate that product in the shelf images (first set) and predict a bounding box around it.
Visually: [figure: product image → predicted bounding box in the corresponding shelf image]
My approach:
- For each product image, first find all the shelf image(s) which contain that product.
- Then predict a bounding box by finding the location of the product within that shelf image.
I am using YOLOv5 for this task, but I am not sure how to start the training part of YOLOv5 (or of any better-suited model), given that I have to do it without annotations or labels.
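One common route to training YOLOv5 without hand annotations is pseudo-labelling: if the matching step in your approach yields reasonably reliable boxes, write them out in YOLOv5's expected label format (one `.txt` per image, lines of `class x_center y_center width height`, all normalised to 0–1) and train on those. A minimal converter, with a function name and pixel-box convention of my own, could look like:

```python
def to_yolo_line(cls_id, box, img_w, img_h):
    """Convert a pixel box (x, y, w, h) into one YOLOv5 label line:
    'class x_center y_center width height', normalised to [0, 1]."""
    x, y, w, h = box
    cx = (x + w / 2) / img_w   # box centre, normalised by image width
    cy = (y + h / 2) / img_h   # box centre, normalised by image height
    return f"{cls_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"
```

For example, a 150×100 box at (150, 100) in a 600×400 shelf image becomes `0 0.375000 0.375000 0.250000 0.250000`. Filtering pseudo-labels aggressively (keep only high-confidence matches) usually matters more than the conversion itself.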
I have come across terms like zero-shot learning, self-supervised object detection, etc., but I haven't been able to figure out how to use them as a starting point.
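The pattern behind most zero-shot retrieval approaches for your first step (finding which shelf images contain a given product) is: embed both image sets with a pretrained network, then rank shelf images by cosine similarity to the product embedding. Purely to illustrate that ranking logic without a model download, the sketch below uses a crude colour-histogram descriptor as a stand-in for a learned embedding; all names are mine, and in practice you would swap `hist_descriptor` for features from a pretrained or self-supervised network.

```python
import numpy as np

def hist_descriptor(img, bins=32):
    """Crude global descriptor: per-channel intensity histogram, L2-normalised.
    Stand-in for an embedding from a pretrained network."""
    h = np.concatenate([
        np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
        for c in range(img.shape[-1])
    ]).astype(np.float64)
    return h / (np.linalg.norm(h) + 1e-9)

def shortlist_shelves(product_img, shelf_imgs, top_k=3):
    """Return indices of the top_k shelf images most similar to the product,
    ranked by cosine similarity of their global descriptors."""
    q = hist_descriptor(product_img)
    sims = [float(q @ hist_descriptor(s)) for s in shelf_imgs]
    return sorted(range(len(shelf_imgs)), key=lambda i: -sims[i])[:top_k]
```

The shortlist keeps the expensive per-shelf localisation step (feature matching or a detector) down to a few candidate images per product.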
There's a similar question asked on Stack Overflow and another on this site, but I am not sure the answers to either of them solve this problem.