Object Detection without annotations and labels

Problem Statement:

I have two sets of images, and none of the images in either set has annotations or labels.

First set: a set of images of grocery store shelves (captured in grocery stores).

Second set: a set of close-up images of the products kept on those store shelves.

What I am trying to achieve:

I want to locate a product in the shelf images (first set) and predict a bounding box around it, given a separate set of product images (second set).

Visually (the example images are not reproduced here):

Input: a product image

Output: the corresponding shelf image, with the product's bounding box

My approach:

  1. For each product image, first find all the shelf image(s) which contain that product.
  2. Then predict a bounding box by finding the location of the product in the shelf image.
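The two steps above could be prototyped without any training at all. The sketch below swaps in plain template matching with zero-mean normalized cross-correlation (NCC), which is not YOLOv5 and not the poster's method. It assumes grayscale NumPy arrays and that each product appears on the shelf at roughly the same scale and orientation as in its close-up image, which real shelf photos often violate. The function names `match_template` and `find_product` and the `threshold=0.8` default are hypothetical choices for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def match_template(image: np.ndarray, template: np.ndarray) -> tuple[float, tuple[int, int, int, int]]:
    """Return the best zero-mean NCC score of `template` over every position
    in `image`, plus the matching box as (x, y, w, h).

    Both inputs are 2-D grayscale arrays. Assumes (hypothetically) that the
    product appears at the same scale and orientation as in its close-up.
    """
    ih, iw = image.shape
    th, tw = template.shape
    if th > ih or tw > iw:
        return -1.0, (0, 0, 0, 0)  # template cannot fit inside this image

    t = template.astype(np.float64)
    t -= t.mean()
    t_norm = np.sqrt((t * t).sum())

    # Every th x tw window of the image: shape (ih-th+1, iw-tw+1, th, tw).
    # Memory grows with image size; fine for a sketch, not for full-res photos.
    windows = sliding_window_view(image.astype(np.float64), (th, tw))
    p = windows - windows.mean(axis=(-2, -1), keepdims=True)

    num = (p * t).sum(axis=(-2, -1))
    den = np.sqrt((p * p).sum(axis=(-2, -1))) * t_norm
    scores = np.full(num.shape, -1.0)
    np.divide(num, den, out=scores, where=den > 0)  # flat windows keep -1

    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return float(scores[y, x]), (int(x), int(y), tw, th)


def find_product(product: np.ndarray, shelves: list[np.ndarray], threshold: float = 0.8):
    """Step 1: keep the shelf images whose best NCC score reaches `threshold`.
    Step 2: report where the product was found in each of them.
    Returns a list of (shelf_index, (x, y, w, h), score)."""
    hits = []
    for i, shelf in enumerate(shelves):
        score, box = match_template(shelf, product)
        if score >= threshold:
            hits.append((i, box, score))
    return hits
```

OpenCV's `cv2.matchTemplate` with `TM_CCOEFF_NORMED` computes the same kind of score far faster. For real shelf photos, a multi-scale search or local feature matching (e.g. ORB or SIFT keypoints plus a homography) would likely be needed to cope with changes in size, perspective and lighting.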

I am using YOLOv5 for this task, but I am not sure how to start training YOLOv5 (or a better-suited model) when I have no annotations or labels.
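One common way to get a YOLOv5 training set without hand-made labels is pseudo-labelling: run a training-free matcher (for example template or feature matching) over the shelf images, convert its boxes into YOLO label files, spot-check them, and train on the result. This is a swapped-in idea, not something from the question. YOLOv5's documented label format is one `.txt` file per image, one line per object: `class x_center y_center width height`, with coordinates normalized to [0, 1]. Treating each product image as its own class id is an assumption, and the helper names below are hypothetical.

```python
from pathlib import Path


def to_yolo_line(class_id: int, box: tuple[int, int, int, int], img_w: int, img_h: int) -> str:
    """Convert a pixel box (x, y, w, h), with (x, y) the top-left corner,
    into a YOLO label line: class x_center y_center width height, all
    coordinates normalized to [0, 1] by the image size."""
    x, y, w, h = box
    x_c = (x + w / 2) / img_w
    y_c = (y + h / 2) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w / img_w:.6f} {h / img_h:.6f}"


def write_label_file(label_path: Path, lines: list[str]) -> None:
    """Write one pseudo-label file (one line per box) for a single shelf image."""
    label_path.parent.mkdir(parents=True, exist_ok=True)
    label_path.write_text("\n".join(lines) + "\n")
```

For a hypothetical shelf image `images/shelf_001.jpg`, the labels would go in `labels/shelf_001.txt`, following YOLOv5's convention of a `labels/` folder parallel to `images/`. The trained detector can only be as good as these pseudo-labels, so reviewing a sample of them before training is worthwhile.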

I have come across terms like zero-shot learning and self-supervised object detection, but I haven't been able to figure out how to use them as a starting point.

There is a similar question on Stack Overflow and another on this site, but I am not sure the answers to either of them solve this problem.

