Although many production systems still use a sliding window as described in this answer, the field of computer vision is moving quickly. Recent advances in this field include R-CNN and YOLO.
Detecting object matches in an image, when you already have an object classifier trained, is usually a matter of brute-force scanning through image patches.
Start with the largest expected patch size. E.g. if your image is 1024 x 768 but is always a distance shot of a road, perhaps you do not expect any car to take up more than 80 x 80 pixels in the image. So take an 80 x 80 block of pixels from one corner of the image, and ask your classifier what the chance is that there is a car in that corner. Then take the next patch - perhaps moving by 20 pixels.
Repeat for all possible positions, and decide which patches are most likely to contain cars.
Next, go down a block size (maybe 60 x 60, moving 15 pixels at a time) and repeat the same exercise. Continue until you reach the smallest block size expected for your goal.
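The multi-scale scan described above could be sketched roughly like this - note the classifier interface, the size list and the step fraction are all assumptions for illustration, not a fixed recipe:

```python
import numpy as np

def sliding_window_scan(image, classifier, sizes=(80, 60, 40), step_frac=0.25):
    """Scan the image at several square patch sizes, largest first.

    `classifier` is assumed to take a pixel block and return the
    probability that it contains a car (hypothetical interface).
    Returns a list of (x, y, size, probability) candidates.
    """
    h, w = image.shape[:2]
    candidates = []
    for size in sizes:                         # largest expected size first
        step = max(1, int(size * step_frac))   # e.g. 80 px block -> 20 px step
        for y in range(0, h - size + 1, step):
            for x in range(0, w - size + 1, step):
                patch = image[y:y + size, x:x + size]
                candidates.append((x, y, size, classifier(patch)))
    return candidates

# Toy demo: a stand-in "classifier" that scores a patch by mean brightness.
demo = np.zeros((768, 1024), dtype=float)
demo[100:180, 200:280] = 1.0                   # a bright 80 x 80 "car"
hits = sliding_window_scan(demo, lambda patch: patch.mean())
best = max(hits, key=lambda c: c[3])           # highest-scoring patch
```

In practice the inner loop dominates the cost, which is why step size matters so much: halving the step quadruples the number of classifier calls per scale.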
Eventually you will have a list of areas within the image, with the probability that each contains a car.
Overlapping blocks that both have high probability are most likely the same car, so the logic needs thresholds for merging blocks - usually keeping the block with the highest probability score in the overlapped area - and declaring there is only one car in that area. This step is commonly known as non-maximum suppression.
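A greedy merge along those lines could look like the following sketch - the probability and overlap thresholds here are made-up defaults you would tune for your data:

```python
def merge_overlapping(boxes, min_prob=0.5, overlap_thresh=0.3):
    """Greedy non-maximum suppression over (x, y, size, prob) boxes.

    Keeps the highest-scoring box, then drops any remaining box whose
    intersection-over-union with a kept box exceeds `overlap_thresh`.
    Both thresholds are illustrative assumptions.
    """
    def iou(a, b):
        ax, ay, asz, _ = a
        bx, by, bsz, _ = b
        x1, y1 = max(ax, bx), max(ay, by)
        x2 = min(ax + asz, bx + bsz)
        y2 = min(ay + asz, by + bsz)
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        union = asz * asz + bsz * bsz - inter
        return inter / union

    kept = []
    for box in sorted((b for b in boxes if b[3] >= min_prob),
                      key=lambda b: b[3], reverse=True):
        if all(iou(box, k) <= overlap_thresh for k in kept):
            kept.append(box)
    return kept

# Two heavily overlapping detections collapse to the stronger one;
# the distant box survives; the low-probability box is filtered out.
result = merge_overlapping([(0, 0, 80, 0.9), (10, 10, 80, 0.8),
                            (300, 300, 60, 0.7), (0, 0, 80, 0.2)])
```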
As usual with ML approaches, you will need to experiment to find the right meta-parameters - in this case block sizes, step sizes, and the rules for merging/splitting areas - in order to get the most accurate results.