Objects Localization Through CNN

I am new to deep learning and tensor flow and I am trying to train a CNN at localizing digits in the Street View House Numbers data set. To this end I have an input set of 32x32 images and, since I want to recognize up to 5 digits, I am using as labels vectors of 20 elements like this

[top_x_digit1,top_y_digit1,widht_digit1,height_digit1,top_x_digit2, etc..]

0,0,0,0 when there is no digit

As far as I understand, after (let me say) 3 layers of convolution and pooling I can add 5 (parallel) fully connected layers aimed at extracting each the box features of a different digits (when present, 0 0 0 0 otherwise).

Is my approach correct?

Topic convolutional-neural-network tensorflow object-recognition deep-learning

Category Data Science


Technically this is possible but most approaches combine localization with some sort of classification so that the network can get better at scoring regions, and also removes the problem of having to use the [0, 0, 0, 0] vector to encode a null region since if the classifier associated with the region gives a low probability for that object you can ignore the bounding box. That way the bounding box regression can focus on finding regions that look like they contain objects, not their type. Note that the classifier needs to be trained so that it recognizes a non-object region (ie, it must have a null/garbage/background class).

See for example:

Object localization and detection

Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

(I don't have enough reputation to post more links but you should also google the R-CNN paper "Rich feature hierarchies for accurate object detection and semantic segmentation")

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.