Object Localization with a CNN
I am new to deep learning and TensorFlow, and I am trying to train a CNN to localize digits in the Street View House Numbers (SVHN) data set. My input is a set of 32x32 images and, since I want to recognize up to 5 digits, I am using as labels vectors of 20 elements (4 values per digit) like this:
[top_x_digit1, top_y_digit1, width_digit1, height_digit1, top_x_digit2, etc.]
with 0, 0, 0, 0 in the slot of any digit that is not present.
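To make the label layout concrete, here is a minimal sketch of how I build one label in plain Python. The helper name encode_boxes and the example box values are made up for illustration; only the 5 x 4 layout and the zero padding come from the description above.

```python
MAX_DIGITS = 5       # at most 5 digits per image
VALUES_PER_BOX = 4   # top_x, top_y, width, height


def encode_boxes(boxes):
    """Flatten up to MAX_DIGITS (top_x, top_y, width, height) boxes
    into one 20-element label, padding missing digits with zeros."""
    if len(boxes) > MAX_DIGITS:
        raise ValueError("expected at most %d boxes" % MAX_DIGITS)
    label = []
    for box in boxes:
        if len(box) != VALUES_PER_BOX:
            raise ValueError("each box needs 4 values")
        label.extend(box)
    # Pad the slots of the absent digits with 0, 0, 0, 0.
    missing = MAX_DIGITS - len(boxes)
    label.extend([0] * (VALUES_PER_BOX * missing))
    return label


# Hypothetical image with two digits: slots 3 to 5 stay at zero.
label = encode_boxes([(4, 6, 10, 20), (15, 6, 10, 20)])
```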
As far as I understand, after (say) 3 layers of convolution and pooling I can add 5 parallel fully connected layers, each aimed at extracting the box features of a different digit (when present, 0 0 0 0 otherwise).
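A rough sketch of the architecture I have in mind, written with the tf.keras functional API. The filter counts, the 3x3 kernels, the 3 input channels, the concatenation of the heads into one 20-element output and the mean squared error loss are all my own assumptions, not something I have validated.

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_DIGITS = 5  # one regression head per possible digit


def build_localizer():
    # Assumption: 32x32 RGB inputs (3 channels).
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = inputs
    # Three convolution + pooling stages: 32x32 -> 16x16 -> 8x8 -> 4x4.
    for filters in (32, 64, 128):  # filter counts are arbitrary guesses
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    # Five parallel fully connected heads, each predicting
    # (top_x, top_y, width, height) for one digit.
    heads = [
        layers.Dense(4, name="digit%d_box" % (i + 1))(x)
        for i in range(MAX_DIGITS)
    ]
    # Concatenate the heads so the output matches the 20-element label.
    outputs = layers.Concatenate(name="boxes")(heads)
    return tf.keras.Model(inputs, outputs)


model = build_localizer()
# Assumption: plain regression on the box coordinates with MSE.
model.compile(optimizer="adam", loss="mse")
```

With this layout the model output has the same shape as my label vectors, so an absent digit would simply be a head trained towards 0, 0, 0, 0.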
Is my approach correct?