CNN for image classification with two outputs

Is it possible to classify my images (cars parts) by the type of cars part(door, window ...) and also by the view of the image( front, back, right, left, top and bottom). My pictures are labelled like this: View_idPart, the view is a number from 2 to 7. I want to use a CNN model, but i don't know if this is possible? I hope that I will have some answers, I will be so grateful

Topic image cnn classification python

Category Data Science


If you need a tutorial on how to code the classification algorithm check out this article.


The view classification (front, back, ...) is the easier part as you have the correct labels in your dataset.

I would do it using transfer learning : pick a pre existing image classification model (such as VGG or Res-Net) and freeze it (parameters.require_grad = false in pytorch layer.trainable = false in keras), remove the last classification layers and replace them with your architecture with the correct number of outputs (6 classes in your case). Then train the network : it should only train the last classification part as we froze the convolutional part. And it should give good results depending on how complicated the initial CNN model is.

Transfer learning is only useful if you do not have much data in your dataset, i'd say it is not necessary to use a pre existing model if you have a lot of data (>100 000) in the dataset. If you have a big dataset, you can create your architecture from scratch and build your convolutional layers with an encoder-classifier architecture as represented below (blue + red are the convolutional layers and green is the classifier part) :

Typical Encoder-Classifier Architecture, VGG 16

I'm not sure to understand what you mean by type of cars part(door, window ...), it seems to me you want to have another classifier that gives the car parts on the image, so the output should be something like : there is window, a door, but no trunk. This is also possible, but requires a dataset labeled with such information (which does not seem to be the case). Maybe you can find correlations between the view and the parts on the image and deduce it from there. Or if you have such a labeled dataset, just create another CNN and train it on this dataset (output may need to be slighty different if you want to predict different classes for one image).

Anyway, CNN is definitly the way to go for your problem as it is the most accurate models we have at the moment for image classification, and they are likely to perform way better than fully connected layers or other architectures.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.