Image classification on non-photographic images

I was wondering how image classifier networks perform on images that are not photographs. For example, if you were to feed a drawing of a car or a face to an image classifier that was trained only on photos, would the network still be able to classify the image correctly?

Furthermore, what if you were to feed more and more abstract drawings into the network? As humans, we are able to recognize objects even in abstract forms (e.g., modern art), but do current image classifiers generalize well enough to do this?

Are there any networks that are also trained on artists' renditions of objects and not just photographs?

Topic: image, image-recognition, image-classification, classifier

Category: Data Science


Both answers are good. The opposite happens a lot: if you search Google for "eye pupil detection from synthesis", you will find many papers that use UnityEyes to train different models. I particularly like Chao Gou's work on this:

In Learning-by-Synthesis for Accurate Eye Detection, he trains a model on purely synthetic data, reports remarkable results, and compares them against training on a mix of synthetic and real data.
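
A minimal sketch of that synthetic-versus-mixed comparison, assuming two hypothetical ImageFolder-style directories (data/synthetic and data/real, each with one subfolder per class), might look like this in PyTorch:

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

# Hypothetical layout: each directory has one subfolder per class,
# as torchvision's ImageFolder expects.
tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
synthetic = datasets.ImageFolder("data/synthetic", transform=tfm)
real = datasets.ImageFolder("data/real", transform=tfm)

# Train one model on synthetic data alone and another on the mix,
# then compare their accuracy on the same real-image test set.
synthetic_loader = DataLoader(synthetic, batch_size=32, shuffle=True)
mixed_loader = DataLoader(ConcatDataset([synthetic, real]),
                          batch_size=32, shuffle=True)
```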

There is also work from this and other authors that uses GANs to improve synthetic data and reach better results. See, for example, A Hierarchical Generative Model for Eye Image Synthesis and Eye Gaze Estimation; Rendering of Eyes for Eye-Shape Registration and Gaze Estimation; and Cascade learning from adversarial synthetic images for accurate pupil detection.

It is important to note that noise and other real-photography distortions can sharply reduce a CNN's accuracy if the network was not trained to handle them.
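
One common way to build that robustness is to inject photographic distortions as training-time augmentations. Here is a minimal PyTorch sketch; the blur and noise parameters are arbitrary illustrative choices, not values from the papers above:

```python
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Adds sensor-like Gaussian noise to an image tensor in [0, 1]."""
    def __init__(self, std=0.05):
        self.std = std
    def __call__(self, tensor):
        return (tensor + torch.randn_like(tensor) * self.std).clamp(0.0, 1.0)

# Augmentations that mimic real-photography distortions: exposure/contrast
# shifts, lens blur, and sensor noise.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.ToTensor(),
    AddGaussianNoise(std=0.05),
])
```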


As a rule of thumb, the data distribution of your test set should match the distribution of your training set.

So, for example, if you have a network that classifies cats and dogs, and you trained it on clean, high-quality images, and then you feed it fuzzy images taken with a cheap phone camera, you might be surprised by the results: performance will most likely drop.

In your case, the same thing will happen: a drawing of a dog comes from a completely different distribution than a photograph of a dog, so the network will most likely not perform as well as you would expect.
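
You can see this for yourself in a few lines. A minimal sketch using an ImageNet-trained ResNet-50 as the "photo-trained" classifier; the file names are placeholders for a photo and a drawing you supply:

```python
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()  # the weights' matching preprocessing

def top_class(path):
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    return weights.meta["categories"][logits.argmax().item()]

print(top_class("dog_photo.jpg"))    # typically a plausible dog breed
print(top_class("dog_drawing.png"))  # often wrong, reflecting the distribution shift
```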

That said, CNNs do learn low-level features (edges and so on), so you will still get some accuracy, but certainly not very good accuracy.
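
Those low-level features are easy to inspect directly. The first convolutional layer of an ImageNet-trained ResNet holds 64 filters of shape 3x7x7, and many of them look like oriented-edge and color-blob detectors, which respond to drawings as well as to photographs:

```python
from torchvision import models
from torchvision.utils import save_image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
filters = model.conv1.weight.detach()  # shape: (64, 3, 7, 7)

# Normalize to [0, 1] for viewing and save as an 8x8 image grid.
f = (filters - filters.min()) / (filters.max() - filters.min())
save_image(f, "first_layer_filters.png", nrow=8)
```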

There is a great article on the Keras blog about this; I strongly recommend reading it.


This question is slightly philosophical, but it can be explained this way: if your model is trained on real photographs, it will likely not generalize well to things like drawings unless they are photorealistic or contain the features the model uses to classify images. You would likely have to include drawings in your training data to do this effectively.
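
In practice, that usually means fine-tuning a photo-trained model on labelled drawings. A minimal PyTorch sketch, assuming a hypothetical data/drawings folder with one subfolder per class (one epoch only, for brevity):

```python
import torch
from torch import nn, optim
from torchvision import datasets, models, transforms

# Start from photo-trained weights, replace the head for our classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)  # e.g., 10 drawing classes

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
drawings = datasets.ImageFolder("data/drawings", transform=tfm)
loader = torch.utils.data.DataLoader(drawings, batch_size=32, shuffle=True)

opt = optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for x, y in loader:
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```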

An interesting machine-learning project that may be relevant to what you are asking is Google's recent "Quick, Draw!"; you should check it out. Essentially, it takes a human drawing as input and outputs a classification label for what it thinks you drew.
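
If you want to experiment with that kind of data yourself, Google released the Quick, Draw! drawings publicly, including per-class numpy bitmap files of 28x28 grayscale images. A minimal loading sketch, assuming you have downloaded one such file (here called cat.npy):

```python
import numpy as np

# Each row of the per-class .npy file is one drawing, flattened to 784 bytes.
cats = np.load("cat.npy")          # shape: (num_drawings, 784)
first = cats[0].reshape(28, 28)    # one drawing as a 28x28 image
print(cats.shape, first.dtype)
```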
