Training a floor detection model: use full room images or only the cropped floor?

I'm trying to build a floor type image classification model. There's an open dataset called OpenSurfaces containing images segmented by the material type of every item appearing in a room.

Something like this:

I thought that using this dataset to train a floor detection model would be a good thing, so I wrote a script to extract the materials I'd like to detect (wood, tile, carpet, marble, stone, ...). These are some examples of the images I've got as a result of the script:

Wood material:

Tile:

Carpet:

Then I trained a CNN, but I only get about 70% accuracy and I don't really know if I'm on the right path. Is it better to train the model on the pictures I extracted, or would it be better to train it on the full room image rather than the segmented part?

I'm quite lost, so any guidance will be greatly appreciated.

Topic convolution image-classification deep-learning

Category Data Science


It seems the dataset contains some very partial or wrong annotations that may 'confuse' the model. See the 'tile' example below, which is clearly wrong (refer to the small yellow triangle marked #1).

A good data set must be consistent in what you are trying to 'teach' the machine to classify.

How did you filter the training / val set for your learning process?

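If tiny fragments like that triangle are part of the problem, one possible filter is to drop segments that cover too little of the image. This is only a sketch: the mask format (rows of booleans), the function names and the 5% threshold are my assumptions, not something OpenSurfaces provides.

```python
from typing import List

Mask = List[List[bool]]  # assumed format: True where the pixel belongs to the segment


def segment_fraction(mask: Mask) -> float:
    """Fraction of the image area covered by the segment."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    covered = sum(1 for row in mask for pixel in row if pixel)
    return covered / total


def keep_segment(mask: Mask, min_fraction: float = 0.05) -> bool:
    """Keep a segment only if it covers at least min_fraction of the image.

    The 0.05 default is an arbitrary starting point, not a recommended value.
    """
    return segment_fraction(mask) >= min_fraction
```

You would still want to eyeball a sample of what survives the filter, since a large segment can be mislabeled too.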


I think you might be over-thinking and over-engineering this one. The answer is simple: you should always train on data that most closely resembles the data the model will see at prediction time.

In your case, depending on how you architect this, you might add an "extended" pre-processing step where you take images, find the floor first, and then classify the texture or finish (or whatever). So it may actually be a combination of multiple algorithms, but I'll leave that for you to decide.
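A rough sketch of that two-stage idea, using toy nested-list images so it runs without any libraries. The `find_floor` and `classify_crop` callables are hypothetical stand-ins for whatever segmentation model and CNN you end up with; only the glue between them is shown.

```python
from typing import Callable, List

Image = List[List[int]]   # toy grayscale image: rows of pixel values
Mask = List[List[bool]]   # True where a pixel belongs to the floor


def crop_to_mask(image: Image, mask: Mask) -> Image:
    """Return the bounding-box crop around the pixels where mask is True."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows or not cols:
        raise ValueError("mask contains no floor pixels")
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def predict_floor_type(
    image: Image,
    find_floor: Callable[[Image], Mask],       # hypothetical stage 1: locate the floor
    classify_crop: Callable[[Image], str],     # hypothetical stage 2: classify the material
) -> str:
    """Locate the floor first, then classify only the cropped floor region."""
    mask = find_floor(image)
    crop = crop_to_mask(image, mask)
    return classify_crop(crop)
```

The point is that the classifier sees the same kind of crop at prediction time that it saw during training.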

Regardless, you should always be thinking about what your input data looks like at prediction time, not at modeling time, and proceed accordingly.


It isn't clear what you are trying to do (floor detection or just classification).

If your task is to classify different types of floor, you could use just the segmented parts. But if you want to detect the floor and classify it, you have to give your model some negative examples as well (it also needs to learn what non-floor looks like).
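As a minimal sketch of adding negative examples, you could map every extracted segment that is not a floor to a single "not_floor" class. The record layout `(path, material, is_floor)` is an assumption on my part; in particular the `is_floor` flag is hypothetical and you would have to derive it from whatever object or surface annotations you have, since material alone cannot tell a wooden floor from a wooden table.

```python
from typing import List, Tuple

# Floor materials named in the question; extend as needed.
FLOOR_MATERIALS = {"wood", "tile", "carpet", "marble", "stone"}

# Hypothetical record: (image path, material name, whether the segment is a floor)
Segment = Tuple[str, str, bool]


def label_segment(material: str, is_floor: bool) -> str:
    """Return the floor material for floor segments, "not_floor" for everything else."""
    material = material.strip().lower()
    if is_floor and material in FLOOR_MATERIALS:
        return material
    return "not_floor"


def build_labels(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Turn segment records into (path, class label) pairs for training."""
    return [(path, label_segment(material, is_floor))
            for path, material, is_floor in segments]
```

For example, `build_labels([("a.png", "Wood", True), ("b.png", "wood", False)])` gives one "wood" sample and one "not_floor" sample.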

However, in general it would be better practice to train the model on full images. Even if it is just a floor classification task, the context of the image can also help (kitchen increases the probability of tiles, bedroom increases the probability of a wooden floor, etc.).
