What are features in computer vision?

I'm learning how the U-Net network performs semantic segmentation.

I think I have understood everything but features. What are those image features?

I read that convolutional layers extract features from the images using their filters, but what are they? Are they corners? edges? colours?

I have read this article, "Finding Features", but I think I need more information about them.



In addition to the previous answer, I would add that many neural network architectures (convolutional ones, but not only those) are effectively mappings that compress data points from the input space into a much smaller target space (e.g. 10 classes in MNIST, 1000 in ImageNet).

The idea is to transform the spatial information of an image (which is hard to classify using linear layers alone) into channels (i.e. features) in such a way that the resulting representation is "good". So good that even a couple of simple dense layers are enough to solve the downstream task.
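As a rough sketch of that trade between spatial size and channels, the snippet below tracks only the tensor shapes through three stride-2 convolutions. The input size, the channel counts (16, 32, 64) and the kernel/stride/padding values are my own assumptions for illustration, not taken from any particular network; the helper `conv_output_shape` is hypothetical and just applies the standard output-size formula.

```python
def conv_output_shape(shape, out_channels, kernel=3, stride=2, padding=1):
    """Return (channels, height, width) after one convolutional layer.

    Uses the standard formula: out = floor((in + 2*padding - kernel) / stride) + 1.
    """
    _, height, width = shape
    new_height = (height + 2 * padding - kernel) // stride + 1
    new_width = (width + 2 * padding - kernel) // stride + 1
    return (out_channels, new_height, new_width)


# Assumed input: a 32x32 RGB image, i.e. 3 channels.
shape = (3, 32, 32)
shapes = [shape]

# Assumed channel counts for three successive layers.
for out_channels in (16, 32, 64):
    shape = conv_output_shape(shape, out_channels)
    shapes.append(shape)

for s in shapes:
    print(s)
# (3, 32, 32)
# (16, 16, 16)
# (32, 8, 8)
# (64, 4, 4)
```

The spatial grid shrinks from 32x32 to 4x4 while the number of channels grows from 3 to 64: each of the 64 channels is one "feature", and the small grid that remains is what a couple of dense layers would then work on.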


CNNs like U-Net extract lower-level features such as edges in the lower layers (i.e. the first convolutional layers) and higher-level features in the higher layers (i.e. convolutional layers closer to the output, or to the final linear layers in a classification network). This principle is loosely inspired by how visual perception is implemented in the visual cortex of humans (and other animals).

In a CNN, the feature maps could, for example, look like this:

[Image: CNN feature maps]

As you can see, the lower-level feature maps detect simple structures like edges, while higher-level feature maps respond to more complex structures like eyes or faces.
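To make "a filter detects an edge" concrete, here is a minimal sketch in plain Python. Note the assumptions: the filter is a hand-written Sobel kernel rather than a learned one (a trained CNN learns its own filter weights, although first-layer filters often end up resembling edge detectors like this), and the tiny image is made up for the example. The helper `conv2d` is hypothetical and computes a "valid" cross-correlation, which is the operation deep learning libraries call convolution.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the resulting feature map as a list of rows."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kernel_h)
                for dj in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Made-up 5x6 grayscale image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Sobel kernel that responds to vertical edges (dark-to-bright, left to right).
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

feature_map = conv2d(image, sobel_x)
for row in feature_map:
    print(row)
# [0, 4, 4, 0]
# [0, 4, 4, 0]
# [0, 4, 4, 0]
```

The feature map is zero over the flat regions and large exactly where the window straddles the dark-to-bright boundary. That map of responses is the "feature"; higher layers apply the same operation to maps like this one instead of to raw pixels.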

Colors, however, are handled differently from spatial features, since color images are usually fed to a CNN as three input channels (one for each color in RGB format). So colors are not detected in the same way as spatial structures; instead, the first convolutional layer receives a three-channel input image, with one channel for each color component (red, green, and blue), and each of its filters has weights for all three channels.
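A small sketch of how a first-layer filter can respond to color: it has one weight per input channel and sums over the channels at each pixel. Everything here is an assumption for illustration: the 1x1 filter size, the hand-picked "redness" weights, and the three-pixel image. The helper `conv1x1` is hypothetical.

```python
def conv1x1(image_rgb, weights):
    """Apply a 1x1 filter across channels.

    image_rgb is indexed as [channel][row][col]; weights holds one value
    per channel. Each output pixel is the weighted sum of that pixel's
    channel values.
    """
    channels = len(image_rgb)
    height = len(image_rgb[0])
    width = len(image_rgb[0][0])
    return [
        [
            sum(weights[c] * image_rgb[c][i][j] for c in range(channels))
            for j in range(width)
        ]
        for i in range(height)
    ]


# Made-up 1x3 image with pixels: pure red, mid gray, pure green.
image_rgb = [
    [[1.0, 0.5, 0.0]],  # red channel
    [[0.0, 0.5, 1.0]],  # green channel
    [[0.0, 0.5, 0.0]],  # blue channel
]

# Hand-picked weights: positive for red, negative for green and blue.
redness_filter = (1.0, -0.5, -0.5)

response = conv1x1(image_rgb, redness_filter)
print(response)
# [[1.0, 0.0, -0.5]]
```

The filter responds strongly to the red pixel, not at all to gray, and negatively to green. A real 3x3 first-layer filter does the same thing over a spatial window, so it can be sensitive to color and spatial structure at once.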

The article "What exactly does CNN see?" and the paper "Visualizing and Understanding Convolutional Networks" (which are also the sources of the images above) provide a more detailed explanation.
