Combining CNNs for image classification

Question


Andrew

August 2, 2021 14:26

I would like to take the output of an intermediate layer of a CNN (layer G) and feed it to an intermediate layer of a wider CNN (layer H) to complete the inference.

Challenge: layers G and H have different dimensions, so the output of G can't be fed into H directly. Solution: use a third CNN (call it r) that takes the output of layer G as input and produces a valid input for layer H. Then the weights of both layer G and r will be tuned using the loss function:

$$L(W_G, W_r) = \operatorname{MSE}(\text{output of layer } H,\ \text{output of } r)$$
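A minimal PyTorch sketch of the proposed setup, assuming tiny hypothetical stand-ins for the two networks: `before_G`, `layer_G`, `wide_front`, and the adapter `r` are illustrative names, not from the question. It reads "output of layer H" as the wider CNN's activation at the splice point, which is an assumption. Only the parameters of `layer_G` and `r` are passed to the optimizer, so only those weights move; the layers before G stay as they were.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins; the question does not specify the architectures.
before_G = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())     # small CNN layers before G
layer_G = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())      # layer G: 8 channels out
wide_front = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())  # wider CNN up to the splice point
r = nn.Conv2d(8, 32, kernel_size=1)                                    # adapter r: 8 -> 32 channels

# Only W_G and W_r are given to the optimizer, matching L(W_G, W_r).
optimizer = torch.optim.Adam(list(layer_G.parameters()) + list(r.parameters()), lr=1e-3)
mse = nn.MSELoss()

# Snapshots used to check afterwards which weights changed.
before_G_init = [p.detach().clone() for p in before_G.parameters()]
G_init = [p.detach().clone() for p in layer_G.parameters()]

x = torch.randn(4, 3, 16, 16)  # dummy batch of images
for _ in range(5):
    with torch.no_grad():
        target = wide_front(x)   # assumed target: wider CNN's activation at the splice point
        features = before_G(x)   # earlier small-CNN layers are not trained here
    pred = r(layer_G(features))  # G's output translated by r
    loss = mse(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

before_G_changed = any(not torch.equal(a, b) for a, b in zip(before_G_init, before_G.parameters()))
G_changed = any(not torch.equal(a, b) for a, b in zip(G_init, layer_G.parameters()))
print(before_G_changed, G_changed)  # False True
```

Any layer that is neither in the optimizer nor on the gradient path keeps its weights, so whether the rest of the system needs fine-tuning afterwards is a separate decision.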

My question: Will this method change only the weights of layer G and r? Or does the whole system require fine-tuning afterwards to update the weights of the other layers?

Topic inference convolutional-neural-network image-classification deep-learning distributed

Category Data Science

Brian Spiering · Accepted Answer · July 31, 2021 20:47

It is straightforward to stack different neural networks with current deep learning frameworks (e.g., PyTorch or TensorFlow). There is no separate output for $r$; it is just one of many layers.

It could look like this:
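A minimal PyTorch sketch of such a stack, assuming hypothetical layer sizes and a 10-class output. `small_up_to_G`, `r`, and `wide_from_H` are illustrative stand-ins for pieces that would normally be sliced out of pretrained models.

```python
import torch
import torch.nn as nn

# Hypothetical pieces; real ones would come from pretrained networks.
small_up_to_G = nn.Sequential(  # smaller CNN, up to and including layer G
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
)
r = nn.Sequential(nn.Conv2d(16, 64, kernel_size=1), nn.ReLU())  # adapter: 16 -> 64 channels
wide_from_H = nn.Sequential(    # wider CNN, from layer H to the classifier head
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),          # assumed 10 classes
)

# r is just another layer in the stack; it has no separate output or loss.
model = nn.Sequential(small_up_to_G, r, wide_from_H)

logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

Because `r` sits inside the `nn.Sequential`, a single task loss computed on `logits` backpropagates through all three pieces unless some of them are frozen.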

You can freeze or not freeze any layers in a stacked neural network. You can decide how far to backpropagate the training updates.
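Freezing can be sketched in PyTorch by turning off `requires_grad` on the parameters to keep fixed and leaving them out of the optimizer. The three-part model and the dummy data below are hypothetical. Since nothing before the adapter requires gradients, autograd does not track the frozen front, so backpropagation effectively stops at the adapter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical three-part stack: [reused front, adapter r, reused back + head].
model = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),  # reused front, to be frozen
    nn.Conv2d(16, 64, kernel_size=1),                          # adapter r
    nn.Sequential(nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10)),
)
front, adapter = model[0], model[1]

# Freeze the front: its parameters receive no gradients.
for p in front.parameters():
    p.requires_grad_(False)
# Give the optimizer only the parameters that are still trainable.
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)

front_before = [p.detach().clone() for p in front.parameters()]
adapter_before = adapter.weight.detach().clone()

# One training step on dummy images and labels (10 assumed classes).
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

front_unchanged = all(torch.equal(a, b) for a, b in zip(front_before, front.parameters()))
adapter_changed = not torch.equal(adapter_before, adapter.weight)
print(front_unchanged, adapter_changed)  # True True
```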

The advantage of unfreezing layers is that the model can learn better feature representations. The disadvantage is that training can take longer for very deep neural networks.

noe · Accepted Answer · July 31, 2021 16:06

First of all, you probably should not train with the loss you propose. With MSE, you train to minimize the total error rather than to keep the features as they were, and those features are what CNNs are good at detecting. This is the same problem that occurs when you train an image autoencoder with MSE: you obtain blurry images. Instead, configure the network as you want, reusing layers from the networks you deem appropriate and adding the needed adaptation layers, and then train the whole network on the task you need it to perform (e.g., classification).

When doing so, you can choose to train only some parts of the network, or to train different parts at different learning rates. Some possible alternatives:

- Freeze all the layers except the adaptation ones. The weights of the reused layers will remain exactly as they originally were.
- Train the adaptation layers at a normal learning rate and the other layers at a very small learning rate. This is typical of transfer learning setups: it lets the reused layers adapt somewhat to the new task while staying close to their original weights.
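The second alternative can be sketched with PyTorch optimizer parameter groups, which accept a per-group learning rate. The model, the 10-class head, and the values `1e-2` and `1e-4` are illustrative assumptions, not recommendations. The first alternative corresponds to calling `requires_grad_(False)` on the reused layers and passing only the adapter's parameters to the optimizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stacked model; index 1 is the adaptation layer.
model = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),                           # reused layers
    nn.Conv2d(16, 64, kernel_size=1),                                                   # adaptation layer
    nn.Sequential(nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10)),  # reused layers
)
adapter_params = list(model[1].parameters())
adapter_ids = {id(p) for p in adapter_params}
reused_params = [p for p in model.parameters() if id(p) not in adapter_ids]

# Two parameter groups: normal learning rate for the adapter, a much smaller one elsewhere.
optimizer = torch.optim.SGD(
    [
        {"params": adapter_params, "lr": 1e-2},
        {"params": reused_params, "lr": 1e-4},
    ],
    momentum=0.9,
)

# One training step on the classification task, using dummy data.
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print([g["lr"] for g in optimizer.param_groups])  # [0.01, 0.0001]
```

Note that the whole network is trained on the task loss (cross-entropy here), not on an MSE between intermediate activations.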

