Combining CNNs for image classification
I would like to take the output of an intermediate layer of a CNN (layer G) and feed it to an intermediate layer of a wider CNN (layer H) to complete the inference.
Challenge: Layers G and H have different dimensions, so the output of G cannot be fed into H directly. Solution: Use a third CNN (call it r) that takes the output of layer G as input and produces a valid input for layer H. The weights of both layer G and r are then tuned by minimizing the loss function:
$$L(W_G, W_r) = \mathrm{MSE}(\text{output of layer H}, \text{output of r})$$
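For concreteness, here is a minimal PyTorch sketch of the setup I have in mind. Everything in it is an assumption made for illustration: the module names (`small_pre`, `layer_G`, `adapter_r`, `wide_through_H`), the channel counts, the random input, and the choice to freeze every layer except G and r. I am also assuming that "output of layer H" means the activation the wide CNN produces at layer H for the same image, used as a fixed target.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins; real networks would be much deeper.
small_pre = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())  # layers before G
layer_G = nn.Conv2d(8, 16, 3, padding=1)                             # layer G
adapter_r = nn.Sequential(nn.Conv2d(16, 64, 1), nn.AvgPool2d(2))     # r: maps G's shape to H's
wide_through_H = nn.Sequential(                                      # wide CNN up to and including H
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1),
)

# Assumption: everything except G and r is frozen.
for module in (small_pre, wide_through_H):
    for p in module.parameters():
        p.requires_grad_(False)

# Only the parameters handed to the optimizer can be updated by optimizer.step().
trainable = list(layer_G.parameters()) + list(adapter_r.parameters())
optimizer = torch.optim.SGD(trainable, lr=0.1)

modules = {"pre": small_pre, "G": layer_G, "r": adapter_r, "wide": wide_through_H}
before = {name: [p.detach().clone() for p in m.parameters()] for name, m in modules.items()}

x = torch.randn(4, 3, 16, 16)  # dummy image batch
with torch.no_grad():
    target = wide_through_H(x)                 # "output of layer H", treated as a constant
prediction = adapter_r(layer_G(small_pre(x)))  # "output of r"

loss = nn.functional.mse_loss(prediction, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

def changed(name):
    """Return True if any parameter of the named module differs from its snapshot."""
    return any(not torch.equal(old, new.detach())
               for old, new in zip(before[name], modules[name].parameters()))

for name in modules:
    print(name, "changed:", changed(name))
```

In this sketch, which weights move after one step is decided entirely by what is passed to the optimizer and which parameters have `requires_grad` set, not by the loss alone.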
My question: Will this method change only the weights of layer G and of r? Or does the whole system require fine-tuning afterwards to update the weights of the other layers as well?