Gradient flow through concatenation operation

I need help in understanding the gradient flow through a concatenation operation.

I'm implementing a network in PyTorch (mostly a CNN) that contains a concatenation operation. The network is defined so that the responses of passing two different images through a CNN are concatenated and fed through another CNN, and the whole thing is trained end to end.
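
For concreteness, here is a minimal sketch of the kind of setup I mean (the module names, channel sizes, and pooling are placeholders, not my actual code):

```python
import torch
import torch.nn as nn

class SiameseConcatNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared first CNN: the same weights process both images
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        # Second CNN that runs on the concatenated responses
        self.head = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 1),
        )

    def forward(self, img_a, img_b):
        feat_a = self.encoder(img_a)                  # response of image A
        feat_b = self.encoder(img_b)                  # response of image B (same weights)
        merged = torch.cat([feat_a, feat_b], dim=1)   # concatenate along channels
        return self.head(merged)
```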

Since the first CNN is shared between both inputs to the concatenation, how should the gradients be distributed through the concatenation operation during backprop? I'm not an expert on backpropagation, and this is the first time I'm tinkering with a custom backward implementation, so any pointers would be helpful.

I can provide more details if needed.

Tags: convolutional-neural-network, backpropagation, computer-vision, deep-learning



For concatenation, the gradients during backpropagation are simply routed back to their respective source layers: the gradient of the concatenated tensor is split along the concatenation dimension, and each slice flows to the input it came from. There is no direct interaction between the gradients of the two source layers at the concatenation itself.
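
You can check this directly with autograd. A small illustration (the tensors and the downstream computation here are arbitrary, chosen only so the split is easy to see):

```python
import torch

feat_a = torch.randn(1, 4, requires_grad=True)
feat_b = torch.randn(1, 4, requires_grad=True)

merged = torch.cat([feat_a, feat_b], dim=1)    # shape (1, 8)
loss = (merged * torch.arange(8.0)).sum()      # arbitrary downstream computation
loss.backward()

# The gradient arriving at the concatenated tensor is just sliced back:
# feat_a receives the first 4 columns, feat_b the last 4.
print(feat_a.grad)   # tensor([[0., 1., 2., 3.]])
print(feat_b.grad)   # tensor([[4., 5., 6., 7.]])
```

In your case the two inputs to the concatenation come from the same shared CNN, so after the split both gradient slices flow back into the same encoder weights, and autograd accumulates (sums) the two contributions for you. No custom backward is needed for this.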

The layer immediately after the concatenation does interact with both networks: it will have some weight parameters that multiply outputs from network A and some that multiply outputs from network B. There will not be any single parameter that multiplies outputs from both networks (unless you force that through weight sharing, which is not the case when you are simply stacking features from the two starting networks).
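
As a sketch of that point (a linear layer after the concatenation, with made-up shapes), you can read off which weight columns ever touch which network's output:

```python
import torch
import torch.nn as nn

feat_a = torch.randn(1, 4)   # output of network A
feat_b = torch.randn(1, 4)   # output of network B
layer = nn.Linear(8, 2)      # first layer after the concatenation

out = layer(torch.cat([feat_a, feat_b], dim=1))

# Columns 0..3 of layer.weight only ever multiply feat_a,
# columns 4..7 only ever multiply feat_b; no single parameter sees both.
w_a = layer.weight[:, :4]
w_b = layer.weight[:, 4:]
assert torch.allclose(out, feat_a @ w_a.T + feat_b @ w_b.T + layer.bias)
```

The same holds for a convolution applied to channel-concatenated feature maps: the kernel's input-channel slices partition into those that see network A's channels and those that see network B's.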

The only issue you might have is clearly identifying which parameters link to each original network. That is an implementation detail, so you would need to share your code so far in order to debug that if it goes wrong.
