Gradient flow through concatenation operation
I need help in understanding the gradient flow through a concatenation operation.
I'm implementing a network in PyTorch (mostly a CNN) that contains a concatenation operation. The network is set up so that two different images are passed through the same CNN, the resulting responses are concatenated, and the concatenated features are fed through another CNN; the whole thing is trained end to end.
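Here is a minimal sketch of the kind of setup I mean (the module names, layer sizes, and input shapes are just placeholders, not my actual code):

```python
import torch
import torch.nn as nn

class SharedCNN(nn.Module):
    """The CNN applied to each input image (shared weights)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class HeadCNN(nn.Module):
    """The second CNN, which consumes the concatenated responses (32 + 32 channels)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, x):
        return self.features(x)

shared, head = SharedCNN(), HeadCNN()
img_a = torch.randn(4, 3, 32, 32)
img_b = torch.randn(4, 3, 32, 32)

# Same CNN applied to both images, responses concatenated along the channel dim.
feat_a = shared(img_a)
feat_b = shared(img_b)
out = head(torch.cat([feat_a, feat_b], dim=1))

loss = out.mean()   # dummy loss just for the sketch
loss.backward()     # trained end to end
```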
Since the first CNN is shared between both inputs to the concatenation, I was wondering how the gradients should be distributed through the concatenation operation during backprop. I'm not an expert on backpropagation, and this is the first time I'm tinkering with a custom backward implementation, so any pointers would be helpful.
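To make the question concrete, this is the kind of toy check I have in mind (a throwaway snippet, not my real network): if I concatenate two tensors and backprop, each input seems to receive exactly the slice of the upstream gradient corresponding to its position in the concatenated tensor, but I'm not sure how that interacts with the two gradient paths flowing back into the shared CNN.

```python
import torch

# Two inputs that will be concatenated along dim=1.
a = torch.ones(2, 3, requires_grad=True)
b = torch.ones(2, 3, requires_grad=True)

cat = torch.cat([a, b], dim=1)                   # shape (2, 6)
weights = torch.arange(6, dtype=torch.float32)   # a different weight per column
loss = (cat * weights).sum()
loss.backward()

# Each input gets back the slice of the upstream gradient
# matching its position in the concatenated tensor:
print(a.grad)  # columns 0..2 of the upstream gradient
print(b.grad)  # columns 3..5 of the upstream gradient
```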
I can provide more details if you need them.