How is error back-propagated in a multi-layer RNN?

Let's say I have a 2-layer LSTM network, and I'm using it to perform regression on input sequences of length 10 along the time axis.

From what I understand, when this network is 'unfolded', it will consist of 20 LSTM cells, 10 for each layer. So the 10 cells corresponding to the first layer receive the network input for t = 1 to 10, whereas the 10 cells corresponding to the second layer receive the first layer's output for t = 1 to 10. In other words, the output from the cell in layer 1 corresponding to t = 1 goes to (1) the 'next' cell in layer 1 corresponding to t = 2, and (2) the cell in layer 2 corresponding to t = 1.
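To make this concrete, here is a rough sketch of the structure I have in mind, assuming PyTorch's `nn.LSTMCell` (the names `layer1`, `layer2`, `seq` and the sizes are just illustrative):

```python
import torch
import torch.nn as nn

input_size, hidden_size, T = 4, 8, 10
layer1 = nn.LSTMCell(input_size, hidden_size)   # first layer, unrolled over T steps
layer2 = nn.LSTMCell(hidden_size, hidden_size)  # second layer, stacked on top

seq = torch.randn(T, 1, input_size)             # 10 time steps, batch of 1
h1 = c1 = torch.zeros(1, hidden_size)
h2 = c2 = torch.zeros(1, hidden_size)

outputs = []
for t in range(T):
    # The layer-1 output h1 at time t is used twice:
    # (1) as the recurrent input to layer 1 at t + 1, and
    # (2) as the input to the layer-2 cell at the same time step t.
    h1, c1 = layer1(seq[t], (h1, c1))
    h2, c2 = layer2(h1, (h2, c2))
    outputs.append(h2)

loss = torch.stack(outputs).pow(2).mean()       # toy regression-style loss
loss.backward()                                 # gradients flow back along both paths
```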

So when the error is back-propagated, will there not be two derivatives coming into each cell in layer 1? If so, how is weight update performed? Is the sum or mean of both derivatives used or is there something else going on?

Topic stacked-lstm backpropagation deep-learning neural-network machine-learning

Category Data Science


Derivatives are propagated by the chain rule even though the layers are stacked. For a cell in layer 1 there are indeed two incoming gradient paths: one through time, from the layer-1 cell at t + 1 (backpropagation through time), and one from the layer-2 cell at the same time step, which in turn is connected either to the output or to further stacked cells. By the multivariable chain rule, the derivatives arriving along these two paths are summed, not averaged, and that summed gradient is what is used for the weight update.
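To see the summing behaviour in isolation, here is a tiny autograd sketch (not the actual LSTM computation); `h` stands in for a layer-1 output, and the two expressions stand in for the recurrent path and the path into layer 2:

```python
import torch

h = torch.tensor(2.0, requires_grad=True)

next_step = 3.0 * h      # stands in for the path to the next time step (t -> t+1)
layer_above = 5.0 * h    # stands in for the path into the layer-2 cell at time t
loss = next_step + layer_above
loss.backward()

print(h.grad)            # tensor(8.) = 3 + 5: the two incoming derivatives are summed
```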
