Importance/intuition behind stacking RNNs

Nowadays there is a trend towards using "deep" RNN architectures, i.e. vertically stacked RNNs (see, for example, the RNN chapter in Bengio's book). These networks seem to work well in practice.

What is the intuition behind using vertically stacked layers of RNNs (beyond the obvious fact that they increase capacity by increasing the number of parameters)?

Topic: rnn, deep-learning

Category: Data Science


All neural networks gain expressiveness and representational capacity by stacking layers. Each subsequent layer learns a non-linear transformation of the previous layer's output, and composing these non-linearities is what lets deep networks approximate a broad class of functions. In the case of a Recurrent Neural Network (RNN), the functions being approximated are functions over sequences, i.e. over time. Stacking RNN layers therefore lets the network build hierarchical temporal representations: lower layers can capture short-range patterns in the input, while higher layers compose those patterns into longer-range, more abstract features, as the sketch below illustrates.
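As a concrete sketch of what "vertically stacked" means in code (the question names no library, so PyTorch here is an assumption, not part of the original post), the two constructions below are equivalent: the built-in `num_layers` argument, and explicitly feeding one recurrent layer's full output sequence into the next.

```python
import torch
import torch.nn as nn

# 1) Built-in stacking: num_layers=2 feeds the hidden state of
#    layer 1 at every time step into layer 2.
stacked = nn.LSTM(input_size=16, hidden_size=32, num_layers=2, batch_first=True)

# 2) Explicit stacking: the full output sequence of one LSTM
#    becomes the input sequence of the next.
layer1 = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
layer2 = nn.LSTM(input_size=32, hidden_size=32, batch_first=True)

x = torch.randn(8, 50, 16)       # (batch, time, features)

out_a, _ = stacked(x)            # (8, 50, 32)

h1, _ = layer1(x)                # lower layer: short-range temporal features
out_b, _ = layer2(h1)            # upper layer: composes them over longer spans

print(out_a.shape, out_b.shape)  # torch.Size([8, 50, 32]) torch.Size([8, 50, 32])
```

The explicit version makes the intuition visible: layer 2 never sees the raw input, only layer 1's sequence of hidden states, so it is free to learn patterns over patterns rather than re-learning the raw dynamics.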
