Dropout on which layers of LSTM?

Using a multi-layer LSTM with dropout, is it advisable to put dropout on all of the hidden layers as well as the output Dense layers? In Hinton's paper (which proposed dropout), he only put dropout on the Dense layers, but that was because the hidden inner layers were convolutional.

Obviously, I can test this for my specific model, but I wondered whether there is a consensus on this?

Topic stacked-lstm lstm dropout rnn neural-network

Category Data Science


I prefer not to add dropout in LSTM cells for one specific and clear reason. LSTMs are good at capturing long-term dependencies, but an important thing about them is that they are not very good at memorising multiple things simultaneously. The logic of dropout is to add noise to the neurons so that the network does not become dependent on any specific neuron. By adding dropout to the LSTM cells, there is a chance of forgetting something that should not be forgotten. Consequently, as with CNNs, I always prefer to use dropout in the dense layers after the LSTM layers.
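
A minimal Keras sketch of this placement (the layer sizes, sequence length, and binary-classification head are assumptions for illustration, not part of the answer):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    # Stacked LSTM layers with no dropout, so the recurrent state is left untouched
    LSTM(64, return_sequences=True, input_shape=(100, 16)),
    LSTM(64),
    # Dropout applied only in the dense part after the recurrent stack
    Dropout(0.5),
    Dense(32, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```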


There is no consensus that holds across all model types.

Thinking of dropout as a form of regularisation, how much of it to apply (and where) will inherently depend on the type and size of the dataset, as well as on the complexity of your model (how big it is).
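
If you do want to experiment with dropout inside the recurrent layers as well, Keras exposes this through the `dropout` and `recurrent_dropout` arguments of `LSTM`. A hedged sketch follows; the rates are placeholder values meant to be tuned per dataset and model size, not recommendations:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Placeholder hyperparameters to tune for your own data and model size
input_dropout = 0.2      # dropout on the LSTM inputs (Keras `dropout` argument)
state_dropout = 0.2      # dropout on the recurrent connections (`recurrent_dropout`)
dense_dropout = 0.5      # dropout before the output Dense layer

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(100, 16),
         dropout=input_dropout, recurrent_dropout=state_dropout),
    LSTM(64, dropout=input_dropout, recurrent_dropout=state_dropout),
    Dropout(dense_dropout),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```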
