Dropout on which layers of LSTM?

Using a multi-layer LSTM with dropout, is it advisable to put dropout on all of the hidden layers as well as the output Dense layers? In Hinton's paper (which proposed dropout), he only put dropout on the Dense layers, but that was because the hidden inner layers were convolutional.

Obviously, I can test this for my specific model, but I wondered whether there is a consensus on this?

Topic stacked-lstm lstm dropout rnn neural-network

Category Data Science


I prefer not to add dropout in LSTM cells for one specific and clear reason. LSTMs are good at capturing long-term dependencies, but an important thing about them is that they are not very good at memorising multiple things simultaneously. The logic of dropout is to add noise to the neurons so that the network does not become dependent on any specific neuron. By adding dropout to the LSTM cells, there is a chance of forgetting something that should not be forgotten. Consequently, as with CNNs, I always prefer to use dropout in the dense layers after the LSTM layers.
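
A minimal Keras sketch of this placement (the layer sizes, sequence length, and binary-classification head are assumptions for illustration, not part of the answer):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    # Stacked LSTM layers with no dropout, so the recurrent state is left untouched
    LSTM(64, return_sequences=True, input_shape=(100, 16)),
    LSTM(64),
    # Dropout applied only in the dense part after the recurrent stack
    Dropout(0.5),
    Dense(32, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```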


There is no consensus that holds across all model types.

Thinking of dropout as a form of regularisation, how much of it to apply (and where) will inherently depend on the type and size of the dataset, as well as on the complexity of your model (how big it is).
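
If you do want to experiment with dropout inside the recurrent layers as well, Keras exposes this through the `dropout` and `recurrent_dropout` arguments of `LSTM`. A hedged sketch follows; the rates are placeholder values meant to be tuned per dataset and model size, not recommendations:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Placeholder hyperparameters to tune for your own data and model size
input_dropout = 0.2      # dropout on the LSTM inputs (Keras `dropout` argument)
state_dropout = 0.2      # dropout on the recurrent connections (`recurrent_dropout`)
dense_dropout = 0.5      # dropout before the output Dense layer

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(100, 16),
         dropout=input_dropout, recurrent_dropout=state_dropout),
    LSTM(64, dropout=input_dropout, recurrent_dropout=state_dropout),
    Dropout(dense_dropout),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```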
