Dropout on which layers of LSTM?
When using a multi-layer LSTM with dropout, is it advisable to put dropout on all hidden layers as well as on the output Dense layers? In Hinton's paper (which proposed dropout), dropout was only applied to the fully connected (Dense) layers, but that was because the hidden inner layers were convolutional.
Obviously, I can test this for my specific model, but I wondered whether there is a consensus on this.
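To make the placement options concrete, here is a minimal NumPy sketch of inverted dropout applied to a hidden-state tensor, which is the operation that would sit between stacked LSTM layers and/or before the Dense output during training (frameworks such as Keras also expose per-layer `dropout` and `recurrent_dropout` arguments on the LSTM layer itself). The function name and shapes are illustrative, not any particular library's API:

```python
import numpy as np

def inverted_dropout(h, rate=0.3, training=True, rng=None):
    """Inverted dropout on a hidden-state tensor h.

    During training, each unit is zeroed with probability `rate` and
    survivors are scaled by 1 / (1 - rate), so the expected activation
    is unchanged. At inference time the function is the identity.
    In a stacked LSTM this mask would be applied to the output of each
    hidden layer (and optionally to the input of the final Dense layer).
    """
    if not training or rate == 0.0:
        return h
    rng = rng if rng is not None else np.random.default_rng()
    mask = (rng.random(h.shape) >= rate) / (1.0 - rate)
    return h * mask

# Illustrative use: dropout between two stacked recurrent layers.
rng = np.random.default_rng(0)
h1 = np.ones(1000)                                   # stand-in for layer-1 output
h1_dropped = inverted_dropout(h1, rate=0.5, rng=rng)  # fed to layer 2 in training
h1_eval = inverted_dropout(h1, rate=0.5, training=False)  # identity at inference
```

The inverted scaling is what lets you drop the mask entirely at test time instead of rescaling weights, which is why most framework implementations use it.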
Topic stacked-lstm lstm dropout rnn neural-network
Category Data Science