What is the reason behind Keras' choice of default (recurrent) activation functions in LSTM networks?

Activation function between LSTM layers

In the above link, the question of whether activation functions are required for LSTM layers was answered as follows: since an LSTM unit already contains multiple non-linear activation functions, it is not necessary to add another (recurrent) activation function.
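For context, these are the standard LSTM cell equations; in Keras terms, $\sigma$ corresponds to recurrent_activation (applied to the input, forget, and output gates) and $\tanh$ to activation (applied to the candidate cell state and the output):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$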

My question: Is there a specific reason why Keras uses tanh as the default activation and sigmoid as the default recurrent_activation if those activations are not necessary? For a Dense layer, after all, the default activation is None. Keras could just as well have used None as the default for LSTM units, right? Or does Keras use these activations for a reason? Also, a lot of tutorials and blog posts use ReLU (without explaining why), and I have not come across one that specifies None as the (recurrent) activation. Why is ReLU used so much, when the outputs of the LSTM unit are already activated?
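To make the defaults concrete, here is a minimal sketch of the two activation arguments, assuming TensorFlow 2.x with Keras bundled as tf.keras; the layer width and input shape are illustrative only:

import tensorflow as tf

# Default LSTM: activation='tanh' squashes the candidate cell state and the
# output, while recurrent_activation='sigmoid' keeps the input/forget/output
# gates in the range [0, 1].
default_lstm = tf.keras.layers.LSTM(64)

# Both activations can be switched off explicitly. With None, the gates become
# plain linear combinations and are no longer bounded to [0, 1].
linear_lstm = tf.keras.layers.LSTM(64, activation=None, recurrent_activation=None)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10, 8)),  # (timesteps, features), illustrative
    default_lstm,
    tf.keras.layers.Dense(1),              # Dense default: activation=None
])
model.compile(optimizer="adam", loss="mse")
model.summary()

Swapping default_lstm for linear_lstm builds just as well; the two arguments only determine which non-linearities are applied inside the cell.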

Topic stacked-lstm lstm activation-function keras

Category Data Science
