Is it wrong to use Glorot Initialization with ReLU Activation?

I'm reading that keras' default initialization is glorot_uniform.

However, all of the tutorials I see use ReLU activation as the go-to for hidden layers, yet I never see them specify He initialization for those layers.

Would it be better for these ReLU layers to use He initialization instead of Glorot?

As seen in O'Reilly's Hands-On Machine Learning with Scikit-Learn and TensorFlow:

+----------------+-------------------------------+
| Initialization | Activation                    |
+----------------+-------------------------------+
| Glorot         | none, tanh, logistic, softmax |
| He             | ReLU and variants             |
| LeCun          | SELU                          |
+----------------+-------------------------------+
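
For concreteness, here is roughly what I mean (the layer size is just a placeholder): the tutorials use the first form, and I never see the second.

    from tensorflow import keras

    # The pattern I see in tutorials: ReLU with Keras' default initializer.
    keras.layers.Dense(64, activation="relu")

    # What the table seems to recommend: ReLU with He initialization.
    keras.layers.Dense(64, activation="relu", kernel_initializer="he_normal")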

Topic: weight-initialization, activation-function, keras, deep-learning, neural-network

Category: Data Science


As a general answer for hyperparameter tuning, you have to try both and see what works better for your problem. I suspect that some (if not most) general tuning rules were observed on a particular problem and with a particular architecture (for example, the He paper is about vision, including convolutional layers).
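
As a minimal sketch of "try both and compare", assuming a generic binary-classification setup (the synthetic data, layer sizes, and epoch count below are placeholders, not recommendations):

    import numpy as np
    from tensorflow import keras

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20)).astype("float32")  # stand-in features
    y = (X[:, 0] > 0).astype("int32")                  # stand-in labels

    def build_model(init):
        # Same architecture for both runs; only the kernel initializer changes.
        model = keras.Sequential([
            keras.Input(shape=(20,)),
            keras.layers.Dense(64, activation="relu", kernel_initializer=init),
            keras.layers.Dense(64, activation="relu", kernel_initializer=init),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model

    for init in ("glorot_uniform", "he_normal"):
        history = build_model(init).fit(X, y, validation_split=0.2,
                                        epochs=10, verbose=0)
        print(init, "val accuracy:", history.history["val_accuracy"][-1])

On a shallow network like this the difference is often negligible; initialization tends to matter more as networks get deeper.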

As for Keras' choice: sometimes, for practical reasons, it is easier to ship a single default option than to adapt the default to each activation. Given your Hands-On Machine Learning citation, it's not hard to see why they would pick Glorot as the default.
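
To see what that default is, you can inspect a freshly constructed layer (the exact class name printed may vary slightly across Keras versions):

    from tensorflow import keras

    # No initializer is passed, so the layer falls back to the Keras default.
    layer = keras.layers.Dense(64, activation="relu")
    print(type(layer.kernel_initializer).__name__)  # e.g. GlorotUniform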
