Can the vanishing gradient problem still occur when using the ReLU activation function?
Let's say I have a deep neural network with 50 hidden layers, and every neuron in the hidden layers uses the ReLU activation function. My question is:
- Is it possible for the vanishing gradient problem to occur during backpropagation for the weight updates, even though ReLU is used?
- Or can we say that the vanishing gradient problem will never occur when all the activation functions are ReLU? (One way to check this empirically is sketched below.)
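For reference, here is a minimal PyTorch sketch of how one might probe this: build a deep ReLU MLP and print each layer's gradient norm after a single backward pass. The layer widths, loss function, and random data are assumptions made purely for illustration, not part of the original question.

```python
import torch
import torch.nn as nn

# Assumed setup: a 50-hidden-layer MLP with ReLU activations,
# used only to inspect per-layer gradient magnitudes.
layers = []
for _ in range(50):
    layers += [nn.Linear(64, 64), nn.ReLU()]
layers.append(nn.Linear(64, 1))
model = nn.Sequential(*layers)

x = torch.randn(32, 64)   # dummy input batch
y = torch.randn(32, 1)    # dummy targets
loss = nn.MSELoss()(model(x), y)
loss.backward()

# Print the gradient norm of each Linear layer's weights, from first to last.
# Much smaller norms in the early layers would indicate vanishing gradients.
for i, m in enumerate(model):
    if isinstance(m, nn.Linear):
        print(f"layer {i:3d}  grad norm = {m.weight.grad.norm().item():.3e}")
```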
Topic cnn gradient-descent deep-learning neural-network
Category Data Science