Why is zero-centered output from an activation function good for a deep neural network?
I was reading an article that listed reasons why tanh is better than sigmoid. One reason was that tanh gives zero-centered outputs, but I couldn't understand why that matters or how it affects the network.
Please give intuitive answers that are light on math.
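
For reference, this is what I understand "zero-centered" to mean, as a minimal NumPy sketch. The standard-normal inputs, the sample size, and the helper name `activation_means` are my own assumptions for illustration, not something from the article.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def activation_means(n_samples=100_000, seed=0):
    # Hypothetical helper: feed symmetric, zero-mean inputs through both
    # activations and compare the average output of each.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    return sigmoid(x).mean(), np.tanh(x).mean()

sig_mean, tanh_mean = activation_means()
# Sigmoid outputs are all positive, so their mean sits near 0.5;
# tanh outputs are symmetric around 0, so their mean sits near 0.
print(f"mean sigmoid output: {sig_mean:.3f}")
print(f"mean tanh output:    {tanh_mean:.3f}")
```

So the outputs a sigmoid layer passes to the next layer are always positive, while tanh outputs are spread on both sides of zero. What I'm missing is why that difference helps training.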