Why is zero-centered output from an activation function good for deep neural networks?

I was reading an article that listed reasons why tanh is better than sigmoid, and one reason was that tanh gives zero-centered output. I couldn't understand why that is, or how it affects the network.

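To check what "zero-centered" means here, I put together a minimal NumPy sketch (my own illustration, not from the article) comparing the mean output of the two activations on zero-mean inputs:

```python
import numpy as np

# Feed zero-mean random inputs through both activations
# and compare the means of their outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)

sigmoid = 1.0 / (1.0 + np.exp(-x))  # outputs in (0, 1): always positive
tanh = np.tanh(x)                   # outputs in (-1, 1): symmetric about 0

print(f"sigmoid mean: {sigmoid.mean():.3f}")  # ~0.5, not zero-centered
print(f"tanh mean:    {tanh.mean():.3f}")     # ~0.0, zero-centered
```

So I can see that sigmoid's outputs are always positive while tanh's average out near zero, but I don't see why that matters for training the network.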
Kindly give math-light, intuitive answers.

Topic activation-function normalization deep-learning statistics machine-learning

Category Data Science
