Why can't we use a linear activation function in hidden layers?
I read a few articles stating that a neural network needs nonlinearity, but they didn't make clear why it is needed or why a linear activation function can't be used in the hidden layers.
Please keep the math light; intuitive answers are preferred.
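
To make the question concrete, here is a minimal sketch of what the claim about nonlinearity seems to mean, assuming a toy network with 2 inputs, 2 hidden units, 1 output, and no biases. The weights and the helper names (`matvec`, `matmul`, `linear_net`, `relu_net`) are made up for illustration and do not come from any of the articles. With a linear (identity) activation, the two layers appear to collapse into a single weight matrix, whereas putting ReLU in the hidden layer does not allow that.

```python
# Toy network with made-up weights (illustrative assumption, no biases).
W1 = [[1.0, -1.0],
      [-1.0, 1.0]]   # hidden layer: 2 inputs -> 2 hidden units
W2 = [[1.0, 2.0]]    # output layer: 2 hidden units -> 1 output


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu(v):
    """Element-wise ReLU: negative values become 0."""
    return [max(0.0, vi) for vi in v]


def linear_net(x):
    """Two layers with a linear (identity) activation in the hidden layer."""
    return matvec(W2, matvec(W1, x))


def relu_net(x):
    """The same two layers, but with ReLU in the hidden layer."""
    return matvec(W2, relu(matvec(W1, x)))


# With a linear activation, both layers equal one layer with weights W2 @ W1.
W_collapsed = matmul(W2, W1)

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -3.0]]
for x in inputs:
    print(x, linear_net(x), matvec(W_collapsed, x), relu_net(x))
```

In this sketch the second and third printed columns always match, so the hidden layer adds nothing that a single layer could not already do. The ReLU version breaks that: its output for `[1, 1]` is not the sum of its outputs for `[1, 0]` and `[0, 1]`, which no single weight matrix can reproduce.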
Topic deep-learning neural-network machine-learning
Category Data Science