Why can't we use a linear activation function in hidden layers?

I have read a few articles stating that we need to add non-linearity, but it wasn't clear why it is needed, or why we can't just use a linear activation function in the hidden layers.

Please keep the math light; intuitive answers are preferred.

Topic deep-learning neural-network machine-learning

Category Data Science


If linear activation functions are used in every layer, the whole network effectively collapses into a single linear (regression) model: the composition of linear functions is itself linear, no matter how many layers you stack. Such a model is restricted in what it can represent (only linear relationships between inputs and outputs).
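As a quick sanity check, here is a minimal NumPy sketch (the weight names and shapes are made up purely for illustration) showing that two stacked linear layers can always be rewritten as one:

```python
import numpy as np

# Two "hidden layers" whose activation is the identity (i.e. linear).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2

def two_linear_layers(x):
    h = W1 @ x + b1          # "activation" is just the identity
    return W2 @ h + b2

# The same mapping collapses into a single linear layer:
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=3)
print(np.allclose(two_linear_layers(x), W @ x + b))  # True
```

However many such layers you add, the result is still one matrix multiply plus a bias, so the extra depth buys no extra expressive power.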

On the other hand, the universal approximation theorem for neural networks relies on non-linear activation functions: with them, a network can in principle approximate essentially any function, including non-linear relationships.

So it is not that linear activations are forbidden; it is simply a poor use of a neural network to make it act as a linear model, since much simpler linear models already exist and are far easier to train. Neural networks are therefore used mainly for general non-linear tasks, and to serve that purpose they have to include non-linearities in their design.
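To see how even a single non-linearity changes things, here is a small hand-picked sketch (the weights are chosen only for illustration): inserting a ReLU between two layers breaks the property f(x + y) = f(x) + f(y) that every purely linear network must satisfy, so the network can no longer be collapsed into one linear map.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])      # layer 1: 2 -> 2
W2 = np.array([[1.0, 1.0]])       # layer 2: 2 -> 1

def with_relu(x):
    return W2 @ np.maximum(W1 @ x, 0.0)   # ReLU between the layers

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# A purely linear network would give the same answer both ways:
print(with_relu(x) + with_relu(y))  # [2.]
print(with_relu(x + y))             # [0.]  -> the ReLU network is not linear
```

This non-linear behaviour is exactly what lets a network with hidden layers represent relationships (like XOR-style interactions) that no single linear model can.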
