Why can't we use a linear activation function in hidden layers?

I have read a few articles stating that we need to add non-linearity, but it wasn't clear why it is needed, or why we can't just use a linear activation function in the hidden layers.

Please keep the math light; intuitive answers are preferred.

Topic deep-learning neural-network machine-learning

Category Data Science


If linear activation functions are used in every layer, the whole network effectively collapses into a single linear (regression) model: the composition of linear functions is itself linear, no matter how many layers you stack. Such a model is restricted in what it can represent (only linear relationships between inputs and outputs).
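As a quick sanity check, here is a minimal NumPy sketch (the weight names and shapes are made up purely for illustration) showing that two stacked linear layers can always be rewritten as one:

```python
import numpy as np

# Two "hidden layers" whose activation is the identity (i.e. linear).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2

def two_linear_layers(x):
    h = W1 @ x + b1          # "activation" is just the identity
    return W2 @ h + b2

# The same mapping collapses into a single linear layer:
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=3)
print(np.allclose(two_linear_layers(x), W @ x + b))  # True
```

However many such layers you add, the result is still one matrix multiply plus a bias, so the extra depth buys no extra expressive power.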

On the other hand, the universal approximation theorem for neural networks relies on non-linear activation functions: with them, a network can in principle approximate essentially any function, including non-linear relationships.

So it is not that linear activations are forbidden; it is simply a poor use of a neural network to make it act as a linear model, since much simpler linear models already exist and are far easier to train. Neural networks are therefore used mainly for general non-linear tasks, and to serve that purpose they have to include non-linearities in their design.
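To see how even a single non-linearity changes things, here is a small hand-picked sketch (the weights are chosen only for illustration): inserting a ReLU between two layers breaks the property f(x + y) = f(x) + f(y) that every purely linear network must satisfy, so the network can no longer be collapsed into one linear map.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])      # layer 1: 2 -> 2
W2 = np.array([[1.0, 1.0]])       # layer 2: 2 -> 1

def with_relu(x):
    return W2 @ np.maximum(W1 @ x, 0.0)   # ReLU between the layers

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# A purely linear network would give the same answer both ways:
print(with_relu(x) + with_relu(y))  # [2.]
print(with_relu(x + y))             # [0.]  -> the ReLU network is not linear
```

This non-linear behaviour is exactly what lets a network with hidden layers represent relationships (like XOR-style interactions) that no single linear model can.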
