Understanding the intuition behind the sigmoid curve in the context of backpropagation
I was trying to understand the significance of the S-shape of the sigmoid / logistic function. The slope/derivative of the sigmoid approaches zero for very large and very small input values, that is, $σ'(z) ≈ 0$ for $z > 10$ or $z < -10$. So the updates to the weights will be smaller there, whereas the updates will be bigger when $z$ is neither too big nor too small.
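For concreteness, here is a minimal NumPy sketch (the helper names `sigmoid` and `sigmoid_prime` are my own) that evaluates the sigmoid and its derivative at a few points, showing how the derivative peaks at $z = 0$ and effectively vanishes for $|z| \ge 10$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [-10, -2, 0, 2, 10]:
    print(f"z = {z:>4}: sigmoid = {sigmoid(z):.5f}, derivative = {sigmoid_prime(z):.5f}")
```

Running this prints a derivative of 0.25 at $z = 0$ but roughly $4.5 \times 10^{-5}$ at $z = \pm 10$.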
I don't get why it's significant to have smaller updates when $z$ is too big or too small, and bigger updates otherwise. One reasoning I read is that it squashes outliers. But how do very large and very small values of $z = wx + b$ indicate that the corresponding $x$ are outliers?
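To make the "smaller updates" claim concrete, here is a minimal sketch of the weight gradient for a single sigmoid neuron, assuming a squared-error loss (an assumption on my part; the loss isn't specified above, and with cross-entropy the $σ'(z)$ factor cancels). The gradient contains the factor $σ'(z)$, so inputs that push $z$ far from zero produce near-zero updates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of L = 0.5 * (sigmoid(w*x + b) - y)**2 with respect to w:
# dL/dw = (a - y) * a * (1 - a) * x, where a = sigmoid(w*x + b).
def weight_gradient(w, b, x, y):
    z = w * x + b
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a) * x  # the a * (1 - a) factor is sigmoid'(z)

w, b, y = 1.0, 0.0, 1.0
for x in [0.5, 2.0, 10.0]:  # larger x -> larger z -> vanishing gradient
    print(f"x = {x:>4}: z = {w*x + b:>5.1f}, dL/dw = {weight_gradient(w, b, x, y):.6f}")
```

Here the update for $x = 10$ is orders of magnitude smaller than for $x = 0.5$, even though its prediction error is not proportionally smaller.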
Also, I was not able to map the sigmoid derivative curve (in blue) to the gradient descent curve below. Do these two curves relate to each other in any way? Should very large and very small $z$ on the sigmoid curve coincide with the global minimum in the middle of the GD curve?
Topic: sigmoid, backpropagation, gradient-descent, logistic-regression
Category Data Science