Are saddle points a cause of the vanishing gradient problem?
I am a beginner to neural networks and I am writing a report summarising the causes of and solutions to the vanishing gradient problem. From what I have read, the two main causes are the repeated multiplication of saturated activation-function derivatives and the repeated multiplication of small weights from bad initialisation. I tend to view both of them as consequences of poorly chosen network components, which then lead to numerical trouble during training.
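To make these two causes concrete for my report, I wrote the small NumPy sketch below (my own illustration with an arbitrary 30-layer chain and made-up weights, not taken from any reference). It multiplies the chain-rule factors of saturated sigmoid derivatives and small weights, and the backpropagated gradient shrinks towards zero:

```python
# Illustrative sketch: repeated multiplication of sigmoid derivatives
# (each at most 0.25) and small weights shrinks a backpropagated gradient.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value is 0.25, attained at x = 0

rng = np.random.default_rng(0)
n_layers = 30
grad = 1.0  # gradient arriving at the output layer

for layer in range(n_layers):
    pre_activation = rng.normal()     # hypothetical pre-activation value
    weight = rng.normal(scale=0.5)    # small weight from poor initialisation
    grad *= weight * sigmoid_derivative(pre_activation)  # one chain-rule factor

print(f"|gradient| after {n_layers} layers: {abs(grad):.3e}")
# prints a value many orders of magnitude smaller than 1
```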
Additionally, the proliferation of saddle points on the cost surface of high-dimensional problems is another potential source of near-zero gradients. However, I am not entirely sure whether I should include it as one of the causes of the vanishing gradient problem, because it seems to be an inherent property of the non-convex cost surface, which attracts the gradient descent trajectory, rather than a consequence of poor component choices.
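To illustrate what I mean, here is a toy NumPy example I put together (assuming the classic saddle f(x, y) = x^2 - y^2, which is my own choice and not from any source). The gradient is exactly zero at the origin even though the origin is not a minimum, so plain gradient descent stalls on the plateau around it before the negative-curvature direction eventually pulls it away:

```python
# Toy example: gradient descent near the saddle of f(x, y) = x^2 - y^2.
import numpy as np

def grad_f(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])  # analytic gradient of x^2 - y^2

point = np.array([0.5, 1e-8])  # start close to the saddle's stable axis
lr = 0.1

for step in range(61):
    g = grad_f(point)
    if step % 10 == 0:
        print(f"step {step:3d}: |grad| = {np.linalg.norm(g):.2e}")
    point -= lr * g

# The gradient norm collapses to roughly 1e-4 around step 40, yet the origin is
# not a minimum: the Hessian diag(2, -2) has eigenvalues of both signs (a saddle).
```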
It would be greatly appreciated if someone could offer some structured ideas on this topic. Thanks in advance.
Topic: weight-initialization, gradient-descent
Category: Data Science