Stochastic Gradient Region of Confusion
I have come across the following diagram which explains the behavior of SGD graphically.
Based on this graphical representation, the gradients of the individual data points tend to fluctuate more when the current point is close to the optimum, whereas far away from the optimum they tend to point towards it.
My question is: doesn't this depend on how we randomly select the points?
For example, let's say we first compute the gradient of F3 and find that it points to the right, and then we select F1 and find that its gradient points to the left. In this case, could the behavior be similar to that in the region of confusion?
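To make the scenario concrete, here is a minimal sketch of how I understand the picture. It assumes one-dimensional quadratic components f_i(x) = 0.5 * (x - a_i)^2; the minimizers in `minimizers` and the helper names are made up for illustration and are not taken from the diagram.

```python
import numpy as np

# Hypothetical 1-D components: f_i(x) = 0.5 * (x - a_i)**2, each minimized at a_i.
# These values are assumptions for illustration only.
minimizers = np.array([-1.0, 0.0, 2.0])

def component_gradients(x):
    """Gradient of every f_i at x, using f_i'(x) = x - a_i."""
    return x - minimizers

def directions_agree(x):
    """True if the descent direction (-gradient) of every component has the same sign."""
    signs = np.sign(-component_gradients(x))
    return bool(np.all(signs == signs[0]))

# One point left of all minimizers, one between them, one right of all of them.
for x in (-3.0, 0.5, 4.0):
    print(x, -component_gradients(x), directions_agree(x))
```

If this sketch is a fair model of the diagram, then one component pointing right while another points left can only happen when x lies between the smallest and largest minimizer. Outside that interval every component points the same way, whichever one is sampled.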
Thanks a lot!!
Topic sgd gradient-descent
Category Data Science