How to choose an appropriate epsilon value when approximating gradients for gradient checking?
When approximating gradients with finite differences, using the machine epsilon to shift the weights results in wildly large gradient approximations, because the base of the approximation triangle (the width of the difference interval) is disproportionately small. In Andrew Ng's course he uses 0.01, but I suppose that value is for illustration purposes only.
This makes me wonder: is there a method to choose an appropriate epsilon value for gradient approximation, based on, e.g., the current error value of the network?
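For concreteness, here is a minimal NumPy sketch of the central-difference check I have in mind; `loss`, `weights`, and `numerical_gradient` are placeholder names I made up, not anything from the course:

```python
import numpy as np

def numerical_gradient(f, w, eps=1e-4):
    """Central-difference approximation of the gradient of f at w.

    f   : callable mapping a 1-D weight array to a scalar loss
    w   : 1-D numpy array of weights
    eps : half-width of the difference interval
    """
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        # Slope of the secant line over [w_i - eps, w_i + eps]
        grad[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

# Example: gradient of a simple quadratic loss, exact gradient is 2*w
loss = lambda w: np.sum(w ** 2)
weights = np.array([0.5, -1.0, 2.0])
print(numerical_gradient(loss, weights))  # ~ [1.0, -2.0, 4.0]
```

If I set `eps` near the machine epsilon (about 2.2e-16 in float64), the round-off error in `f(w_plus) - f(w_minus)` dominates the tiny denominator and the estimate blows up, which matches what I'm seeing; I've read that for a central difference the truncation and round-off errors roughly balance when eps is on the order of the cube root of the machine epsilon (around 1e-5 in float64), but I don't know how that should interact with the network's current error.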