How to choose appropriate epsilon value while approximating gradients to check training?

While approximating gradients, using machine epsilon to shift the weights results in wildly large gradient approximations, because the width of the finite-difference interval is disproportionately small relative to floating-point rounding error. In Andrew Ng's course he uses 0.01, but I suppose that's for example purposes only.

This makes me wonder: is there a method to choose an appropriate epsilon value for gradient approximation, based on e.g. the current error value of the network?
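To show what I mean, here is a small sketch (pure Python; the cubic function `f` is just a stand-in for a network's loss with a known analytic gradient): with a step near machine epsilon the difference quotient is dominated by rounding error, while a moderate step recovers the true gradient closely.

```python
import sys

def f(w):
    # Stand-in "loss": f(w) = w**3, whose analytic gradient is 3*w**2
    return w ** 3

def central_diff(w, eps):
    # Two-sided (central) gradient approximation: (f(w+eps) - f(w-eps)) / (2*eps)
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 1.5
true_grad = 3 * w ** 2  # = 6.75

# Moderate eps values give tiny relative error; eps at machine epsilon blows up
for eps in (1e-2, 1e-5, 1e-8, sys.float_info.epsilon):
    approx = central_diff(w, eps)
    rel_err = abs(approx - true_grad) / abs(true_grad)
    print(f"eps={eps:.1e}  approx={approx:+.6f}  rel_err={rel_err:.2e}")
```

Running this, the relative error first shrinks as eps decreases (truncation error falls) and then grows again once rounding error dominates, which is exactly the blow-up I am describing.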

Topic gradient

Category Data Science


It sounds like the epsilon value is a hyperparameter and the error value is an evaluation metric. Given that, cross-validation (or a simple grid search) can be used to find the epsilon value that minimizes the error value. As a starting point for the grid, a common rule of thumb is to take epsilon near the square root of machine epsilon (about 1e-8 for float64) for one-sided differences, or near its cube root (about 1e-5) for central differences, balancing truncation error against rounding error.
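A minimal sketch of that search, under my own assumptions (the loss, the candidate grid, and the function names are illustrative; the "metric" here is the relative discrepancy between the numeric and analytic gradients rather than a validation error):

```python
import math

def loss(w):
    # Toy differentiable loss; in practice this would be the network's cost
    return math.exp(w)

def analytic_grad(w):
    # Known derivative of the toy loss, used as the reference for checking
    return math.exp(w)

def check_error(w, eps):
    # Relative discrepancy between the central-difference estimate and
    # the analytic gradient (the usual gradient-checking metric)
    num = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    ana = analytic_grad(w)
    return abs(num - ana) / max(abs(num) + abs(ana), 1e-30)

w = 0.7
candidates = [10.0 ** -k for k in range(1, 13)]  # grid: 1e-1 ... 1e-12
best_eps = min(candidates, key=lambda e: check_error(w, e))
print("best epsilon:", best_eps)
```

On this toy problem the winning epsilon lands in the mid-range (around 1e-5 to 1e-6), consistent with the cube-root-of-machine-epsilon rule of thumb for central differences.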
