Tuning batch size and learning rate in a neural network
The following multiple-choice question appears in the Exam Readiness: AWS Certified Machine Learning - Specialty document. The correct answer is marked in the document, but I do not understand why that option is correct.
Question: A data scientist is working on optimizing a model during the training process by varying multiple parameters. The data scientist observes that, during multiple runs with identical parameters, the loss function converges to different, yet stable, values. What should the data scientist do to improve the training process?
A. Increase the learning rate. Keep the batch size the same. [REALISTIC DISTRACTOR]
B. Reduce the batch size. Decrease the learning rate. [CORRECT]
C. Keep the batch size the same. Decrease the learning rate. [REALISTIC DISTRACTOR]
D. Do not change the learning rate. Increase the batch size. [REALISTIC DISTRACTOR]
My understanding is that on every run the optimizer gets stuck in a different local minimum. In that case, reducing the batch size adds randomness to the gradient estimates, which should help the optimizer escape local minima. But how does decreasing the learning rate help?
Maybe a large learning rate makes the optimizer wiggle too much (given the small batch size), but decreasing the learning rate would still increase the probability of getting stuck in a local minimum.
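To make the "wiggle" intuition concrete, here is a minimal sketch, not taken from the exam document. It assumes a toy one-parameter loss, 0.5 * (w - x)^2 averaged over samples x drawn from a standard normal, and a hypothetical helper named sgd_tail_spread. The toy loss is convex, so it says nothing about local minima. It only illustrates how the noise in the SGD iterates grows with the learning rate and shrinks with the batch size.

```python
import numpy as np

def sgd_tail_spread(lr, batch_size, steps=2000, seed=0):
    """Run plain SGD on the toy loss and return the standard deviation
    of w over the second half of training (after it has settled)."""
    rng = np.random.default_rng(seed)
    w = 3.0  # arbitrary starting point (assumption)
    history = np.empty(steps)
    for t in range(steps):
        # Fresh mini-batch of samples x ~ N(0, 1)
        batch = rng.normal(0.0, 1.0, size=batch_size)
        # Gradient of mean(0.5 * (w - x)^2) over the batch
        grad = w - batch.mean()
        w -= lr * grad
        history[t] = w
    return float(history[steps // 2:].std())

# Same small batch, two learning rates
spread_high_lr = sgd_tail_spread(lr=0.5, batch_size=4)
spread_low_lr = sgd_tail_spread(lr=0.05, batch_size=4)

# Same learning rate, larger batch
spread_big_batch = sgd_tail_spread(lr=0.5, batch_size=64)

print(f"lr=0.5,  batch=4:  spread of w = {spread_high_lr:.3f}")
print(f"lr=0.05, batch=4:  spread of w = {spread_low_lr:.3f}")
print(f"lr=0.5,  batch=64: spread of w = {spread_big_batch:.3f}")
```

In this sketch a smaller batch makes each gradient estimate noisier, and the learning rate scales how far that noise moves the weights at each step. That is at least consistent with option B: the small batch supplies the randomness, and the lower learning rate keeps the resulting wiggle small enough for the loss to settle.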