Is the learning rate linearly related to the time to converge when using Adam?
Say that both learning rates, 1e-3 and 1e-4,
lead to the same solution (neither is too high nor too low).
In terms of the number of epochs to converge, will optim.Adam(model.parameters(), lr=1e-4)
take 10 times more epochs than optim.Adam(model.parameters(), lr=1e-3)?
So if an optimizer with lr=1e-3
reaches the solution at epoch 130, will an optimizer with lr=1e-4
theoretically get there at epoch 1300?
I think this statement holds for vanilla SGD, but Adam uses both momentum and adaptive per-parameter learning rates, which should make the difference in convergence time non-linear, and the two runs probably won't even reach the same solution. Is that right?
I'd appreciate your insights.
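
For what it's worth, here is a minimal sketch of the kind of experiment I have in mind. The toy linear model, synthetic data, and loss threshold are just illustrative assumptions, not my actual setup:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Hypothetical synthetic regression data, only for illustration.
X = torch.randn(512, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(512, 1)

def epochs_to_converge(lr, threshold=1e-2, max_epochs=10_000):
    # Train the same toy model with a given learning rate and count
    # how many epochs it takes for the loss to drop below `threshold`.
    model = nn.Linear(10, 1)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for epoch in range(1, max_epochs + 1):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        if loss.item() < threshold:
            return epoch
    return max_epochs

e3 = epochs_to_converge(1e-3)
e4 = epochs_to_converge(1e-4)
print(f"lr=1e-3: {e3} epochs, lr=1e-4: {e4} epochs, "
      f"ratio ~ {e4 / e3:.1f} (not necessarily 10x)")
```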
Topic sgd learning-rate convergence deep-learning optimization
Category Data Science