Is the learning rate linearly related to the time to converge when using the Adam optimizer?

Say that two learning rates, 1e-3 and 1e-4, both lead to the same solution (neither is too high or too low). In terms of convergence measured in epochs, will optim.Adam(model.parameters(), lr=1e-4) take 10 times more epochs than optim.Adam(model.parameters(), lr=1e-3)?
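To make the comparison concrete, I imagine measuring it with a toy experiment like the sketch below, counting how many epochs each learning rate needs to push the loss under a fixed threshold. The architecture, random data, and threshold here are arbitrary placeholders, not my actual setup:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
X = torch.randn(256, 10)   # toy regression inputs
y = torch.randn(256, 1)    # toy targets

def epochs_to_threshold(lr, threshold=0.5, max_epochs=5000):
    """Train a small MLP with Adam and return the first epoch whose loss < threshold."""
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for epoch in range(1, max_epochs + 1):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        if loss.item() < threshold:
            return epoch
    return max_epochs

for lr in (1e-3, 1e-4):
    print(f"lr={lr}: converged at epoch {epochs_to_threshold(lr)}")
```

If the relationship were linear, the second printed epoch count would be roughly 10 times the first.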

So if an optimizer with lr=1e-3 reaches the solution at epoch 130, will an optimizer with lr=1e-4 theoretically get there around epoch 1300? I think this statement holds for vanilla SGD, but Adam uses both momentum and adaptive per-parameter learning rates, which I expect would make the difference in convergence time non-linear, and the two runs would probably not even reach the same solution..?
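For reference, here is my understanding of a single Adam update (Kingma & Ba, 2015), sketched for a scalar parameter. The lr multiplies a ratio of running moment estimates that changes throughout training, which is why I doubt the epoch count scales as a simple 1/lr:

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns updated (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # momentum (1st moment estimate)
    v = beta2 * v + (1 - beta2) * grad ** 2   # adaptive scale (2nd moment estimate)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```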

I'd appreciate your insights.

Tags: sgd, learning-rate, convergence, deep-learning, optimization
