Is the learning rate linearly related to the time to converge when using the Adam optimizer?

Say that two learning rates, 1e-3 and 1e-4, both lead to the same solution (neither is too high or too low). In terms of convergence measured in epochs, will optim.Adam(model.parameters(), lr=1e-4) take 10 times more epochs than optim.Adam(model.parameters(), lr=1e-3)?
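To make the comparison concrete, I imagine measuring it with a toy experiment like the sketch below, counting how many epochs each learning rate needs to push the loss under a fixed threshold. The architecture, random data, and threshold here are arbitrary placeholders, not my actual setup:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
X = torch.randn(256, 10)   # toy regression inputs
y = torch.randn(256, 1)    # toy targets

def epochs_to_threshold(lr, threshold=0.5, max_epochs=5000):
    """Train a small MLP with Adam and return the first epoch whose loss < threshold."""
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for epoch in range(1, max_epochs + 1):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        if loss.item() < threshold:
            return epoch
    return max_epochs

for lr in (1e-3, 1e-4):
    print(f"lr={lr}: converged at epoch {epochs_to_threshold(lr)}")
```

If the relationship were linear, the second printed epoch count would be roughly 10 times the first.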

So if an optimizer with lr=1e-3 reaches the solution at epoch 130, will an optimizer with lr=1e-4 theoretically get there around epoch 1300? I think this statement holds for vanilla SGD, but Adam uses both momentum and adaptive per-parameter learning rates, which I expect would make the difference in convergence time non-linear, and the two runs would probably not even reach the same solution..?
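For reference, here is my understanding of a single Adam update (Kingma & Ba, 2015), sketched for a scalar parameter. The lr multiplies a ratio of running moment estimates that changes throughout training, which is why I doubt the epoch count scales as a simple 1/lr:

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns updated (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # momentum (1st moment estimate)
    v = beta2 * v + (1 - beta2) * grad ** 2   # adaptive scale (2nd moment estimate)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```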

I'd appreciate your insights.

Tags: sgd, learning-rate, convergence, deep-learning, optimization
