How exactly do you implement SGD with momentum?
I am looking at sources to implement SGD with momentum, but they give different equations.
(Here beta is the momentum hyper-parameter, weights[l] is the weight matrix for layer l, gradients[l] are the gradients for layer l, etc.)
One source gives:
v[l] = beta*v[l] - learning_rate*gradients[l]
weights[l] = weights[l] + v[l]
But another source gives:
v[l] = beta*v[l] + learning_rate*gradients[l]
weights[l] = weights[l] - v[l]
Are they equivalent?
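For what it's worth, here is a minimal scalar sanity check of the two variants on the same made-up gradient sequence (toy values standing in for the per-layer matrices, not taken from either source):

```python
import numpy as np

# Toy comparison of the two momentum variants on one gradient sequence.
rng = np.random.default_rng(0)
grads = rng.normal(size=20)
beta, lr = 0.9, 0.1

w1, v1 = 1.0, 0.0  # variant 1: v = beta*v - lr*g;  w = w + v
w2, v2 = 1.0, 0.0  # variant 2: v = beta*v + lr*g;  w = w - v

for g in grads:
    v1 = beta * v1 - lr * g
    w1 = w1 + v1
    v2 = beta * v2 + lr * g
    w2 = w2 - v2

# With a constant learning rate, v2 == -v1 at every step,
# so the two weight trajectories coincide.
print(abs(w1 - w2) < 1e-12)
```

(Both velocities start at zero here; if the learning rate changed over time, the two forms would no longer track each other step for step.)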
Also, does it matter if beta + learning_rate != 1? (If so, this would differ from the exponential moving average equation, where the two coefficients sum to 1.)
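For reference, by the exponential moving average equation I mean the form where the two coefficients are a convex combination (my own toy sketch, scalar values):

```python
# Exponential moving average of a gradient sequence: the coefficients
# beta and (1 - beta) sum to 1, unlike beta and learning_rate above.
beta = 0.9
v = 0.0
for g in [1.0, 2.0, 3.0]:
    v = beta * v + (1 - beta) * g
print(v)
```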
Tags: sgd, implementation
Category: Data Science