Computing the variance of an SGD update
It is known that the SGD update has high variance. Given the iteration $$ w^{k+1} := w^k - \underbrace{\alpha \, g_i(w^k)}_{p^k}, $$ where $w$ are the model weights and $g_i(w^k)$ is the gradient of the loss function evaluated on a randomly drawn sample $i$, how do I compute the variance of each update $p^k$? Since $i$ is random, $p^k$ is a random vector, so by "variance" I mean its variability over the choice of $i$ at a fixed $w^k$ (e.g. the trace of its covariance). I would like to plot this quantity for each iteration and study its behavior during the minimization process.
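For concreteness, here is a minimal sketch of the kind of measurement I have in mind, on a synthetic least-squares problem where per-sample gradients are cheap to compute in closed form (the data `X`, `y`, the step size `alpha`, and the helper `per_sample_grads` are all made up for illustration): at each iterate $w^k$ it forms the candidate update $p^k = -\alpha\, g_i(w^k)$ for every sample $i$ and records the total variance of $p^k$ over $i$.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical setup: synthetic least-squares problem.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def per_sample_grads(w):
    # Gradient of 0.5 * (x_i^T w - y_i)^2 w.r.t. w, one row per sample i.
    residuals = X @ w - y          # shape (n,)
    return residuals[:, None] * X  # shape (n, d)

alpha, n_iters = 0.01, 500
w = np.zeros(d)
update_var = []

for k in range(n_iters):
    grads = per_sample_grads(w)   # all g_i(w^k)
    updates = -alpha * grads      # all candidate p^k, one per sample i
    # Total variance of p^k over the random index i:
    # the trace of its covariance matrix.
    update_var.append(updates.var(axis=0).sum())
    i = rng.integers(n)           # SGD: draw one sample at random
    w = w + updates[i]            # w^{k+1} = w^k - alpha * g_i(w^k)

plt.plot(update_var)
plt.xlabel("iteration $k$")
plt.ylabel("total variance of $p^k$ over $i$")
plt.show()
```

Is this the right way to quantify the variance of the update, and is there a standard estimator for it when computing all $n$ per-sample gradients at every iteration is too expensive (e.g. using a mini-batch of gradients instead)?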
Tags: mathematics, variance, deep-learning, optimization, machine-learning
Category: Data Science