Computing variance of an SGD iteration

It is known that the SGD update has high variance. Given the iteration: $$ w^{k+1} := w^k - \underbrace{\alpha \ g_i(w^k)}_{p^k}, $$ where $w$ denotes the model weights and $g_i(w^k)$ is the gradient of the loss function evaluated on sample $i$. How do I compute the variance of each update $p^k$? I would like to plot it per iteration and study its behavior during the minimization process.
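
One way to make "variance of the update" precise (this is a common convention, not the only one): since the sample index $i$ is the random quantity, the variance of $p^k = \alpha\, g_i(w^k)$ over the $n$ training samples is $$ \operatorname{Var}(p^k) = \alpha^2 \operatorname{Cov}_i\!\big(g_i(w^k)\big), \qquad \operatorname{tr}\operatorname{Var}(p^k) \approx \frac{\alpha^2}{n}\sum_{i=1}^{n} \big\|g_i(w^k) - \bar g(w^k)\big\|^2, $$ where $\bar g(w^k) = \frac{1}{n}\sum_{i=1}^{n} g_i(w^k)$ is the full-batch gradient. The trace of the covariance gives a single scalar per iteration, which is convenient for plotting.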

Topic mathematics variance deep-learning optimization machine-learning

Category Data Science


You could plot the variance of the update against the iteration number and analyze how it evolves as training progresses. This is commonly done, for example, when comparing the variance of the standard (full-batch) gradient descent algorithm against its stochastic version.
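As a minimal sketch of how to produce such a plot, the snippet below tracks, at every SGD iteration, the total empirical variance of the candidate updates $p^k = \alpha\, g_i(w^k)$ across all samples $i$. The toy least-squares problem, learning rate, and iteration count are illustrative assumptions, not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5

# Toy linear-regression data: y = X @ w_true + noise (illustrative, not from the post)
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

alpha = 0.01
w = np.zeros(d)
update_variances = []  # one scalar per iteration: trace of Cov_i(p^k)

for k in range(100):
    # Per-sample gradients of the squared loss: g_i(w) = 2 * x_i * (x_i . w - y_i)
    residuals = X @ w - y                 # shape (n,)
    grads = 2.0 * X * residuals[:, None]  # shape (n, d); row i is g_i(w^k)

    updates = alpha * grads               # candidate updates p^k for every sample i
    # Total variance across samples = sum of per-coordinate variances
    update_variances.append(updates.var(axis=0).sum())

    # Take the actual SGD step with one uniformly drawn sample
    i = rng.integers(n)
    w = w - updates[i]

print(f"update variance at start: {update_variances[0]:.3e}, "
      f"at end: {update_variances[-1]:.3e}")
```

Plotting `update_variances` against the iteration index (e.g. with `matplotlib.pyplot.plot`) gives the curve you want to study. Note that computing all $n$ per-sample gradients at every step is only feasible for small problems; on large datasets you would estimate the variance from a random mini-batch of gradients instead.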

