This is simply a special case (where $p_\psi = N(0,1)$) of the general gradient estimator for Natural Evolution Strategies (proved in another reference, look it up):

enter image description here

Outline of derivation based on the general formula for the gradient estimator:

$$\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] = E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi log({p_\psi}(\theta)) \right]$$

If

$$\epsilon \sim \mathbb{N}(0, 1) = \frac{1}{\sqrt{2 \pi}}e^{-\frac{\epsilon^2}{2}}$$

then

$$\psi = \theta + \sigma \epsilon \sim \mathbb{N}(\theta, \sigma) = \frac{1}{\sigma\sqrt{2 \pi}}e^{-\frac{(\psi-\theta)^2}{2\sigma^2}}$$

Thus: $\psi = \theta + \sigma \epsilon \sim \mathbb{N}(\theta, \sigma) \Longleftrightarrow \epsilon = \frac{\psi-\theta}{\sigma} \sim \mathbb{N}(0,1)$

So:

$$\begin{align} \nabla_\theta E_{\psi \sim N(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \right] &= E_{\psi \sim N(\theta,\sigma)} \left[ F(\theta + \sigma \epsilon) \nabla_\theta (-\frac{(\psi-\theta)^2}{2\sigma^2}) \right] \\ &= E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \nabla_\epsilon (-\frac{\epsilon^2}{2}) \frac{d(\frac{\psi-\theta}{\sigma})}{d\theta} \right] \\ &= \frac{1}{\sigma} E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \epsilon \right] \\ &= \nabla_\theta E_{\epsilon \sim N(0,1)} \left[ F(\theta + \sigma \epsilon) \right] \end{align}$$

note: scalar variables were considered in above steps for simplicity, but easy to extend/derive for vector variables

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.