Isn't the optimizer network in DeepMind's "Learning to Learn by Gradient Descent by Gradient Descent" a DRQN?

In the paper "Learning to learn by gradient descent by gradient descent", the authors describe an RNN that learns a gradient transformation, so that the network itself acts as an optimizer.

The optimizer network directly interacts with its environment (the optimizee) to take actions,

$\theta_{t+1} = \theta_t + g_t(\nabla f(\theta_t), \phi)$ (Equation 1 from the paper)

and hence it feels like a reinforcement learning problem with a continuous action space. The formulation of the optimization objective, however, looks like what one would typically use in a supervised learning problem,

$L(\phi) = \mathbb{E}_f\left[f(\theta^*(f,\phi))\right]$ (Equation 2 from the paper)
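To make Equation 1 concrete, here is a minimal runnable sketch of the update rule. Everything in it is my own toy stand-in, not from the paper: the optimizer $g$ is a single tanh cell rather than a coordinate-wise LSTM, the optimizee $f$ is a fixed quadratic, and the parameters $\phi$ are hand-picked rather than learned.

```python
import numpy as np

def g(grad, hidden, phi):
    """Toy stand-in for the optimizer network: in the paper g is an LSTM
    applied coordinate-wise; here it is a single tanh cell so that the
    update rule of Equation 1 can be run end to end."""
    w_g, w_h, b = phi
    hidden = np.tanh(w_g * grad + w_h * hidden + b)
    return hidden, hidden  # the cell output is used directly as the step

def grad_f(theta):
    # Gradient of a toy optimizee f(theta) = ||theta - 3||^2
    return 2.0 * (theta - 3.0)

phi = (-0.1, 0.0, 0.0)           # hand-picked, not learned
theta = np.zeros(2)
hidden = np.zeros(2)
for t in range(100):
    step, hidden = g(grad_f(theta), hidden, phi)
    theta = theta + step         # Equation 1: theta_{t+1} = theta_t + g_t(...)
```

With these hand-picked parameters the loop behaves like clipped gradient descent and `theta` approaches the minimum at 3; the point is only that the "action" at each step is the parameter update emitted by the network.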

Is this an indirect formulation of a policy gradient?
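The supervised-style objective of Equation 2 can be sketched by Monte-Carlo estimation: sample optimizees $f$, unroll the optimizer on each, and average the final loss. Again everything here is an invented toy (random quadratics as the distribution over $f$, a single scalar parameter `w` playing the role of $\phi$), only meant to show the shape of the objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def unroll(w, target, T=50):
    """Run the toy update rule of Equation 1 for T steps on one sampled
    optimizee f(theta) = ||theta - target||^2 and return the final value
    f(theta_T): one Monte-Carlo sample of the inner term of Equation 2."""
    theta = np.zeros_like(target)
    for _ in range(T):
        grad = 2.0 * (theta - target)
        theta = theta + np.tanh(w * grad)  # g_t with one scalar parameter w
    return float(np.sum((theta - target) ** 2))

def meta_loss(w, n_samples=64):
    """Estimate L(phi) = E_f[ f(theta*(f, phi)) ] by averaging the final
    optimizee loss over a distribution of functions (random quadratics)."""
    targets = rng.normal(size=(n_samples, 2))
    return float(np.mean([unroll(w, t) for t in targets]))

# A descent-like parameter gives a lower expected final loss than an
# ascent-like one -- the supervised-style objective the question points at.
print(meta_loss(-0.2), meta_loss(0.1))
```

Note that this objective is differentiable through the unrolled trajectory, which is how the paper trains $\phi$ by plain backpropagation; no reward signal or policy-gradient estimator is needed, even though the episodic structure looks like RL.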

Tags: meta-learning, q-learning, deep-learning, neural-network

Category: Data Science
