Isn't the optimizer network in DeepMind's "Learning to Learn by Gradient Descent by Gradient Descent" a DRQN?

In the paper "Learning to learn by gradient descent by gradient descent", the authors describe an RNN that learns a gradient transformation, so that the network itself acts as an optimizer.

The optimizer network directly interacts with its environment (the optimizee) to take actions,

$\theta_{t+1} = \theta_t + g_t(\nabla f(\theta_t), \phi)$ (Equation 1 from the paper)

and hence it feels like a reinforcement learning problem with a continuous action space. The formulation of the optimization objective, however, looks like what one would typically use in a supervised learning problem,

$L(\phi) = \mathbb{E}_f\left[f(\theta^*(f,\phi))\right]$ (Equation 2 from the paper)
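To make Equation 1 concrete, here is a minimal runnable sketch of the update rule. Everything in it is my own toy stand-in, not from the paper: the optimizer $g$ is a single tanh cell rather than a coordinate-wise LSTM, the optimizee $f$ is a fixed quadratic, and the parameters $\phi$ are hand-picked rather than learned.

```python
import numpy as np

def g(grad, hidden, phi):
    """Toy stand-in for the optimizer network: in the paper g is an LSTM
    applied coordinate-wise; here it is a single tanh cell so that the
    update rule of Equation 1 can be run end to end."""
    w_g, w_h, b = phi
    hidden = np.tanh(w_g * grad + w_h * hidden + b)
    return hidden, hidden  # the cell output is used directly as the step

def grad_f(theta):
    # Gradient of a toy optimizee f(theta) = ||theta - 3||^2
    return 2.0 * (theta - 3.0)

phi = (-0.1, 0.0, 0.0)           # hand-picked, not learned
theta = np.zeros(2)
hidden = np.zeros(2)
for t in range(100):
    step, hidden = g(grad_f(theta), hidden, phi)
    theta = theta + step         # Equation 1: theta_{t+1} = theta_t + g_t(...)
```

With these hand-picked parameters the loop behaves like clipped gradient descent and `theta` approaches the minimum at 3; the point is only that the "action" at each step is the parameter update emitted by the network.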

Is this an indirect formulation of a policy gradient?
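The supervised-style objective of Equation 2 can be sketched by Monte-Carlo estimation: sample optimizees $f$, unroll the optimizer on each, and average the final loss. Again everything here is an invented toy (random quadratics as the distribution over $f$, a single scalar parameter `w` playing the role of $\phi$), only meant to show the shape of the objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def unroll(w, target, T=50):
    """Run the toy update rule of Equation 1 for T steps on one sampled
    optimizee f(theta) = ||theta - target||^2 and return the final value
    f(theta_T): one Monte-Carlo sample of the inner term of Equation 2."""
    theta = np.zeros_like(target)
    for _ in range(T):
        grad = 2.0 * (theta - target)
        theta = theta + np.tanh(w * grad)  # g_t with one scalar parameter w
    return float(np.sum((theta - target) ** 2))

def meta_loss(w, n_samples=64):
    """Estimate L(phi) = E_f[ f(theta*(f, phi)) ] by averaging the final
    optimizee loss over a distribution of functions (random quadratics)."""
    targets = rng.normal(size=(n_samples, 2))
    return float(np.mean([unroll(w, t) for t in targets]))

# A descent-like parameter gives a lower expected final loss than an
# ascent-like one -- the supervised-style objective the question points at.
print(meta_loss(-0.2), meta_loss(0.1))
```

Note that this objective is differentiable through the unrolled trajectory, which is how the paper trains $\phi$ by plain backpropagation; no reward signal or policy-gradient estimator is needed, even though the episodic structure looks like RL.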

Tags: meta-learning, q-learning, deep-learning, neural-network

Category: Data Science
