Which Policy Gradient Method was used by Google's DeepMind to teach AI to walk?

Question


learner

April 10, 2021, 12:58

I just saw this video on YouTube.

Which Policy Gradient method was used to train the AI to walk?

Was it DDPG, D4PG, or something else?

Topic deepmind policy-gradients reinforcement-learning deep-learning machine-learning

Category Data Science

noe · Accepted Answer · April 10, 2021, 12:58

They used Distributed Proximal Policy Optimization (DPPO). The article associated with that video gives a brief overview of it:

In order to learn effectively in these rich and challenging domains, it is necessary to have a reliable and scalable reinforcement learning algorithm. We leverage components from several recent approaches to deep reinforcement learning. First, we build upon robust policy gradient algorithms, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO) [7, 8], which bound parameter updates to a trust region to ensure stability. Second, like the widely used A3C algorithm [2] and related approaches [3] we distribute the computation over many parallel instances of agent and environment. Our distributed implementation of PPO improves over TRPO in terms of wall clock time with little difference in robustness, and also improves over our existing implementation of A3C with continuous actions when the same number of workers is used.
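To make the two ingredients in that passage concrete, here is a minimal sketch in plain Python. It is not the article's implementation. `clipped_surrogate` shows the clipped-ratio objective from the PPO paper, which is one common way to keep updates near a trust region; the article may use a different PPO variant, so treat the clipped form and the `clip_eps=0.2` default as assumptions. `average_gradients` is a hypothetical helper illustrating the distributed part, assuming each worker computes a gradient and the results are averaged before a single update.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximise).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def average_gradients(worker_grads):
    """Average per-worker gradient vectors element-wise (hypothetical helper)."""
    n_workers = len(worker_grads)
    return [sum(components) / n_workers for components in zip(*worker_grads)]


if __name__ == "__main__":
    # The new policy doubles the action probability; the advantage is positive,
    # so the objective is capped at (1 + clip_eps) * advantage.
    print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))
    # Two workers, two parameters each.
    print(average_gradients([[1.0, 2.0], [3.0, 4.0]]))
```

In a real setup the gradients would come from automatic differentiation of this objective on each worker's own rollouts; the lists here only stand in for that.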

Here are some resources:

The DeepMind blog post describing the method
The original article: Emergence of locomotion behaviours in rich environments
