Which Policy Gradient Method was used by Google's DeepMind to teach AI to walk
I just saw this video on YouTube.
Which Policy Gradient method was used to train the AI to walk?
Was it DDPG or D4PG or what?
They used Distributed Proximal Policy Optimization (DPPO). The paper the video is associated with gives a brief overview of it:
In order to learn effectively in these rich and challenging domains, it is necessary to have a reliable and scalable reinforcement learning algorithm. We leverage components from several recent approaches to deep reinforcement learning. First, we build upon robust policy gradient algorithms, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO) [7, 8], which bound parameter updates to a trust region to ensure stability. Second, like the widely used A3C algorithm [2] and related approaches [3] we distribute the computation over many parallel instances of agent and environment. Our distributed implementation of PPO improves over TRPO in terms of wall clock time with little difference in robustness, and also improves over our existing implementation of A3C with continuous actions when the same number of workers is used.
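To make the "bound parameter updates" part concrete, here is a minimal sketch of the PPO clipped surrogate objective, which is the loss that DPPO computes on each parallel worker. This is an illustrative example only, not DeepMind's implementation; the function name, clipping coefficient, and toy batch values are assumptions.

```python
# Sketch of the PPO clipped surrogate objective (illustrative, not DeepMind's code).
import numpy as np

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss: limits how far the updated policy can move away
    from the old policy, playing the role of TRPO's trust region constraint."""
    ratio = np.exp(new_log_probs - old_log_probs)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (element-wise minimum) objective; negate to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage: random numbers standing in for a batch of transitions.
rng = np.random.default_rng(0)
new_lp = rng.normal(-1.0, 0.1, size=64)
old_lp = rng.normal(-1.0, 0.1, size=64)
adv = rng.normal(0.0, 1.0, size=64)
print(ppo_clipped_objective(new_lp, old_lp, adv))
```

In DPPO, many workers each collect experience and compute this objective on their own data, and their gradients are combined to update a shared policy, which is the distributed part described in the quote above.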
The full method is described in the associated paper, "Emergence of Locomotion Behaviours in Rich Environments" (Heess et al., 2017).