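- Introduces REINFORCE, a policy gradient algorithm that directly learns a parameterized policy by ascending a sampled estimate of the gradient of expected return, with the gradient of the log-probability computed by error back-propagation.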
- Used to solve a variety of simple, low-dimensional discrete and continuous control tasks.
- Papers:
- Blog posts:
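The REINFORCE update can be sketched on a toy problem. This is a minimal sketch assuming a hypothetical three-armed bandit with fixed rewards and a tabular softmax policy. Each episode is one step, so the return is just the reward. The helper `discounted_returns` shows how the return G_t would be computed for longer episodes. The update `theta += alpha * G * grad log pi(a)` is plain REINFORCE with no baseline. All names and hyperparameters are illustrative.

```python
import math
import random

def softmax(prefs):
    """Convert action preferences (logits) into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

def reinforce_bandit(arm_rewards, episodes=2000, alpha=0.1, gamma=0.99, seed=0):
    """REINFORCE with a softmax policy on a hypothetical one-step bandit."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # policy parameters (logits), one per arm
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        rewards = [arm_rewards[action]]  # a one-step episode
        g = discounted_returns(rewards, gamma)[0]
        # Gradient of log softmax: d/dtheta_k log pi(a) = 1[k == a] - pi(k)
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += alpha * g * grad_log_pi  # gradient ascent on the return
    return softmax(theta)

probs = reinforce_bandit([0.0, 0.5, 1.0])
print([round(p, 3) for p in probs])  # most probability mass should sit on the last arm
```

Because all rewards here are non-negative, every sampled action is reinforced to some degree. Subtracting a baseline from `g` is the usual way to reduce that variance.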
- Like REINFORCE, DDPG directly learns a parameterized policy (the actor), but it also learns a value function (the critic). Unlike the stochastic policy gradient used by REINFORCE, it uses a deterministic policy gradient.
- Applied to continuous control tasks such as MuJoCo
- Papers:
  - DDPG
  - Continuous control
- Blog posts:
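The deterministic policy gradient chains the critic's action gradient through the actor: dQ/dθ = dQ/da · dμ/dθ, evaluated at a = μ(s). The following is a minimal sketch of only that actor update. It uses a hypothetical hand-written critic `Q(s, a) = -(a - 2s)^2` in place of a learned one, a linear actor `mu(s) = w*s + b`, and states drawn uniformly as a stand-in for a replay buffer. The soft target update mirrors the slowly tracking target networks used in DDPG. Critic learning, exploration noise, and the replay buffer are omitted.

```python
import random

def critic_q(s, a):
    """Hypothetical fixed critic: the best action is a = 2*s."""
    return -(a - 2.0 * s) ** 2

def critic_dq_da(s, a):
    """Analytic gradient of critic_q with respect to the action."""
    return -2.0 * (a - 2.0 * s)

def actor(params, s):
    """Deterministic linear policy mu(s) = w*s + b."""
    w, b = params
    return w * s + b

def dpg_train(steps=5000, alpha=0.05, tau=0.01, seed=0):
    rng = random.Random(seed)
    params = [0.0, 1.0]    # online actor parameters (w, b)
    target = list(params)  # target actor that slowly tracks the online one
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)  # stand-in for sampling states from a replay buffer
        a = actor(params, s)
        dq_da = critic_dq_da(s, a)
        # Chain rule: dQ/dtheta = dQ/da * dmu/dtheta, with dmu/dw = s and dmu/db = 1
        params[0] += alpha * dq_da * s
        params[1] += alpha * dq_da
        # Soft target update: theta' <- tau * theta + (1 - tau) * theta'
        target = [tau * p + (1.0 - tau) * t for p, t in zip(params, target)]
    return params, target

params, target = dpg_train()
print([round(p, 4) for p in params], [round(t, 4) for t in target])
```

The actor never samples actions to estimate its gradient. It relies entirely on the critic's slope with respect to the action, which is why a good critic matters so much in this family of methods.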
- Constrains each policy update so that the KL divergence between the old and new policies stays within a fixed bound (a trust region). This improves learning stability and reduces sensitivity to hyperparameters.
- Applied to ALE and MuJoCo tasks; outperforms DDPG on many MuJoCo continuous control tasks
- Papers:
- Blog posts:
  - Kv frans
  - Other
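The KL-bounded update described above can be illustrated with a simplified stand-in. This sketch assumes a single state, a softmax policy over three actions, and hypothetical fixed advantages. It takes a plain gradient step on the importance-weighted surrogate and halves the step until the KL divergence from the old policy is within `max_kl` and the surrogate improves. TRPO itself chooses the step direction with a natural gradient (conjugate gradient on the Fisher matrix) before a similar backtracking line search, so treat this only as an illustration of the constraint check.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def surrogate(logits, old_probs, advantages):
    """Importance-weighted surrogate: E_{a ~ pi_old}[pi(a) / pi_old(a) * A(a)]."""
    new_probs = softmax(logits)
    return sum(po * (pn / po) * adv for po, pn, adv in zip(old_probs, new_probs, advantages))

def kl_constrained_step(logits, advantages, max_kl=0.01, step_size=10.0, max_backtracks=20):
    old_probs = softmax(logits)
    mean_adv = sum(p * a for p, a in zip(old_probs, advantages))
    # Gradient of the surrogate with respect to the logits at the old policy
    grad = [p * (a - mean_adv) for p, a in zip(old_probs, advantages)]
    old_obj = surrogate(logits, old_probs, advantages)
    for i in range(max_backtracks):
        scale = step_size * (0.5 ** i)  # backtracking: halve the step each time
        candidate = [x + scale * g for x, g in zip(logits, grad)]
        within_bound = kl_divergence(old_probs, softmax(candidate)) <= max_kl
        if within_bound and surrogate(candidate, old_probs, advantages) > old_obj:
            return candidate
    return list(logits)  # no acceptable step: keep the old policy

new_logits = kl_constrained_step([0.0, 0.0, 0.0], [1.0, 0.0, -1.0])
print([round(p, 3) for p in softmax(new_logits)])
```

Large steps that would move most of the probability onto one action are rejected, so the policy changes only modestly per update even though the advantages strongly favor the first action.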
- Introduces a theoretical framework showing that the parameters of the advantage estimator determine a bias-variance trade-off
- Papers:
- Blog posts:
  - http://www.allinea.com/blog/201607/tuning-deep-learning-episode-1-deepminds-a3c-torch
  - Kaixhin/Atari#5
  - https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner
  - https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2#.gtkdbo3a
  - https://jerrybai1995.github.io/2016-11-30-doom-ai/
- Papers:
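The bias-variance trade-off in the advantage estimator noted above can be made concrete. This is a sketch assuming the estimator is generalized advantage estimation (GAE), with discount `gamma` and weight `lam` as its parameters. The notes do not name them, so that is an assumption. Each advantage is an exponentially weighted sum of TD errors `delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)`. With `lam=0` it reduces to the one-step TD error (lower variance, biased by V). With `lam=1` it becomes the discounted return minus V(s_t) (unbiased with respect to V, higher variance). The rewards and value estimates below are hypothetical.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Advantages for one trajectory.

    values has len(rewards) + 1 entries; the last one is the value of the
    state after the final step (0.0 if the episode terminated there).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running = delta + gamma * lam * running  # exponentially weighted sum of TD errors
        advantages[t] = running
    return advantages

rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]  # hypothetical critic estimates; terminal value 0
print(gae(rewards, values, gamma=0.9, lam=0.0))  # one-step TD errors
print(gae(rewards, values, gamma=0.9, lam=1.0))  # discounted return minus V(s_t)
```

Intermediate values of `lam` interpolate between these two extremes, which is the knob the trade-off refers to.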