deep-rl-paper-notes/pg at master · evancasey/deep-rl-paper-notes


Introduces the REINFORCE policy gradient algorithm, which directly learns a parameterized policy using error back-propagation.
Used to solve a variety of simple, low-dimensional discrete and continuous control tasks.
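
As a rough illustration of the REINFORCE update, the sketch below trains a softmax policy on a toy multi-armed bandit using only the standard library. The bandit environment, the function names `softmax` and `reinforce_bandit`, the running-mean baseline, and the learning rate are all assumptions made for this example and do not come from the papers listed here.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=3000, lr=0.1, seed=0):
    """REINFORCE on a hypothetical bandit where every episode lasts one step.

    The policy is a softmax over one preference parameter per arm.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters
    baseline = 0.0                  # running mean reward (assumed variance reduction)
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)
        advantage = reward - baseline
        baseline += (reward - baseline) / t
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi
    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit([0.0, 2.0]))
```

Calling `reinforce_bandit([0.0, 2.0])` should shift most of the probability mass onto the second arm, since its mean reward is higher. A full implementation for multi-step tasks would weight each log-probability gradient by the return that follows the action rather than a single reward.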
Papers:
- Policy Gradient Methods for Reinforcement Learning with Function Approximation
- Reinforcement learning of motor skills with policy gradients
Blog posts:
- http://www.rage.net/~greg/2016-07-05-ActorCritic-with-OpenAI-Gym.html

Prevents policy gradient updates from exceeding a KL-divergence bound, which improves learning stability and reduces sensitivity to hyperparameters.
Applied to ALE and MuJoCo tasks; outperforms DDPG on many continuous-control MuJoCo tasks.
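
To make the KL-divergence bound concrete, here is a minimal sketch that shrinks a proposed update to a categorical policy until the KL divergence from the old policy falls under a threshold. It is a simplified stand-in, not the full method: it only checks the KL constraint with a backtracking line search and omits the natural-gradient step direction and the surrogate-objective improvement check. The names `kl_bounded_step`, `max_kl`, and `shrink` are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to a categorical distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_bounded_step(logits, direction, max_kl=0.01, shrink=0.5, max_backtracks=20):
    """Backtrack along `direction` until KL(old || new) <= max_kl.

    Returns the accepted logits and the step size that was used.
    If no step satisfies the bound, the old logits are returned unchanged.
    """
    old_probs = softmax(logits)
    step = 1.0
    for _ in range(max_backtracks):
        candidate = [l + step * d for l, d in zip(logits, direction)]
        if kl_divergence(old_probs, softmax(candidate)) <= max_kl:
            return candidate, step
        step *= shrink  # proposed update moved the policy too far; try a smaller one
    return list(logits), 0.0


if __name__ == "__main__":
    new_logits, step = kl_bounded_step([0.0, 0.0, 0.0], [5.0, -5.0, 0.0])
    print(step, softmax(new_logits))
```

The point of the design is that the step size is chosen by how far the policy distribution moves, not by a fixed learning rate, which is one plausible reason the bound makes training less sensitive to that hyperparameter.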
Papers:
- Trust Region Policy Optimization (TRPO)
Blog posts:
- Kv frans
Other:
- openai/requests-for-research#22 (comment)

Introduces a theoretical framework showing that the parameters of the advantage estimator determine the bias-variance trade-off.
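
The sketch below assumes the framework in question is generalized advantage estimation, with the discount `gamma` and the mixing parameter `lam` playing the role of the advantage parameters. The function name `gae_advantages` and the input layout (one extra bootstrap value at the end of `values`) are choices made for this example.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute advantage estimates for one trajectory.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), i.e. one more entry than rewards
             (use 0.0 as the last entry if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each advantage reuses the discounted sum after it.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.0, 0.0, 0.0, 0.0]
    print(gae_advantages(rewards, values, gamma=0.5, lam=0.0))
    print(gae_advantages(rewards, values, gamma=0.5, lam=1.0))
```

With `lam=0.0` each estimate is just the one-step TD error, which leans on the value function (lower variance, more bias). With `lam=1.0` it becomes the discounted return minus the value baseline (less bias, higher variance). Values in between interpolate between the two.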