REINFORCE

Deep Deterministic Policy Gradients (DDPG)

Trust-region Policy Optimization (TRPO)

  • Constrains each policy gradient update so that the KL divergence between the old and new policies stays within a bound. This improves learning stability and reduces sensitivity to hyperparameters
  • Applied to ALE (Arcade Learning Environment) and MuJoCo tasks; outperforms DDPG on many continuous-control MuJoCo tasks
  • Papers:
  • Blog posts:
    • Kv frans
  • Other
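
The KL-divergence bound described above can be illustrated with a small sketch. This is not the full TRPO algorithm (which computes a natural-gradient direction with conjugate gradient); it only shows the backtracking idea of shrinking a proposed update until the new policy stays within the bound. The function names, the single-state categorical policy, and the default `max_kl=0.01` are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert logits to a categorical probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_constrained_step(logits, step, max_kl=0.01, shrink=0.5, max_backtracks=20):
    """Hypothetical helper: apply `step` to `logits`, shrinking it until
    KL(old policy || new policy) <= max_kl. Falls back to the old logits
    if no scaled step satisfies the bound."""
    old_probs = softmax(logits)
    scale = 1.0
    for _ in range(max_backtracks):
        candidate = [l + scale * s for l, s in zip(logits, step)]
        if kl_divergence(old_probs, softmax(candidate)) <= max_kl:
            return candidate
        scale *= shrink  # step was too large, try a smaller one
    return list(logits)


old_logits = [0.0, 0.0, 0.0]
proposed_step = [5.0, 0.0, -5.0]  # deliberately too aggressive
new_logits = kl_constrained_step(old_logits, proposed_step)
print(kl_divergence(softmax(old_logits), softmax(new_logits)))
```

The accepted update moves in the proposed direction but only as far as the bound allows, which is why an overly large step size does less damage than it would with a plain gradient update.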

High-dimensional continuous control using generalized advantage estimation

  • Introduces a theoretical framework showing that the advantage estimator's parameters (γ and λ) control the bias-variance trade-off
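
As a rough sketch of how the estimator's parameters enter the computation, the function below accumulates discounted TD residuals backwards over one trajectory. The function name, argument layout, and default values are assumptions for illustration; `values` is assumed to hold one extra bootstrap entry for the state after the last step.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for a single trajectory.

    rewards: list of T rewards.
    values:  list of T + 1 value estimates (last entry is the bootstrap value).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
print(gae_advantages(rewards, values, gamma=1.0, lam=0.0))  # one-step TD residuals
print(gae_advantages(rewards, values, gamma=1.0, lam=1.0))  # full returns minus values
```

With `lam=0` the estimate reduces to the one-step TD residual (lower variance, more bias from the value function); with `lam=1` it becomes the discounted return minus the value baseline (less bias, higher variance).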

Asynchronous Advantage Actor-Critic

ACER