pg

REINFORCE

Deep Deterministic Policy Gradients (DDPG)

Trust-region Policy Optimization (TRPO)

  • Constrains each policy-gradient update so that the KL divergence between the old and new policy stays within a bound. Improves learning stability and reduces sensitivity to hyperparameters
  • Applied to ALE (Atari) and MuJoCo tasks; outperforms DDPG on many continuous-control MuJoCo tasks
  • Papers:
  • Blog posts:
    • Kv frans
  • Other
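
The KL-divergence bound described above can be illustrated with a small sketch. This is not the full TRPO algorithm: it assumes a single categorical policy parameterized directly by logits and an already-computed update direction, and it shows only the backtracking step that shrinks an update until the KL bound holds. The names `kl_constrained_step`, `max_kl`, and `shrink` are hypothetical, chosen for illustration. The original method also computes the direction with conjugate gradient and checks that the surrogate objective improves, both of which are omitted here.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_kl(p, q):
    """KL(p || q) for two categorical distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_constrained_step(logits, direction, max_kl=0.01, shrink=0.5, max_backtracks=20):
    """Backtracking line search (illustrative, hypothetical helper).

    Starts with a full step along `direction` and halves it until
    KL(old policy || new policy) <= max_kl. Returns (new_logits, step_size).
    If no step satisfies the bound, the old logits are returned unchanged.
    """
    old_probs = softmax(logits)
    step = 1.0
    for _ in range(max_backtracks):
        candidate = [l + step * d for l, d in zip(logits, direction)]
        if categorical_kl(old_probs, softmax(candidate)) <= max_kl:
            return candidate, step
        step *= shrink
    return list(logits), 0.0


if __name__ == "__main__":
    new_logits, step = kl_constrained_step([0.0, 0.0, 0.0], [5.0, 0.0, -5.0])
    print("accepted step size:", step)
    print("KL after update:", categorical_kl(softmax([0.0, 0.0, 0.0]), softmax(new_logits)))
```

A large proposed update is scaled down until the new policy stays close to the old one, which is the mechanism behind the stability claim; a small update that already satisfies the bound is accepted at full size.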

High-dimensional continuous control using generalized advantage estimation

  • Introduces a theoretical framework showing that the advantage estimator's parameters (γ and λ) determine the bias-variance trade-off
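
As a rough sketch of how the two parameters enter the estimator, the following computes generalized advantage estimates for a single trajectory. It assumes `values` holds one extra entry at the end (the bootstrap value of the final state); the function name `gae_advantages` and the default values of `gamma` and `lam` are illustrative choices, not taken from the source.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory.

    rewards: list of T rewards r_0 .. r_{T-1}
    values:  list of T + 1 value estimates V(s_0) .. V(s_T)
             (assumption: the last entry is the bootstrap value, 0.0 if terminal)
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each advantage reuses the discounted sum after it.
    for t in reversed(range(len(rewards))):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.0, 0.0, 0.0, 0.0]
    print(gae_advantages(rewards, values, gamma=1.0, lam=0.0))  # one-step TD residuals
    print(gae_advantages(rewards, values, gamma=1.0, lam=1.0))  # Monte Carlo style sums
```

With `lam=0` each advantage is just the one-step TD residual (lower variance, more bias from the value function); with `lam=1` it becomes the discounted sum of all future residuals (less bias, higher variance). Intermediate values interpolate between the two.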

Asynchronous Advantage Actor-Critic

ACER