
policy-value-methods

My implementations of a bunch of policy and value methods, written from scratch.

Algorithms:

  1. Hill Climb
  2. Cross Entropy Method
  3. Policy Gradient Methods
    1. REINFORCE
    2. PPO (Proximal Policy Optimization) Video
    3. Actor Critic
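
Of the methods above, REINFORCE is the simplest policy-gradient algorithm: sample actions from a parameterized policy, then nudge the parameters in the direction of the log-probability gradient, weighted by the return. A minimal sketch on a toy two-armed bandit (the bandit, its payouts, and all variable names here are illustrative, not from this repo):

```python
import math
import random

random.seed(0)

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs):
    """Draw an action index from a categorical distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Toy two-armed bandit: arm 1 pays 1.0, arm 0 pays 0.2 (illustrative).
REWARDS = [0.2, 1.0]
prefs = [0.0, 0.0]   # policy parameters (action preferences)
alpha = 0.1          # learning rate
baseline = 0.0       # running-average reward baseline to reduce variance

for step in range(2000):
    probs = softmax(prefs)
    a = sample(probs)
    r = REWARDS[a]
    baseline += 0.01 * (r - baseline)
    # REINFORCE update: grad of log pi(i) w.r.t. pref_i is 1[i == a] - pi(i)
    for i in range(len(prefs)):
        grad = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += alpha * (r - baseline) * grad

final_probs = softmax(prefs)  # policy should now strongly prefer arm 1
```

The same update rule carries over to the full LunarLander setting, with a neural network replacing the preference table and Monte Carlo returns replacing the immediate bandit reward.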

Results:

LunarLander (REINFORCE): solved in 519 episodes.

BipedalWalker-v3 (TD3): completion time ~14 seconds, achieved after 500 episodes.

(Plots: score and rolling score per episode.)