The 37 Implementation Details of Proximal Policy Optimization

This repo contains the source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Blog post url: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
Tracked Weights and Biases experiments: https://wandb.ai/vwxyzjn/ppo-details

If you like this repo, consider checking out CleanRL (https://github.com/vwxyzjn/cleanrl), the RL library that we used to build this repo.

Get started

Prerequisites:

Python 3.8+
Poetry

Install dependencies:

poetry install

Train agents:

poetry run python ppo.py

Train agents with experiment tracking:

poetry run python ppo.py --track --capture-video

Atari

Install dependencies:

poetry install -E atari

Train agents:

poetry run python ppo_atari.py

Train agents with experiment tracking:

poetry run python ppo_atari.py --track --capture-video

Pybullet

Install dependencies:

poetry install -E pybullet

Train agents:

poetry run python ppo_continuous_action.py

Train agents with experiment tracking:

poetry run python ppo_continuous_action.py --track --capture-video

Gym-microrts (MultiDiscrete)

Install dependencies:

poetry install -E gym-microrts

Train agents:

poetry run python ppo_multidiscrete.py

Train agents with experiment tracking:

poetry run python ppo_multidiscrete.py --track --capture-video

Train agents with invalid action masking:

poetry run python ppo_multidiscrete_mask.py

Train agents with invalid action masking and experiment tracking:

poetry run python ppo_multidiscrete_mask.py --track --capture-video

Atari with Envpool

Install dependencies:

poetry install -E envpool

Train agents:

poetry run python ppo_atari_envpool.py

Train agents with experiment tracking:

poetry run python ppo_atari_envpool.py --track

Solve Pong-v5 in 5 mins:

poetry run python ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3

400 game scores in Breakout-v5 with PPO in ~1 hour (side-effects-free 3-4x speed up compared to ppo_atari.py with SyncVectorEnv):

poetry run python ppo_atari_envpool.py --gym-id Breakout-v5

Procgen

Install dependencies:

poetry install -E procgen

Train agents:

poetry run python ppo_procgen.py

Train agents with experiment tracking:

poetry run python ppo_procgen.py --track

Reproduction of all of our results

To reproduce the results run with openai/baselines, install our fork at hhttps://github.com/vwxyzjn/baselines. Then follow the scripts in scripts/baselines. To reproduce our results, follow the scripts in scripts/ours.

Citation

@inproceedings{shengyi2022the37implementation,
  author = {Huang, Shengyi and Dossa, Rousslan Fernand Julien and Raffin, Antonin and Kanervisto, Anssi and Wang, Weixun},
  title = {The 37 Implementation Details of Proximal Policy Optimization},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/},
  url  = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/}
}

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
scripts		scripts
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
ppo.py		ppo.py
ppo_atari.py		ppo_atari.py
ppo_atari_envpool.py		ppo_atari_envpool.py
ppo_atari_lstm.py		ppo_atari_lstm.py
ppo_continuous_action.py		ppo_continuous_action.py
ppo_multidiscrete.py		ppo_multidiscrete.py
ppo_multidiscrete_mask.py		ppo_multidiscrete_mask.py
ppo_procgen.py		ppo_procgen.py
ppo_shared.py		ppo_shared.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The 37 Implementation Details of Proximal Policy Optimization

Get started

Atari

Pybullet

Gym-microrts (MultiDiscrete)

Atari with Envpool

Procgen

Reproduction of all of our results

Citation

About

Releases

Packages

Languages

License

GaoZiHong/ppo-implementation-details

Folders and files

Latest commit

History

Repository files navigation

The 37 Implementation Details of Proximal Policy Optimization

Get started

Atari

Pybullet

Gym-microrts (MultiDiscrete)

Atari with Envpool

Procgen

Reproduction of all of our results

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages