Implement Reinforcement Learning on BipedalWalker-v3

This is a simple example of how to implement reinforcement learning on BipedalWalker-v3 using stable-baselines.

You can find the complete documentation for stable-baselines here:

https://stable-baselines.readthedocs.io/en/master/index.html

Introduction

BipedalWalker-v3 is an environment in OpenAI Gym. Reward is given for moving forward, for a total of 300+ points up to the far end; if the robot falls, it receives -100.

Stable Baselines is a set of improved implementations of Reinforcement Learning algorithms based on OpenAI Baselines. The simplicity of these tools will allow beginners to experiment with a more advanced toolset, without being buried in implementation details.
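For instance, a minimal training loop with Stable Baselines might look like the sketch below. It uses the library's default hyperparameters and a short timestep budget just to verify the setup; it is not the exact code from the notebooks.

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Create the BipedalWalker-v3 environment (requires the Box2D backend).
env = DummyVecEnv([lambda: gym.make("BipedalWalker-v3")])

# PPO2 with the default MLP policy; hyperparameters here are library defaults.
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)  # short run just to check everything works
```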

Experiment

In this example, the following four RL algorithms are implemented:

  • PPO
  • ACKTR
  • SAC
  • TD3

The hyperparameters are set as advised in RL Baselines Zoo.

Each algorithm is trained for two million timesteps so that the results can be compared.
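The shared pattern across the four runs might look like the sketch below. The hyperparameters shown are library defaults and the save filenames are hypothetical; the actual notebooks use the Zoo-tuned settings.

```python
import gym
from stable_baselines import PPO2, ACKTR, SAC, TD3

TOTAL_TIMESTEPS = 2_000_000  # same budget for every algorithm

for name, algo in [("ppo", PPO2), ("acktr", ACKTR), ("sac", SAC), ("td3", TD3)]:
    env = gym.make("BipedalWalker-v3")
    # Default hyperparameters shown here; see RL Baselines Zoo for tuned values.
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    model.save("bipedalwalker_" + name)  # hypothetical output filename
```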

Get Started

The code for this example is written as Jupyter notebooks.

You can open the GitHub URL of each notebook in Google Colab and run it directly.
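When running in Colab, a setup cell along these lines is typically needed first. The package list is an assumption about the notebooks' dependencies, not copied from them: Stable Baselines (v2) requires TensorFlow 1.x, and BipedalWalker-v3 requires Box2D.

```python
# Colab setup sketch (assumed dependencies, not the notebooks' exact cell).
!pip install "tensorflow==1.15" stable-baselines "gym[box2d]"
```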

Result

The results are shown in TensorBoard.
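Stable Baselines can write these logs directly via the `tensorboard_log` argument; a minimal sketch, with a hypothetical log directory:

```python
from stable_baselines import PPO2

# Passing a log directory makes training metrics available to TensorBoard.
model = PPO2("MlpPolicy", "BipedalWalker-v3", tensorboard_log="./tb_logs/")
model.learn(total_timesteps=2_000_000)
```

You can then launch TensorBoard with `tensorboard --logdir ./tb_logs/` to compare the learning curves of the four algorithms.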