This is a simple example of how to train reinforcement learning agents on BipedalWalker-v3 using Stable Baselines.
You can find the full Stable Baselines documentation here:
https://stable-baselines.readthedocs.io/en/master/index.html
BipedalWalker-v3 is an environment in OpenAI Gym. Reward is given for moving forward, for a total of 300+ points up to the far end of the track. If the robot falls, it receives -100.
Stable Baselines is a set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines. The simplicity of these tools allows beginners to experiment with a more advanced toolset without being buried in implementation details.
In this example, the following four RL algorithms are implemented:
- PPO
- ACKTR
- SAC
- TD3
The hyperparameters are set as advised in RL Baselines Zoo.
Each algorithm is trained for two million timesteps so that the results can be compared.
The code for this example is written in Jupyter notebooks.
You can open the GitHub URL of each notebook in Google Colab and run it directly.
The results are shown in TensorBoard.
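Assuming the training runs write their event files under a local log directory (the path `./tensorboard/` here is illustrative, matching whatever was passed as `tensorboard_log`), the learning curves can be inspected by launching TensorBoard:

```shell
# Point TensorBoard at the directory containing the per-algorithm logs,
# then open http://localhost:6006 in a browser.
tensorboard --logdir ./tensorboard/
```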