This is the Scilab-RL repository, focusing on goal-conditioned reinforcement learning with Stable Baselines 3 algorithms and the Gymnasium interface.
We now have a wiki with many tutorials; check it out!
The framework is tailored towards rapid prototyping, development, and evaluation of new RL algorithms and methods. Compared to other frameworks such as Spinning Up and Stable Baselines, it has the following unique selling points:
- Built-in data visualization for fast and efficient debugging using MLflow and Weights & Biases.
- Support for many state-of-the-art algorithms via Stable Baselines 3, and extensible to others.
- Built-in hyperparameter optimization using Optuna.
- Easy development of new robotic simulation and real robot environments based on MuJoCo.
- Smoke and performance testing.
- Compatibility between a multitude of state-of-the-art algorithms for quick empirical comparison and evaluation.
- A focus on goal-conditioned reinforcement learning with hindsight experience replay to avoid environment-specific reward shaping.
- Requirements
- Getting Started
- Supported Algorithms
- Hyperparameter optimization and management
- Known Issues
The framework is designed to run on Linux, with best compatibility on Ubuntu 22. However, it has also been reported to run on macOS and WSL2 (see this tutorial). The preferred Python version is 3.11, but it will likely also run with older versions >= 3.8. A GPU is not required, but it speeds up training significantly.
For visualization with matplotlib, it is important to have the GUI backend tkinter installed (see this for more information).
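If you are unsure whether the backend is available, a quick check like the following can help. This is only a minimal sketch; it assumes matplotlib is already installed and that the tkinter-based backend is named TkAgg on your system:

```python
# Minimal check that matplotlib can use the Tk backend (assumes python3-tk / tkinter is installed).
import matplotlib
matplotlib.use("TkAgg")          # request the tkinter-based GUI backend
import matplotlib.pyplot as plt

plt.plot([0, 1], [0, 1])
plt.title("TkAgg backend works")
plt.show()                       # a window should open; if not, tkinter is likely missing
```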
It is also important to install the following system packages if they are not already present. On Ubuntu, execute:
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf gcc ffmpeg
- Run `./scripts/setup.sh`. This will automatically install the Conda Python interpreter along with all required packages. It will also install the robotic simulator MuJoCo.
- Source your ~/.bashrc: `source ~/.bashrc`
- Activate the conda Python environment: `conda activate scilabrl`
- Optional but recommended: use Weights and Biases (WandB). Create an account, run `wandb login` in the console, and paste your API key. If you don't want to use WandB, run your experiments with the command-line parameter `wandb=0`.
- Check your installation with `python3 src/main.py n_epochs=1 wandb=0 env=FetchReach-v2` (see also the sanity-check sketch after this list).
- Look at the tutorials in the wiki for more details.
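As an additional, optional sanity check, you can verify that the goal-conditioned Fetch environments load. This is only a rough sketch: it assumes gymnasium and gymnasium-robotics were installed by the setup script, and the exact registration step may differ between library versions:

```python
# Quick sanity check for the MuJoCo-based Fetch environments (assumes gymnasium-robotics is installed).
import gymnasium as gym
import gymnasium_robotics  # importing registers the Fetch environments in most versions

env = gym.make("FetchReach-v2")
obs, info = env.reset(seed=0)
# Goal-conditioned environments return a dict observation:
print(sorted(obs.keys()))  # ['achieved_goal', 'desired_goal', 'observation']
env.close()
```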
You can also install all dependencies manually, but we do not recommend this.
We currently support the Stable Baselines 3 goal-conditioned off-policy algorithms: DDPG, TD3, SAC and HER. We also support PPO.
We have one-file implementations of SAC (`cleansac`, optionally with HER), PPO (`cleanppo`), and DQN (`cleandqn`). These are based on the Stable Baselines 3 and CleanRL implementations of the algorithms and have comparable performance. They can be good starting points for trying out new ideas.
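As a rough illustration of the goal-conditioned setting mentioned above (names and data layout are illustrative, not the repository's actual API): the agent observes a dict with observation, achieved_goal and desired_goal, and hindsight experience replay relabels stored transitions with goals that were actually achieved, recomputing the sparse reward accordingly:

```python
# Hypothetical sketch of hindsight relabeling for goal-conditioned transitions.
# The real implementations (SB3 HER, cleansac) differ in details; this only shows the idea.
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Typical sparse goal reward: 0 if the goal is reached, -1 otherwise."""
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < threshold else -1.0

def relabel_with_final_goal(episode):
    """'final' HER strategy: pretend the goal was the last achieved_goal of the episode."""
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        relabeled.append({
            **transition,
            "desired_goal": new_goal,
            "reward": sparse_reward(transition["achieved_goal"], new_goal),
        })
    return relabeled
```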
The framework has a sophisticated hyperparameter management and optimization pipeline based on Hydra, Optuna, MLflow and Weights & Biases. The tutorials in the wiki explain how to use it.
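Since the pipeline is based on Hydra, sweeps are typically configured through config files and command-line overrides rather than by writing Optuna code by hand. For intuition only, here is a generic, not Scilab-RL-specific, sketch of what Optuna does underneath, with made-up parameter names:

```python
# Generic Optuna usage sketch; the repository wires this up via Hydra instead.
import optuna

def objective(trial):
    # Hypothetical hyperparameters; the actual search space is defined in the Hydra configs.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.999)
    # ... train an agent with these values and return an evaluation score ...
    return -(lr - 1e-3) ** 2 + gamma  # placeholder score so the example runs standalone

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```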
- MuJoCo may fail due to this error when debugging. If it happens with PyCharm, you can unset "Attach to subprocess automatically while debugging" in the Python Debugger settings (File | Settings | Build, Execution, Deployment | Python Debugger) to avoid this error.
- PyTorch may complain about a CUDA error, throwing something like this:
  NVIDIA GeForce RTX 3050 Ti Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
  In that case you need to install the latest nightly build according to the configuration tool on the PyTorch website.
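To quickly check which compute capability your GPU has and which architectures your current PyTorch build was compiled for, a small diagnostic like the following can help (a sketch; assumes PyTorch is installed):

```python
# Diagnostic sketch: report GPU compute capability and the architectures this PyTorch build supports.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Device: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
    print("Compiled for:", torch.cuda.get_arch_list())
```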