A Continual Reinforcement Learning framework built on SaLinA, with multiple scenarios and methods already implemented. It is also the codebase of the paper Building a Subspace of Policies for Scalable Continual Learning.
Make sure you have PyTorch installed with CUDA > 11.0 and that you have followed the instructions to install SaLinA. In addition:
- Additional packages:
  pip install wandb ternary hydra-core
- To install brax, we recommend the following installation steps:
  - jax + jaxlib:
    pip install --upgrade "jax[cuda]==0.3.25" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
  - flax:
    pip install flax==0.3.4
  - brax:
    pip install brax
- /!\ We recommend using CUDA 11.4 (check your version with nvcc --version).
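As a quick sanity check (not part of the original instructions), you can verify that both PyTorch and jax see your GPU once the packages above are installed:

```python
# Quick sanity check (assumption: run after the installs above succeed).
import torch
import jax

print(torch.cuda.is_available())  # should print True if PyTorch sees your GPU
print(jax.devices())              # should list GPU devices rather than only CPU devices
```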
Simply run the file run.py with the desired config available in configs. You can select one of them with the flag -cn=my_config. Different scenarios are available in configs/scenario; simply add scenario=my_scenario as an argument. For example, to run the CSP method on the forgetting scenario of HalfCheetah:

python run.py -cn=csp scenario=halfcheetah/forgetting
The core.py file contains the building blocks of this framework. Each experiment consists of running a Framework over a Scenario, i.e. a sequence of train and test Tasks (a minimal sketch of this loop is given after the list below). The models are learning procedures that use salina agents to interact with the tasks and learn from them through one or multiple algorithms.
- frameworks contains generic learning procedures (e.g. using only one algorithm, or adding a regularization method at the end)
- scenarios contains CRL scenarios, i.e. sequences of train and test tasks
- algorithms contains different RL / CL algorithms (ppo, sac, td3, ewc regularization)
- agents contains salina agents (policy, critic, ...)
- configs contains the config files of the benchmarked methods/scenarios.
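A minimal sketch of how a Framework runs over a Scenario. Everything except the Framework / Scenario / Task concepts mentioned above (including attribute and method names) is an illustrative assumption, not the actual core.py API:

```python
# Illustrative sketch of the experiment loop: a Framework is trained over a
# Scenario, i.e. a sequence of train Tasks, and evaluated on the test Tasks.
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    env_name: str    # e.g. a Brax environment variant
    n_samples: int   # training budget for this task

@dataclass
class Scenario:
    """A sequence of train tasks and the corresponding test tasks."""
    train_tasks: List[Task]
    test_tasks: List[Task]

class Framework:
    """A learning procedure that trains salina agents with one or more algorithms."""
    def train(self, task: Task) -> None:
        ...  # interact with the task and update the policy/critic agents

    def evaluate(self, test_tasks: List[Task]) -> dict:
        ...  # measure the current model's reward on every test task
        return {}

def run(framework: Framework, scenario: Scenario) -> None:
    # Train sequentially on each task, evaluating on all test tasks after each stage.
    for task in scenario.train_tasks:
        framework.train(task)
        metrics = framework.evaluate(scenario.test_tasks)
        print(metrics)
```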
We implemented 8 different methods, all built on top of the Soft Actor-Critic (SAC) algorithm. To try them, just add the flag -cn=my_method on the command line. You can find the hyperparameters in configs:
- csp: Continual Subspace of Policies, from Building a Subspace of Policies for Scalable Continual Learning
- ft_1: Fine-tune a single policy during the whole training
- sac_n: Fine-tune and save the policy at the end of each task. Start with a randomly initialized policy when encountering a new task.
- ft_n: Fine-tune and save the policy at the end of each task. Clone the last policy when encountering a new task.
- ft_l2: Fine-tune a single policy during the whole training with an L2 regularization cost (a simpler version of EWC)
- ewc: see the paper Overcoming Catastrophic Forgetting in Neural Networks
- packnet: see the paper PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning
- pnn: see the paper Progressive Neural Networks
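For example, to run PackNet instead of CSP on the same HalfCheetah scenario as above:

python run.py -cn=packnet scenario=halfcheetah/forgetting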
We propose 9 scenarios over 3 different Brax domains. To try them, just add the flag scenario=... on the command line:
- HalfCheetah:
  - halfcheetah/forgetting: 8 tasks - 1M samples for each task
  - halfcheetah/transfer: 8 tasks - 1M samples for each task
  - halfcheetah/distraction: 8 tasks - 1M samples for each task
  - halfcheetah/composability: 8 tasks - 1M samples for each task
- Ant:
  - ant/forgetting: 8 tasks - 1M samples for each task
  - ant/transfer: 8 tasks - 1M samples for each task
  - ant/distraction: 8 tasks - 1M samples for each task
  - ant/composability: 8 tasks - 1M samples for each task
- Humanoid:
  - humanoid/hard: 4 tasks - 2M samples for each task
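Any method can be combined with any scenario. For example, to run the EWC baseline on the Ant distraction scenario:

python run.py -cn=ewc scenario=ant/distraction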
Please use this BibTeX entry if you want to cite this repository in your publications:
@misc{https://doi.org/10.48550/arxiv.2211.10445,
doi = {10.48550/ARXIV.2211.10445},
url = {https://arxiv.org/abs/2211.10445},
author = {Gaya, Jean-Baptiste and Doan, Thang and Caccia, Lucas and Soulier, Laure and Denoyer, Ludovic and Raileanu, Roberta},
keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Building a Subspace of Policies for Scalable Continual Learning},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}