
Commit

Merge branch 'main' of https://github.com/Eclectic-Sheep/sheeprl into feature/resume_from_checkpoint
belerico committed Sep 18, 2023
2 parents 321c955 + 733faf1 commit 4ffb986
Showing 58 changed files with 549 additions and 631 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/cpu-tests.yaml
@@ -41,7 +41,7 @@ jobs:
- name: Install packages
run: |
python -m pip install -U pip
python -m pip install .[atari,test,dev]
python -m pip install -e .[atari,test,dev]
- name: Run tests
run: |
4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,4 @@
recursive-include sheeprl *.py
recursive-include sheeprl *.yaml
global-exclude *.pyc
global-exclude __pycache__
10 changes: 5 additions & 5 deletions README.md
@@ -155,13 +155,13 @@ Now you can use one of the already available algorithms, or create your own.
For example, to train a PPO agent on the CartPole environment with only vector-like observations, just run

```bash
python sheeprl.py ppo exp=ppo env=gym env.id=CartPole-v1
python sheeprl.py exp=ppo env=gym env.id=CartPole-v1
```

You can check all the available algorithms with

```bash
python sheeprl.py --sheeprl_help
python sheeprl/available_agents.py
```

That's all it takes to train an agent with SheepRL! 🎉
@@ -194,17 +194,17 @@ What you run is the PPO algorithm with the default configuration. But you can al
For example, in the default configuration, the number of parallel environments is 4. Let's try to change it to 8 by passing the `env.num_envs` argument:

```bash
python sheeprl.py ppo exp=ppo env=gym env.id=CartPole-v1 num_envs=8
python sheeprl.py exp=ppo env=gym env.id=CartPole-v1 env.num_envs=8
```

All the available arguments, with their descriptions, are listed in the `sheeprl/configs` directory. You can find more information about the hierarchy of configs [here](./howto/run_experiments.md).
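
Under the hood, the entry point lets Hydra compose the root config with these defaults and your command-line overrides before the algorithm is launched. The snippet below is only a rough sketch of that mechanism, not the actual SheepRL entry point (the `config_path`, `config_name`, and function name are illustrative assumptions):

```python
import hydra
from omegaconf import DictConfig, OmegaConf


# Hypothetical entry point: Hydra merges the defaults declared in the root config
# with any `key=value` overrides passed on the command line.
@hydra.main(version_base=None, config_path="sheeprl/configs", config_name="config")
def run(cfg: DictConfig) -> None:
    # After composition, `cfg.env.num_envs` reflects the CLI override
    # (e.g. `env.num_envs=8`) or, otherwise, the value from `env/default.yaml`.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    run()
```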

### Running with Lightning Fabric

To run the algorithm with Lightning Fabric, you need to call Lightning with its parameters. For example, to run the PPO algorithm with 4 parallel environments on 2 nodes, you can run:
To run the algorithm with Lightning Fabric, you need to specify the Fabric parameters through the CLI. For example, to run the PPO algorithm with 4 parallel environments on 2 nodes, you can run:

```bash
lightning run model --accelerator=cpu --strategy=ddp --devices=2 sheeprl.py ppo exp=ppo env=gym env.id=CartPole-v1
python sheeprl.py fabric.accelerator=cpu fabric.strategy=ddp fabric.devices=2 exp=ppo env=gym env.id=CartPole-v1
```

You can check the available parameters for Lightning Fabric [here](https://lightning.ai/docs/fabric/stable/api/fabric_args.html).
72 changes: 63 additions & 9 deletions howto/configs.md
@@ -22,9 +22,11 @@ sheeprl/configs
│ ├── droq.yaml
│ ├── p2e_dv1.yaml
│ ├── p2e_dv2.yaml
│ ├── ppo_decoupled.yaml
│ ├── ppo_recurrent.yaml
│ ├── ppo.yaml
│ ├── sac_ae.yaml
│ ├── sac_decoupled.yaml
│ └── sac.yaml
├── buffer
│ └── default.yaml
@@ -47,17 +49,26 @@ sheeprl/configs
│ ├── dreamer_v1.yaml
│ ├── dreamer_v2_ms_pacman.yaml
│ ├── dreamer_v2.yaml
│ ├── dreamer_v3_100k_boxing.yaml
│ ├── dreamer_v3_100k_ms_pacman.yaml
│ ├── dreamer_v3_dmc_walker_walk.yaml
│ ├── dreamer_v3_L_doapp_128px_gray_combo_discrete.yaml
│ ├── dreamer_v3_L_doapp.yaml
│ ├── dreamer_v3_L_navigate.yaml
│ ├── dreamer_v3.yaml
│ ├── droq.yaml
│ ├── p2e_dv1.yaml
│ ├── p2e_dv2.yaml
│ ├── ppo_decoupled.yaml
│ ├── ppo_recurrent.yaml
│ ├── ppo.yaml
│ ├── sac_ae.yaml
│ ├── sac_decoupled.yaml
│ └── sac.yaml
├── fabric
│ ├── ddp-cpu.yaml
│ ├── ddp-cuda.yaml
│ └── default.yaml
├── hydra
│ └── default.yaml
├── __init__.py
@@ -86,9 +97,10 @@ defaults:
- buffer: default.yaml
- checkpoint: default.yaml
- env: default.yaml
- exp: null
- hydra: default.yaml
- fabric: default.yaml
- metric: default.yaml
- hydra: default.yaml
- exp: ???

num_threads: 1
total_steps: ???
@@ -112,10 +124,6 @@ cnn_keys:
mlp_keys:
encoder: []
decoder: ${mlp_keys.encoder}

# Buffer
buffer:
memmap: True
```
### Algorithms
@@ -305,7 +313,50 @@ The environment configs can be found under the `sheeprl/configs/env` folders. Sh
* [MineRL (v0.4.4)](https://minerl.readthedocs.io/en/v0.4.4/)
* [MineDojo (v0.1.0)](https://docs.minedojo.org/)

In this way one can easily try out the overall framework with standard RL environments.
In this way one can easily try out the overall framework with standard RL environments. The `default.yaml` config contains the environment parameters that are (potentially) shared by all the environments:

```yaml
id: ???
num_envs: 4
frame_stack: 1
sync_env: False
screen_size: 64
action_repeat: 1
grayscale: False
clip_rewards: False
capture_video: True
frame_stack_dilation: 1
max_episode_steps: null
reward_as_observation: False
```

Every custom environment must then "inherit" from this default config, override the relevant parameters, and define the `wrapper` field, which is the one that will be directly instantiated at runtime. The `wrapper` field must define all the specific parameters to be passed to the `_target_` function when the wrapper is instantiated. Take for example the `atari.yaml` config:

```yaml
defaults:
- default
- _self_
# Override from `default` config
action_repeat: 4
id: PongNoFrameskip-v4
max_episode_steps: 27000

# Wrapper to be instantiated
wrapper:
_target_: gymnasium.wrappers.AtariPreprocessing # https://gymnasium.farama.org/api/wrappers/misc_wrappers/#gymnasium.wrappers.AtariPreprocessing
env:
_target_: gymnasium.make
id: ${env.id}
render_mode: rgb_array
noop_max: 30
terminal_on_life_loss: False
frame_skip: ${env.action_repeat}
screen_size: ${env.screen_size}
grayscale_obs: ${env.grayscale}
scale_obs: False
grayscale_newaxis: True
```
> **Warning**
>
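
To give an idea of what happens at runtime, the `wrapper` node is presumably resolved through Hydra's `instantiate` utility (the standard way `_target_` nodes are handled): nested `_target_` entries are imported and called, innermost first. The sketch below is illustrative only and hand-writes the config with the `${env.*}` interpolations already filled in; it is not the code SheepRL actually runs:

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Hand-written stand-in for the composed `wrapper` node above, with the
# `${env.*}` interpolations replaced by concrete values.
wrapper_cfg = OmegaConf.create(
    {
        "_target_": "gymnasium.wrappers.AtariPreprocessing",
        "env": {"_target_": "gymnasium.make", "id": "PongNoFrameskip-v4", "render_mode": "rgb_array"},
        "noop_max": 30,
        "terminal_on_life_loss": False,
        "frame_skip": 4,
        "screen_size": 64,
        "grayscale_obs": False,
        "scale_obs": False,
        "grayscale_newaxis": True,
    }
)

# `instantiate` first builds `gymnasium.make(...)`, then passes the result as the
# `env` argument of `gymnasium.wrappers.AtariPreprocessing`.
env = instantiate(wrapper_cfg)
```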
@@ -362,9 +413,13 @@ algo:
Given this config, one can easily run an experiment to test the Dreamer-V3 algorithm on the Ms-PacMan environment with the following simple CLI command:

```bash
lightning run model sheeprl.py dreamer_v3 exp=dreamer_v3_100k_ms_pacman
python sheeprl.py exp=dreamer_v3_100k_ms_pacman
```

### Fabric

These configurations control the parameters to be passed to the [Fabric object](https://lightning.ai/docs/fabric/stable/api/generated/lightning.fabric.fabric.Fabric.html#lightning.fabric.fabric.Fabric). With them one can control whether to run the experiments on multiple devices, on which accelerator, and with which precision. For more information please have a look at the [Lightning documentation page](https://lightning.ai/docs/fabric/stable/api/fabric_args.html#).
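
As a rough intuition, the values collected in these files end up as constructor arguments of the `Fabric` object; the snippet below only sketches that correspondence (the exact wiring inside SheepRL is not shown here, and the precision value is an arbitrary example):

```python
from lightning.fabric import Fabric

# A config such as `ddp-cpu.yaml` roughly maps onto constructor arguments like these.
fabric = Fabric(accelerator="cpu", strategy="ddp", devices=2, precision="32-true")
fabric.launch()  # sets up the distributed processes before the training loop starts
```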

### Hydra

This configuration file manages where and how to create folders and subfolders for experiments. For more information please visit the [hydra documentation](https://hydra.cc/docs/configure_hydra/intro/). Our default hydra config is the following:
@@ -387,7 +442,6 @@ log_every: 5000
sync_on_compute: False
```


### Optimizer

Each optimizer file defines how to initialize the training optimizer and its parameters. For a better understanding of PyTorch optimizers, have a look at [https://pytorch.org/docs/stable/optim.html](https://pytorch.org/docs/stable/optim.html). An example config is the following:
25 changes: 1 addition & 24 deletions howto/learn_in_atari.md
@@ -8,29 +8,6 @@ pip install .[atari]
For more information: https://gymnasium.farama.org/environments/atari/

## Train your agent
First you need to select which agent you want to train. The list of the trainable agent can be retrieved as follows:

```bash
Usage: sheeprl.py [OPTIONS] COMMAND [ARGS]...

SheepRL zero-code command line utility.

Options:
--sheeprl_help Show this message and exit.

Commands:
dreamer_v1
dreamer_v2
droq
p2e_dv1
p2e_dv2
ppo
ppo_decoupled
ppo_recurrent
sac
sac_ae
sac_decoupled
```

It is important to remember that not all the algorithms can work with images, so it is necessary to check the first table in the [README](../README.md) and select a proper algorithm.
The list of selectable algorithms is given below:
@@ -46,5 +23,5 @@ The list of selectable algorithms is given below:
Once you have chosen the algorithm you want to train, you can start the training of, for instance, the PPO agent by running:

```bash
lightning run model --accelerator=cpu --strategy=ddp --devices=2 sheeprl.py ppo exp=ppo env=atari env.id=PongNoFrameskip-v4 cnn_keys.encoder=[rgb]
python sheeprl.py exp=ppo env=atari env.id=PongNoFrameskip-v4 cnn_keys.encoder=[rgb] fabric.accelerator=cpu fabric.strategy=ddp fabric.devices=2
```
8 changes: 4 additions & 4 deletions howto/learn_in_diambra.md
@@ -54,14 +54,14 @@ The observation space is slightly modified to be compatible with our algorithms,
## Multi-environments / Distributed training
In order to train your agent with multiple environments or to perform distributed training, you have to specify to the `diambra run` command the number of environments you want to instantiate (through the `-s` CLI argument). This number is the number of environments per single process multiplied by the number of processes you want to launch (the number of *player* processes for decoupled algorithms). Thus, in the case of a coupled algorithm (e.g., `dreamer_v2`), if you want to distribute your training among $2$ processes, each containing $4$ environments, the total number of environments will be: $2 \cdot 4 = 8$. The command will be:
```bash
diambra run -s=8 lightning run model --devices=2 sheeprl.py dreamer_v3 exp=dreamer_v3 env=diambra env.id=doapp num_envs=4 env.sync_env=True cnn_keys.encoder=[frame]
diambra run -s=8 python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.num_envs=4 env.sync_env=True cnn_keys.encoder=[frame] fabric.devices=2
```

## Args
The IDs of the DIAMBRA environments are specified [here](https://docs.diambra.ai/envs/games/). To train your agent on a DIAMBRA environment you have to select the diambra configs with the argument `env=diambra`, then set the `env.id` argument to the environment ID, e.g., to train your agent on the *Dead Or Alive ++* game, you have to set the `env.id` argument to `doapp` (i.e., `env.id=doapp`).

```bash
diambra run -s=4 lightning run model sheeprl.py dreamer_v3 exp=dreamer_v3 env=diambra env.id=doapp num_envs=4
diambra run -s=4 python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.num_envs=4
```

Another possibility is to create a new config file in the `sheeprl/configs/exp` folder, where you specify all the configs you want to use in your experiment. An example of custom configuration file is available [here](../sheeprl/configs/exp/dreamer_v3_L_doapp.yaml).
@@ -94,7 +94,7 @@ env:

Now, to run your experiment, you have to execute the following command:
```bash
diambra run -s=4 lightning run model sheeprl.py dreamer_v3 exp=custom_exp num_envs=4
diambra run -s=4 python sheeprl.py exp=custom_exp env.num_envs=4
```

> **Note**
@@ -118,5 +118,5 @@ diambra run -s=4 lightning run model sheeprl.py dreamer_v3 exp=custom_exp num_en
## Headless machines

If you work on a headless machine, you need a software renderer. We recommend adopting one of the following solutions:
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on an headless machine, you need to run the following command: `xvfb-run diambra run lightning run model --devices=1 sheeprl.py dreamer_v3 exp=dreamer_v3 env=diambra env.id=doapp env.sync_env=True num_envs=1 cnn_keys.encoder=[frame]`
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV3 on the doapp environment on a headless machine, you need to run the following command: `xvfb-run diambra run python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.sync_env=True env.num_envs=1 cnn_keys.encoder=[frame] fabric.devices=1`
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package, as sketched below.
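
For the second option, a minimal sketch of how PyVirtualDisplay might be used is given below (the display size is an arbitrary choice; this is only an illustration, not part of SheepRL):

```python
from pyvirtualdisplay import Display

# Start a virtual framebuffer before any environment that needs rendering is created.
display = Display(visible=False, size=(1024, 768))
display.start()

# ... launch the training from this same process, then shut the display down:
display.stop()
```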
4 changes: 2 additions & 2 deletions howto/learn_in_dmc.md
@@ -19,12 +19,12 @@ For more information: [https://github.com/deepmind/dm_control](https://github.co
In order to train your agents on the [MuJoCo environments](https://gymnasium.farama.org/environments/mujoco/) provided by Gymnasium, it is sufficient to select the *GYM* environment (`env=gym`) and set the `env.id` to the name of the environment you want to use. For instance, `"Walker2d-v4"` if you want to train your agent on the *walker walk* environment.

```bash
lightning run model sheeprl.py dreamer_v3 exp=dreamer_v3 env=gym env.id=Walker2d-v4 cnn_keys.encoder=[rgb]
python sheeprl.py exp=dreamer_v3 env=gym env.id=Walker2d-v4 cnn_keys.encoder=[rgb]
```

## DeepMind Control
In order to train your agents on the [DeepMind control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), you have to select the *DMC* environment (`env=dmc`) and set the id of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, if you want to train your agent on the *walker walk* environment, you need to set the `env.id` to `"walker_walk"`.

```bash
lightning run model sheeprl.py dreamer_v3 exp=dreamer_v3 env=dmc env.id=walker_walk cnn_keys.encoder=[rgb]
python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk cnn_keys.encoder=[rgb]
```
4 changes: 2 additions & 2 deletions howto/learn_in_minedojo.md
@@ -29,7 +29,7 @@ It is possible to train your agents on all the tasks provided by MineDojo. You n
For instance, you can use the following command to select the MineDojo open-ended environment.

```bash
lightning run model sheeprl.py p2e_dv2 exp=p2e_dv2 env=minedojo env.id=open-ened algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor cnn_keys.encoder=[rgb]
python sheeprl.py exp=p2e_dv2 env=minedojo env.id=open-ended algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor cnn_keys.encoder=[rgb]
```

### Observation Space
Expand Down Expand Up @@ -67,5 +67,5 @@ For more information about the MineDojo action space, check [here](https://docs.
## Headless machines

If you work on a headless machine, you need a software renderer. We recommend adopting one of the following solutions:
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on an headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py p2e_dv2 exp=p2e_dv2 env=minedojo env.id=open-ended cnn_keys.encoder=[rgb] algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor`, or `MINEDOJO_HEADLESS=1 lightning run model --devices=1 sheeprl.py p2e_dv2 exp=p2e_dv2 env=minedojo env.id=open-ended cnn_keys.encoder=[rgb] algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor`.
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train P2E DV2 on the open-ended task on a headless machine, you need to run the following command: `xvfb-run python sheeprl.py exp=p2e_dv2 fabric.devices=1 env=minedojo env.id=open-ended cnn_keys.encoder=[rgb] algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor`, or `MINEDOJO_HEADLESS=1 python sheeprl.py exp=p2e_dv2 fabric.devices=1 env=minedojo env.id=open-ended cnn_keys.encoder=[rgb] algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.
2 changes: 1 addition & 1 deletion howto/learn_in_minerl.md
@@ -54,5 +54,5 @@ Finally we added sticky action for the `jump` and `attack` actions. You can set
## Headless machines

If you work on a headless machine, you need a software renderer. We recommend adopting one of the following solutions:
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on an headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v3 exp=dreamer_v3 env=minerl env.id=custom_navigate cnn_keys.encoder=[rgb]`.
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV3 on the navigate task on a headless machine, you need to run the following command: `xvfb-run python sheeprl.py exp=dreamer_v3 fabric.devices=1 env=minerl env.id=custom_navigate cnn_keys.encoder=[rgb]`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.