
Commit

Merge branch 'main' of https://github.com/Eclectic-Sheep/sheeprl into feature/resume_from_checkpoint
belerico committed Sep 18, 2023
2 parents 321c955 + 733faf1 commit 4ffb986
Showing 58 changed files with 549 additions and 631 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/cpu-tests.yaml
@@ -41,7 +41,7 @@ jobs:
- name: Install packages
run: |
python -m pip install -U pip
python -m pip install .[atari,test,dev]
python -m pip install -e .[atari,test,dev]
- name: Run tests
run: |
4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,4 @@
recursive-include sheeprl *.py
recursive-include sheeprl *.yaml
global-exclude *.pyc
global-exclude __pycache__
10 changes: 5 additions & 5 deletions README.md
@@ -155,13 +155,13 @@ Now you can use one of the already available algorithms, or create your own.
For example, to train a PPO agent on the CartPole environment with only vector-like observations, just run

```bash
python sheeprl.py ppo exp=ppo env=gym env.id=CartPole-v1
python sheeprl.py exp=ppo env=gym env.id=CartPole-v1
```

You can check all the available algorithms with

```bash
python sheeprl.py --sheeprl_help
python sheeprl/available_agents.py
```

That's all it takes to train an agent with SheepRL! 🎉
@@ -194,17 +194,17 @@ What you run is the PPO algorithm with the default configuration. But you can al
For example, in the default configuration, the number of parallel environments is 4. Let's try to change it to 8 by passing the `env.num_envs` argument:

```bash
python sheeprl.py ppo exp=ppo env=gym env.id=CartPole-v1 num_envs=8
python sheeprl.py exp=ppo env=gym env.id=CartPole-v1 env.num_envs=8
```

All the available arguments, with their descriptions, are listed in the `sheeprl/configs` directory. You can find more information about the hierarchy of configs [here](./howto/run_experiments.md).
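
Under the hood, the entry point lets Hydra compose the root config with these defaults and your command-line overrides before the algorithm is launched. The snippet below is only a rough sketch of that mechanism, not the actual SheepRL entry point (the `config_path`, `config_name`, and function name are illustrative assumptions):

```python
import hydra
from omegaconf import DictConfig, OmegaConf


# Hypothetical entry point: Hydra merges the defaults declared in the root config
# with any `key=value` overrides passed on the command line.
@hydra.main(version_base=None, config_path="sheeprl/configs", config_name="config")
def run(cfg: DictConfig) -> None:
    # After composition, `cfg.env.num_envs` reflects the CLI override
    # (e.g. `env.num_envs=8`) or, otherwise, the value from `env/default.yaml`.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    run()
```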

### Running with Lightning Fabric

To run the algorithm with Lightning Fabric, you need to call Lightning with its parameters. For example, to run the PPO algorithm with 4 parallel environments on 2 nodes, you can run:
To run the algorithm with Lightning Fabric, you need to specify the Fabric parameters through the CLI. For example, to run the PPO algorithm with 4 parallel environments on 2 nodes, you can run:

```bash
lightning run model --accelerator=cpu --strategy=ddp --devices=2 sheeprl.py ppo exp=ppo env=gym env.id=CartPole-v1
python sheeprl.py fabric.accelerator=cpu fabric.strategy=ddp fabric.devices=2 exp=ppo env=gym env.id=CartPole-v1
```

You can check the available parameters for Lightning Fabric [here](https://lightning.ai/docs/fabric/stable/api/fabric_args.html).
72 changes: 63 additions & 9 deletions howto/configs.md
@@ -22,9 +22,11 @@ sheeprl/configs
│ ├── droq.yaml
│ ├── p2e_dv1.yaml
│ ├── p2e_dv2.yaml
│ ├── ppo_decoupled.yaml
│ ├── ppo_recurrent.yaml
│ ├── ppo.yaml
│ ├── sac_ae.yaml
│ ├── sac_decoupled.yaml
│ └── sac.yaml
├── buffer
│ └── default.yaml
@@ -47,17 +49,26 @@ sheeprl/configs
│ ├── dreamer_v1.yaml
│ ├── dreamer_v2_ms_pacman.yaml
│ ├── dreamer_v2.yaml
│ ├── dreamer_v3_100k_boxing.yaml
│ ├── dreamer_v3_100k_ms_pacman.yaml
│ ├── dreamer_v3_dmc_walker_walk.yaml
│ ├── dreamer_v3_L_doapp_128px_gray_combo_discrete.yaml
│ ├── dreamer_v3_L_doapp.yaml
│ ├── dreamer_v3_L_navigate.yaml
│ ├── dreamer_v3.yaml
│ ├── droq.yaml
│ ├── p2e_dv1.yaml
│ ├── p2e_dv2.yaml
│ ├── ppo_decoupled.yaml
│ ├── ppo_recurrent.yaml
│ ├── ppo.yaml
│ ├── sac_ae.yaml
│ ├── sac_decoupled.yaml
│ └── sac.yaml
├── fabric
│ ├── ddp-cpu.yaml
│ ├── ddp-cuda.yaml
│ └── default.yaml
├── hydra
│ └── default.yaml
├── __init__.py
@@ -86,9 +97,10 @@ defaults:
- buffer: default.yaml
- checkpoint: default.yaml
- env: default.yaml
- exp: null
- hydra: default.yaml
- fabric: default.yaml
- metric: default.yaml
- hydra: default.yaml
- exp: ???

num_threads: 1
total_steps: ???
@@ -112,10 +124,6 @@ cnn_keys:
mlp_keys:
encoder: []
decoder: ${mlp_keys.encoder}

# Buffer
buffer:
memmap: True
```
### Algorithms
@@ -305,7 +313,50 @@ The environment configs can be found under the `sheeprl/configs/env` folders. Sh
* [MineRL (v0.4.4)](https://minerl.readthedocs.io/en/v0.4.4/)
* [MineDojo (v0.1.0)](https://docs.minedojo.org/)

In this way one can easily try out the overall framework with standard RL environments.
In this way one can easily try out the overall framework with standard RL environments. The `default.yaml` config contains the environment parameters that are (potentially) shared by all the environments:

```yaml
id: ???
num_envs: 4
frame_stack: 1
sync_env: False
screen_size: 64
action_repeat: 1
grayscale: False
clip_rewards: False
capture_video: True
frame_stack_dilation: 1
max_episode_steps: null
reward_as_observation: False
```

Every custom environment must then "inherit" from this default config, override the relevant parameters, and define the `wrapper` field, which is the one that will be directly instantiated at runtime. The `wrapper` field must define all the specific parameters to be passed to the `_target_` function when the wrapper is instantiated. Take for example the `atari.yaml` config:

```yaml
defaults:
- default
- _self_
# Override from `default` config
action_repeat: 4
id: PongNoFrameskip-v4
max_episode_steps: 27000

# Wrapper to be instantiated
wrapper:
_target_: gymnasium.wrappers.AtariPreprocessing # https://gymnasium.farama.org/api/wrappers/misc_wrappers/#gymnasium.wrappers.AtariPreprocessing
env:
_target_: gymnasium.make
id: ${env.id}
render_mode: rgb_array
noop_max: 30
terminal_on_life_loss: False
frame_skip: ${env.action_repeat}
screen_size: ${env.screen_size}
grayscale_obs: ${env.grayscale}
scale_obs: False
grayscale_newaxis: True
```
> **Warning**
>
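
To give an idea of what happens at runtime, the `wrapper` node is presumably resolved through Hydra's `instantiate` utility (the standard way `_target_` nodes are handled): nested `_target_` entries are imported and called, innermost first. The sketch below is illustrative only and hand-writes the config with the `${env.*}` interpolations already filled in; it is not the code SheepRL actually runs:

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Hand-written stand-in for the composed `wrapper` node above, with the
# `${env.*}` interpolations replaced by concrete values.
wrapper_cfg = OmegaConf.create(
    {
        "_target_": "gymnasium.wrappers.AtariPreprocessing",
        "env": {"_target_": "gymnasium.make", "id": "PongNoFrameskip-v4", "render_mode": "rgb_array"},
        "noop_max": 30,
        "terminal_on_life_loss": False,
        "frame_skip": 4,
        "screen_size": 64,
        "grayscale_obs": False,
        "scale_obs": False,
        "grayscale_newaxis": True,
    }
)

# `instantiate` first builds `gymnasium.make(...)`, then passes the result as the
# `env` argument of `gymnasium.wrappers.AtariPreprocessing`.
env = instantiate(wrapper_cfg)
```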
@@ -362,9 +413,13 @@ algo:
Given this config, one can easily run an experiment to test the Dreamer-V3 algorithm on the Ms-PacMan environment with the following simple CLI command:

```bash
lightning run model sheeprl.py dreamer_v3 exp=dreamer_v3_100k_ms_pacman
python sheeprl.py exp=dreamer_v3_100k_ms_pacman
```

### Fabric

These configurations control the parameters to be passed to the [Fabric object](https://lightning.ai/docs/fabric/stable/api/generated/lightning.fabric.fabric.Fabric.html#lightning.fabric.fabric.Fabric). With them one can control whether to run the experiments on multiple devices, on which accelerator, and with which precision. For more information please have a look at the [Lightning documentation page](https://lightning.ai/docs/fabric/stable/api/fabric_args.html#).
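
As a rough intuition, the values collected in these files end up as constructor arguments of the `Fabric` object; the snippet below only sketches that correspondence (the exact wiring inside SheepRL is not shown here, and the precision value is an arbitrary example):

```python
from lightning.fabric import Fabric

# A config such as `ddp-cpu.yaml` roughly maps onto constructor arguments like these.
fabric = Fabric(accelerator="cpu", strategy="ddp", devices=2, precision="32-true")
fabric.launch()  # sets up the distributed processes before the training loop starts
```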

### Hydra

This configuration file manages where and how to create folders and subfolders for experiments. For more information please visit the [hydra documentation](https://hydra.cc/docs/configure_hydra/intro/). Our default hydra config is the following:
@@ -387,7 +442,6 @@ log_every: 5000
sync_on_compute: False
```


### Optimizer

Each optimizer file defines how to initialize the training optimizer and its parameters. For a better understanding of PyTorch optimizers, have a look at [https://pytorch.org/docs/stable/optim.html](https://pytorch.org/docs/stable/optim.html). An example config is the following:
25 changes: 1 addition & 24 deletions howto/learn_in_atari.md
@@ -8,29 +8,6 @@ pip install .[atari]
For more information: https://gymnasium.farama.org/environments/atari/

## Train your agent
First you need to select which agent you want to train. The list of the trainable agent can be retrieved as follows:

```bash
Usage: sheeprl.py [OPTIONS] COMMAND [ARGS]...

SheepRL zero-code command line utility.

Options:
--sheeprl_help Show this message and exit.

Commands:
dreamer_v1
dreamer_v2
droq
p2e_dv1
p2e_dv2
ppo
ppo_decoupled
ppo_recurrent
sac
sac_ae
sac_decoupled
```

It is important to remember that not all the algorithms can work with images, so it is necessary to check the first table in the [README](../README.md) and select a proper algorithm.
The list of selectable algorithms is given below:
@@ -46,5 +23,5 @@ The list of selectable algorithms is given below:
Once you have chosen the algorithm you want to train, you can start the training of, for instance, the PPO agent by running:

```bash
lightning run model --accelerator=cpu --strategy=ddp --devices=2 sheeprl.py ppo exp=ppo env=atari env.id=PongNoFrameskip-v4 cnn_keys.encoder=[rgb]
python sheeprl.py exp=ppo env=atari env.id=PongNoFrameskip-v4 cnn_keys.encoder=[rgb] fabric.accelerator=cpu fabric.strategy=ddp fabric.devices=2
```
8 changes: 4 additions & 4 deletions howto/learn_in_diambra.md
@@ -54,14 +54,14 @@ The observation space is slightly modified to be compatible with our algorithms,
## Multi-environments / Distributed training
In order to train your agent with multiple environments or to perform distributed training, you have to specify to the `diambra run` command the number of environments you want to instantiate (through the `-s` CLI argument). This number is the number of environments per single process multiplied by the number of processes you want to launch (the number of *player* processes for decoupled algorithms). Thus, in the case of a coupled algorithm (e.g., `dreamer_v2`), if you want to distribute your training among $2$ processes, each containing $4$ environments, the total number of environments will be: $2 \cdot 4 = 8$. The command will be:
```bash
diambra run -s=8 lightning run model --devices=2 sheeprl.py dreamer_v3 exp=dreamer_v3 env=diambra env.id=doapp num_envs=4 env.sync_env=True cnn_keys.encoder=[frame]
diambra run -s=8 python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.num_envs=4 env.sync_env=True cnn_keys.encoder=[frame] fabric.devices=2
```

## Args
The IDs of the DIAMBRA environments are specified [here](https://docs.diambra.ai/envs/games/). To train your agent on a DIAMBRA environment you have to select the diambra configs with the argument `env=diambra`, then set the `env.id` argument to the environment ID, e.g., to train your agent on the *Dead Or Alive ++* game, you have to set the `env.id` argument to `doapp` (i.e., `env.id=doapp`).

```bash
diambra run -s=4 lightning run model sheeprl.py dreamer_v3 exp=dreamer_v3 env=diambra env.id=doapp num_envs=4
diambra run -s=4 python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.num_envs=4
```

Another possibility is to create a new config file in the `sheeprl/configs/exp` folder, where you specify all the configs you want to use in your experiment. An example of custom configuration file is available [here](../sheeprl/configs/exp/dreamer_v3_L_doapp.yaml).
@@ -94,7 +94,7 @@ env:

Now, to run your experiment, you have to execute the following command:
```bash
diambra run -s=4 lightning run model sheeprl.py dreamer_v3 exp=custom_exp num_envs=4
diambra run -s=4 python sheeprl.py exp=custom_exp env.num_envs=4
```

> **Note**
@@ -118,5 +118,5 @@ diambra run -s=4 lightning run model sheeprl.py dreamer_v3 exp=custom_exp num_en
## Headless machines

If you work on a headless machine, you need a software renderer. We recommend adopting one of the following solutions:
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on an headless machine, you need to run the following command: `xvfb-run diambra run lightning run model --devices=1 sheeprl.py dreamer_v3 exp=dreamer_v3 env=diambra env.id=doapp env.sync_env=True num_envs=1 cnn_keys.encoder=[frame]`
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV3 on the doapp environment on a headless machine, you need to run the following command: `xvfb-run diambra run python sheeprl.py exp=dreamer_v3 env=diambra env.id=doapp env.sync_env=True env.num_envs=1 cnn_keys.encoder=[frame] fabric.devices=1`
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package, as sketched below.
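
For the second option, a minimal sketch of how PyVirtualDisplay might be used is given below (the display size is an arbitrary choice; this is only an illustration, not part of SheepRL):

```python
from pyvirtualdisplay import Display

# Start a virtual framebuffer before any environment that needs rendering is created.
display = Display(visible=False, size=(1024, 768))
display.start()

# ... launch the training from this same process, then shut the display down:
display.stop()
```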
4 changes: 2 additions & 2 deletions howto/learn_in_dmc.md
@@ -19,12 +19,12 @@ For more information: [https://github.com/deepmind/dm_control](https://github.co
In order to train your agents on the [MuJoCo environments](https://gymnasium.farama.org/environments/mujoco/) provided by Gymnasium, it is sufficient to select the *GYM* environment (`env=gym`) and set the `env.id` to the name of the environment you want to use. For instance, `"Walker2d-v4"` if you want to train your agent on the *walker walk* environment.

```bash
lightning run model sheeprl.py dreamer_v3 exp=dreamer_v3 env=gym env.id=Walker2d-v4 cnn_keys.encoder=[rgb]
python sheeprl.py exp=dreamer_v3 env=gym env.id=Walker2d-v4 cnn_keys.encoder=[rgb]
```

## DeepMind Control
In order to train your agents on the [DeepMind control suite](https://github.com/deepmind/dm_control/blob/main/dm_control/suite/README.md), you have to select the *DMC* environment (`env=dmc`) and set the id of the environment you want to use. A list of the available environments can be found [here](https://arxiv.org/abs/1801.00690). For instance, if you want to train your agent on the *walker walk* environment, you need to set the `env.id` to `"walker_walk"`.

```bash
lightning run model sheeprl.py dreamer_v3 exp=dreamer_v3 env=dmc env.id=walker_walk cnn_keys.encoder=[rgb]
python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk cnn_keys.encoder=[rgb]
```
4 changes: 2 additions & 2 deletions howto/learn_in_minedojo.md
@@ -29,7 +29,7 @@ It is possible to train your agents on all the tasks provided by MineDojo. You n
For instance, you can use the following command to select the MineDojo open-ended environment.

```bash
lightning run model sheeprl.py p2e_dv2 exp=p2e_dv2 env=minedojo env.id=open-ened algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor cnn_keys.encoder=[rgb]
python sheeprl.py exp=p2e_dv2 env=minedojo env.id=open-ended algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor cnn_keys.encoder=[rgb]
```

### Observation Space
Expand Down Expand Up @@ -67,5 +67,5 @@ For more information about the MineDojo action space, check [here](https://docs.
## Headless machines

If you work on a headless machine, you need a software renderer. We recommend adopting one of the following solutions:
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on an headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py p2e_dv2 exp=p2e_dv2 env=minedojo env.id=open-ended cnn_keys.encoder=[rgb] algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor`, or `MINEDOJO_HEADLESS=1 lightning run model --devices=1 sheeprl.py p2e_dv2 exp=p2e_dv2 env=minedojo env.id=open-ended cnn_keys.encoder=[rgb] algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor`.
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train P2E DV2 on the open-ended task on a headless machine, you need to run the following command: `xvfb-run python sheeprl.py exp=p2e_dv2 fabric.devices=1 env=minedojo env.id=open-ended cnn_keys.encoder=[rgb] algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor`, or `MINEDOJO_HEADLESS=1 python sheeprl.py exp=p2e_dv2 fabric.devices=1 env=minedojo env.id=open-ended cnn_keys.encoder=[rgb] algo.actor.cls=sheeprl.algos.p2e_dv2.agent.MinedojoActor`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.
2 changes: 1 addition & 1 deletion howto/learn_in_minerl.md
@@ -54,5 +54,5 @@ Finally we added sticky action for the `jump` and `attack` actions. You can set
## Headless machines

If you work on a headless machine, you need a software renderer. We recommend adopting one of the following solutions:
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV2 on the navigate task on an headless machine, you need to run the following command: `xvfb-run lightning run model --devices=1 sheeprl.py dreamer_v3 exp=dreamer_v3 env=minerl env.id=custom_navigate cnn_keys.encoder=[rgb]`.
1. Install the `xvfb` software with the `sudo apt install xvfb` command and prefix the train command with `xvfb-run`. For instance, to train DreamerV3 on the navigate task on a headless machine, you need to run the following command: `xvfb-run python sheeprl.py exp=dreamer_v3 fabric.devices=1 env=minerl env.id=custom_navigate cnn_keys.encoder=[rgb]`.
2. Exploit the [PyVirtualDisplay](https://github.com/ponty/PyVirtualDisplay) package.