Releases: Eclectic-Sheep/sheeprl
Releases · Eclectic-Sheep/sheeprl
v0.4.7
v0.4.7 Release Notes
- SheepRL is now on PyPI: every time a release is published, the new version of SheepRL is also published on PyPI (#155)
- Torchmetrics is no longer installed from the GitHub main branch (#155)
- Moviepy is no longer installed from the GitHub main branch (#155)
- box2d-py is no longer a mandatory dependency: it is possible to install `gymnasium[box2d]` with the `pip install sheeprl[box2d]` command (#156)
- The `moviepy.decorators.use_clip_fps_by_default` function is replaced (in the `./sheeprl/__init__.py` file) with the method from the moviepy main branch (#156)
v0.4.6
v0.4.5post0
v0.4.5post0 Release Notes
- Fixes MineDojo and Dreamer's player in #148
v0.4.5
v0.4.5 Release Notes
- Added new how-to explaining how to add a new custom environment in #128
- Added the possibility to completely disable logging metrics and decide what and how to log metrics in every algorithm in #129
- Fixed the model creation of the Dreamer-V3 agent, removing the bias from every linear layer followed by a LayerNorm and an activation function (see the sketch after this list)
- Added the possibility for the users to specify their own custom configs, possibly inheriting from the already defined sheeprl configs in #132
- Added the support to Lightning 2.1 in #136
- Added the possibility to evaluate every agent given a checkpoint in #139 #141
- Various minor fixes in #125 #133 #134 #135 #137 #140 #143 #144 #145 #146
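The Dreamer-V3 fix above boils down to dropping the bias of linear layers that are immediately followed by a LayerNorm, which re-centers the activations and has its own affine shift, making the bias redundant. A minimal sketch of such a block, with purely illustrative layer sizes and activation, not the actual SheepRL modules:

```python
import torch
from torch import nn

# Minimal sketch: a linear layer followed by LayerNorm does not need its own
# bias, because LayerNorm re-centers the activations anyway.
def dense_block(in_features: int, out_features: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_features, out_features, bias=False),  # bias removed
        nn.LayerNorm(out_features),
        nn.SiLU(),  # illustrative activation
    )

block = dense_block(64, 128)
print(block(torch.randn(8, 64)).shape)  # torch.Size([8, 128])
```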
v0.4.4
v0.4.4 Release Notes
- Fixes the activation in the recurrent model in DV1 in #110
- Updated the Diambra wrapper to support the new Diambra package in #111
- Added `dotdict` to speed up accessing the loaded config in #112
- Better naming when Hydra creates the output dirs in #114
- Added the `validate_args` flag to decide whether `torch.distributions` must check the arguments passed to the `__init__` function; disable it to have a huge speedup in #116 (see the sketch below)
- Updated the Diambra wrapper to support `AsyncVectorEnv` in #119
- Minor fixes in #120
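For context, `validate_args` is a standard `torch.distributions` option; the snippet below only illustrates the PyTorch mechanism and is not SheepRL's actual config plumbing:

```python
import torch
from torch.distributions import Normal

loc, scale = torch.zeros(4), torch.ones(4)

# Per-distribution: skip the (costly) argument checks in hot training loops.
dist = Normal(loc, scale, validate_args=False)

# Or globally, for every distribution created afterwards.
torch.distributions.Distribution.set_default_validate_args(False)

print(dist.sample().shape)  # torch.Size([4])
```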
v0.4.3
v0.4.2
v0.4.2 Release Notes
In this release we have:
- refactored the recurrent PPO implementation. In particular:
  - A single LSTM model is used, taking as input the current observation, the previously played action and the previous recurrent state, i.e., `LSTM([o_t, a_t-1], h_t-1)`. The LSTM has an optional pre-MLP and post-MLP; those can be controlled in the relative `algo/ppo_recurrent.yaml` config
  - A feature extractor is used to extract features from the observations, whether they are vectors or images
- Every PPO algorithm now computes the bootstrapped value and adds it to the current reward whenever an environment has been truncated (see the sketch below)
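The truncation handling above follows the usual bootstrapping recipe: when an episode is cut off by a time limit rather than a true terminal state, the critic's estimate of the next state's value is added to the last reward. A minimal sketch with a toy critic, not SheepRL's actual agent or buffer code:

```python
import torch
from torch import nn

critic = nn.Linear(4, 1)  # toy value function V(s)

next_obs = torch.randn(8, 4)                                # next observations of 8 envs
rewards = torch.randn(8)                                    # last rewards
truncated = torch.tensor([1., 0., 0., 1., 0., 0., 0., 0.])  # time-limit cutoffs
gamma = 0.99

with torch.no_grad():
    next_values = critic(next_obs).squeeze(-1)

# Only truncated (not terminated) environments get the bootstrapped value,
# so the return computation does not treat the cutoff as a real episode end.
rewards = rewards + gamma * next_values * truncated
```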
v0.4.0
v0.4.0 Release Notes
In this release we have:
- made the whole framework single-entry-point, i.e. now one can run an experiment just with `python sheeprl.py exp=... env=...`, removing the need to prepend `lightning run model ... sheeprl.py` every time. The Fabric-related configs can be found and changed under the `sheeprl/configs/fabric/` folder (#97)
- unified the `make_env` and `make_dict_env` methods, so there is no longer a distinction between the two. We now assume that the environment has an observation space that is a `gymnasium.spaces.Dict`; if it is not, an exception is raised (#96) (see the sketch after this list)
- implemented `resume_from_checkpoint` for every algorithm (#95)
- added the Crafter environment (#103)
- Fixed some environments, in particular Diambra and DMC
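A rough sketch of the kind of observation-space check described for the unified `make_env` (the function name and error message here are illustrative, not SheepRL's actual code):

```python
import gymnasium as gym

def ensure_dict_obs_space(env: gym.Env) -> gym.Env:
    # Raise if the observation space is not a gymnasium.spaces.Dict.
    if not isinstance(env.observation_space, gym.spaces.Dict):
        raise RuntimeError(
            "Expected a gymnasium.spaces.Dict observation space, "
            f"got {type(env.observation_space).__name__}"
        )
    return env

try:
    ensure_dict_obs_space(gym.make("CartPole-v1"))
except RuntimeError as err:
    print(err)  # CartPole exposes a Box space, so the check fails here
```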
v0.3.2
v0.3.2 Release Notes
In this release we have fixed the time metrics logged by every algorithm. In particular:
- The `Time/sps_env_interaction` metric measures the steps-per-second of the agent's environment interaction, namely the forward pass to obtain the new action given the observation and the execution of the environment's `step` method. This value is local to rank-0 and takes into consideration the `action_repeat` set through Hydra/CLI (see the sketch after this list)
- The `Time/sps_train` metric measures the steps-per-second of the train function, which runs in a distributed manner, considering all the ranks calling the train function
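As a rough illustration of how such a steps-per-second value can be computed (the loop and variable names are ours, and the exact accounting of `action_repeat` is an assumption, not SheepRL's code):

```python
import time

action_repeat = 4   # set through Hydra/CLI in SheepRL
num_envs = 2        # environments on this rank

start = time.perf_counter()
policy_steps = 0
for _ in range(1000):
    # ... policy forward pass + env.step(...) would run here ...
    policy_steps += num_envs
elapsed = time.perf_counter() - start

# Each policy step corresponds to `action_repeat` environment frames.
sps_env_interaction = policy_steps * action_repeat / elapsed
print(f"Time/sps_env_interaction: {sps_env_interaction:.1f}")
```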
v0.3.1
v0.3.1 Release Notes
In this release we have refactored some names inside every algorithm, in particular:
- we have introduced the concept of `policy_step`, which is the number of (distributed) policy steps per environment step, where the environment step does not take into consideration the action repeat, i.e. it is the number of times the policy is called to collect an action given an observation. If one has `n` ranks and `m` environments per rank, then the number of policy steps per environment step is `policy_steps = n * m` (see the sketch below)
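A tiny bookkeeping sketch of the counter described above (the loop is illustrative, not SheepRL's training loop):

```python
world_size = 2   # n ranks
num_envs = 4     # m environments per rank

policy_step = 0
for _ in range(10):                       # 10 environment steps
    # every rank queries the policy once per parallel environment
    policy_step += world_size * num_envs  # policy_steps = n * m per environment step
print(policy_step)  # 80
```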
We have also refactored the Hydra configs, in particular:
- we have introduced the `metric`, `checkpoint` and `buffer` configs, containing the shared hyperparameters for those objects in every algorithm
- the `metric` config has the `metric.log_every` parameter, which controls the logging frequency. Since it's hard for the `policy_step` variable to be exactly divisible by the `metric.log_every` value, logging happens as soon as `policy_step - last_log >= cfg.metric.log_every`, where `last_log = policy_step` is updated every time something is logged (see the sketch after this list)
- the `checkpoint` config has the `every` and `resume_from` parameters. The `every` parameter works like the `metric.log_every` one, while `resume_from` specifies the experiment folder, which must contain the `.hydra` folder, to resume the training from. This is currently only supported by the Dreamer algorithms
- `num_envs` and `clip_reward` have been moved to the `env` config
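A short sketch of the logging condition described in the `metric` bullet (the values are illustrative):

```python
log_every = 5000   # cfg.metric.log_every
last_log = 0
policy_step = 0

for _ in range(100_000):
    policy_step += 8  # e.g. 2 ranks * 4 envs per rank
    # policy_step rarely hits an exact multiple of log_every,
    # so log as soon as the gap since the last log is large enough.
    if policy_step - last_log >= log_every:
        # ... log the metrics here ...
        last_log = policy_step
```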