
Releases: Eclectic-Sheep/sheeprl

v0.4.7

23 Nov 17:03
f56a767

v0.4.7 Release Notes

  • SheepRL is now on PyPI: every time a release is published, the new version of SheepRL is also published on PyPI (#155)
  • Torchmetrics is no longer installed from the GitHub main branch (#155)
  • Moviepy is no longer installed from the GitHub main branch (#155)
  • box2d-py is no longer a mandatory dependency; gymnasium[box2d] can be installed with the pip install sheeprl[box2d] command (#156)
  • The moviepy.decorators.use_clip_fps_by_default function is replaced (in the ./sheeprl/__init__.py file) with the implementation from the moviepy main branch (#156)

v0.4.6

22 Nov 14:57

v0.4.6 Release Notes

  • The exploration amount of the Dreamer player has been moved to the Actor in #150
  • All the P2E scripts have been split into exploration and finetuning in #151
  • The Hydra version has been pinned to 1.3 in #152
  • SheepRL is now published on PyPI in #155

v0.4.5post0

09 Nov 10:05

v0.4.5post0 Release Notes

  • Fixed MineDojo and the Dreamer player in #148

v0.4.5

07 Nov 11:54

v0.4.5 Release Notes

  • Added a new how-to explaining how to add a custom environment in #128
  • Added the possibility to completely disable metric logging and to decide what and how to log in every algorithm in #129
  • Fixed the model creation of the Dreamer-V3 agent: the bias has been removed from every linear layer followed by a LayerNorm and an activation function (see the sketch after this list)
  • Added the possibility for users to specify their own custom configs, possibly inheriting from the already defined sheeprl configs, in #132
  • Added support for Lightning 2.1 in #136
  • Added the possibility to evaluate any agent given a checkpoint in #139 and #141
  • Various minor fixes in #125 #133 #134 #135 #137 #140 #143 #144 #145 #146
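
As a minimal sketch of the Dreamer-V3 fix (the layer sizes and activation below are illustrative, not the actual SheepRL module definitions): a LayerNorm that follows a linear layer already centers its input and applies its own learnable shift, so the bias of the preceding linear layer is redundant and can be dropped.

```python
import torch
import torch.nn as nn

def dense_block(in_features: int, out_features: int) -> nn.Sequential:
    # The Linear bias is redundant: LayerNorm subtracts the mean of its input
    # and then applies its own learnable affine shift before the activation.
    return nn.Sequential(
        nn.Linear(in_features, out_features, bias=False),
        nn.LayerNorm(out_features),
        nn.SiLU(),
    )

x = torch.randn(8, 32)
print(dense_block(32, 64)(x).shape)  # torch.Size([8, 64])
```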

v0.4.4

10 Oct 12:40

v0.4.4 Release Notes

  • Fixed the activation in the recurrent model of DV1 in #110
  • Updated the Diambra wrapper to support the new Diambra package in #111
  • Added dotdict to speed up access to the loaded config in #112
  • Better naming when Hydra creates the output directories in #114
  • Added the validate_args flag to decide whether torch.distributions must validate the arguments passed to the __init__ function; disable it for a large speedup in #116 (see the sketch after this list)
  • Updated the Diambra wrapper to support AsyncVectorEnv in #119
  • Minor fixes in #120
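
A minimal example of the validate_args flag mentioned above, using the standard torch.distributions API (the distribution and shapes are illustrative):

```python
import torch
from torch.distributions import Normal

loc, scale = torch.zeros(4), torch.ones(4)

# With argument validation enabled, torch.distributions checks the arguments
# (e.g. scale > 0) every time a distribution is built; disabling it skips
# those checks and speeds up the many constructions done during training.
dist = Normal(loc, scale, validate_args=False)
action = dist.sample()
log_prob = dist.log_prob(action)
```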

v0.4.3

28 Sep 09:11
2bd4eee

v0.4.3 Release Notes

In this release we have:

  • Fixed the action reset given the done flag in the recurrent PPO implementation (see the sketch below)
  • Updated the documentation
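
A minimal sketch of the idea behind the action-reset fix (hypothetical variable names, not the actual SheepRL code): when an environment signals done, the previous action fed to the recurrent policy at the next step is zeroed, so the new episode does not condition on the last action of the finished one.

```python
import torch

# Hypothetical shapes: (num_envs, action_dim) previous actions, (num_envs, 1) dones.
prev_actions = torch.randn(4, 6)
dones = torch.tensor([[0.0], [1.0], [0.0], [1.0]])

# Zero the previous action for every environment that just finished an episode.
prev_actions = prev_actions * (1.0 - dones)
```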

v0.4.2

27 Sep 15:39

v0.4.2 Release Notes

In this release we have:

  • refactored the recurrent PPO implementation. In particular:
    • A single LSTM model is used, taking as input the current observation, the previously played action, and the previous recurrent state, i.e., LSTM([o_t, a_t-1], h_t-1). The LSTM has an optional pre-MLP and post-MLP, which can be controlled in the corresponding algo/ppo_recurrent.yaml config (see the sketch after this list)
    • A feature extractor is used to extract features from the observations, whether they are vectors or images
  • Every PPO algorithm now computes the bootstrapped value and adds it to the current reward whenever an environment has been truncated
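
A minimal sketch of the single-LSTM input described above (the sizes are illustrative and the optional pre-MLP/post-MLP are omitted): the observation features and the previously played action are concatenated and fed to the LSTM together with the previous recurrent state.

```python
import torch
import torch.nn as nn

seq_len, num_envs, obs_feat, act_dim, hidden = 1, 4, 64, 6, 128

lstm = nn.LSTM(input_size=obs_feat + act_dim, hidden_size=hidden)

o_t = torch.randn(seq_len, num_envs, obs_feat)    # current observation features
a_prev = torch.randn(seq_len, num_envs, act_dim)  # previously played action
h_prev = (
    torch.zeros(1, num_envs, hidden),             # previous hidden state
    torch.zeros(1, num_envs, hidden),             # previous cell state
)

# LSTM([o_t, a_t-1], h_t-1)
out, h_t = lstm(torch.cat([o_t, a_prev], dim=-1), h_prev)
```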

v0.4.0

20 Sep 12:24

v0.4.0 Release Notes

In this release we have:

  • made the whole framework single-entry, i.e., an experiment can now be run simply with python sheeprl.py exp=... env=..., removing the need to prepend lightning run model ... sheeprl.py every time. The Fabric-related configs can be found and changed under the sheeprl/configs/fabric/ folder. (#97)
  • unified the make_env and make_dict_env methods, so there is no longer any distinction between the two. We now assume that the environment's observation space is a gymnasium.spaces.Dict; if it is not, an exception is raised (see the sketch after this list). (#96)
  • implemented resume_from_checkpoint for every algorithm. (#95)
  • added the Crafter environment. (#103)
  • fixed some environments, in particular Diambra and DMC:
    • Diambra: renamed the wrapper implementation file; the done flag now checks whether the info["env_done"] flag is True. (#98)
    • DMC: removed env.frame_skip=0 for MuJoCo envs and removed the action repeat from the DMC wrapper. (#99)
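
A minimal sketch of the assumption made by the unified environment creation (hypothetical helper, not the actual make_env implementation): the observation space must be a gymnasium.spaces.Dict, otherwise an exception is raised.

```python
import gymnasium as gym

def check_dict_obs_space(env: gym.Env) -> gym.Env:
    # The unified environment factory assumes a Dict observation space.
    if not isinstance(env.observation_space, gym.spaces.Dict):
        raise RuntimeError(
            "Expected a gymnasium.spaces.Dict observation space, "
            f"got {type(env.observation_space)}"
        )
    return env
```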

v0.3.2

15 Sep 08:46
717c517

v0.3.2 Release Notes

In this release we have fixed the timing metrics logged by every algorithm. In particular:

  • Time/sps_env_interaction measures the steps-per-second of the agent's environment interaction, namely the forward pass to obtain the new action given the observation and the execution of the environment's step method. This value is local to rank-0 and takes into account the action_repeat set through Hydra/CLI (see the sketch after this list)
  • Time/sps_train measures the steps-per-second of the train function, which runs in a distributed manner, considering all the ranks that call it
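
A minimal sketch of how such a steps-per-second value can be computed while accounting for the action repeat (hypothetical variable names, not the actual SheepRL timer):

```python
import time

action_repeat = 4  # as set through Hydra/CLI
policy_calls = 0
start = time.perf_counter()

for _ in range(1000):
    # ... policy forward pass and env.step(...) would go here ...
    policy_calls += 1

elapsed = time.perf_counter() - start
# Each policy call corresponds to `action_repeat` underlying environment steps.
sps_env_interaction = policy_calls * action_repeat / elapsed
```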

v0.3.1

11 Sep 09:45
b62cce6

v0.3.1 Release Notes

In this release we have refactored some names inside every algorithm, in particular:

  • we have introduced the concept of policy_step, i.e., the number of (distributed) policy steps taken per environment step, where an environment step does not take into account the action repeat; in other words, it is the number of times the policy is called to collect an action given an observation. With n ranks and m environments per rank, the number of policy steps per environment step is policy_steps = n * m

We have also refactored the hydra configs, in particular:

  • we have introduced the metric, checkpoint, and buffer configs, containing the shared hyperparameters for those objects in every algorithm
  • the metric config has the metric.log_every parameter, which controls the logging frequency. Since the policy_step variable is rarely exactly divisible by the metric.log_every value, logging happens as soon as policy_step - last_log >= cfg.metric.log_every, where last_log = policy_step is updated every time something is logged (see the sketch after this list)
  • the checkpoint config has the every and resume_from parameters. The every parameter works like metric.log_every, while resume_from specifies the experiment folder to resume training from, which must contain the .hydra folder. This is currently only supported by the Dreamer algorithms
  • num_envs and clip_reward have been moved to the env config
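
A minimal sketch of the logging condition described above (hypothetical loop and values, not the actual SheepRL training loop):

```python
log_every = 5000   # cfg.metric.log_every
last_log = 0
policy_step = 0

num_envs, world_size = 4, 2
for _ in range(10_000):
    # Each environment step advances policy_step by num_envs * world_size.
    policy_step += num_envs * world_size
    if policy_step - last_log >= log_every:
        # ... log the aggregated metrics here ...
        last_log = policy_step
```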