Releases: Eclectic-Sheep/sheeprl
Releases · Eclectic-Sheep/sheeprl
v0.4.7
v0.4.7 Release Notes
- SheepRL is now on PyPI: every time a release is published, the new version of SheepRL is also published on PyPI (#155)
- Torchmetrics is no longer installed from the GitHub main branch (#155)
- Moviepy is no longer installed from the GitHub main branch (#155)
- box2d-py is no longer a mandatory dependency: it is possible to install `gymnasium[box2d]` with the `pip install sheeprl[box2d]` command (#156)
- The `moviepy.decorators.use_clip_fps_by_default` function is replaced (in the `./sheeprl/__init__.py` file) with the method from the moviepy main branch (#156)
v0.4.6
v0.4.5post0
v0.4.5post0 Release Notes
- Fixes MineDojo and Dreamer's player in #148
v0.4.5
v0.4.5 Release Notes
- Added new how-to explaining how to add a new custom environment in #128
- Added the possibility to completely disable logging metrics and decide what and how to log metrics in every algorithm in #129
- Fixed the model creation of the Dreamer-V3 agent, removing the bias from every linear layer followed by a LayerNorm and an activation function (see the sketch after this list)
- Added the possibility for the users to specify their own custom configs, possibly inheriting from the already defined sheeprl configs in #132
- Added the support to Lightning 2.1 in #136
- Added the possibility to evaluate every agent given a checkpoint in #139 #141
- Various minor fixes in #125 #133 #134 #135 #137 #140 #143 #144 #145 #146
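The Dreamer-V3 fix above boils down to dropping the bias of linear layers that are immediately followed by a LayerNorm, which re-centers the activations and has its own affine shift, making the bias redundant. A minimal sketch of such a block, with purely illustrative layer sizes and activation, not the actual SheepRL modules:

```python
import torch
from torch import nn

# Minimal sketch: a linear layer followed by LayerNorm does not need its own
# bias, because LayerNorm re-centers the activations anyway.
def dense_block(in_features: int, out_features: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_features, out_features, bias=False),  # bias removed
        nn.LayerNorm(out_features),
        nn.SiLU(),  # illustrative activation
    )

block = dense_block(64, 128)
print(block(torch.randn(8, 64)).shape)  # torch.Size([8, 128])
```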
v0.4.4
v0.4.4 Release Notes
- Fixes the activation in the recurrent model in DV1 in #110
- Updated the Diambra wrapper to support the new Diambra package in #111
- Added `dotdict` to speed up accessing the loaded config in #112
- Better naming when Hydra creates the output dirs in #114
- Added the `validate_args` flag to decide whether `torch.distributions` must check the arguments passed to the `__init__` function; disable it to have a huge speedup in #116 (see the sketch below)
- Updated the Diambra wrapper to support `AsyncVectorEnv` in #119
- Minor fixes in #120
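For context, `validate_args` is a standard `torch.distributions` option; the snippet below only illustrates the PyTorch mechanism and is not SheepRL's actual config plumbing:

```python
import torch
from torch.distributions import Normal

loc, scale = torch.zeros(4), torch.ones(4)

# Per-distribution: skip the (costly) argument checks in hot training loops.
dist = Normal(loc, scale, validate_args=False)

# Or globally, for every distribution created afterwards.
torch.distributions.Distribution.set_default_validate_args(False)

print(dist.sample().shape)  # torch.Size([4])
```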
v0.4.3
v0.4.2
v0.4.2 Release Notes
In this release we have:
- refactored the recurrent PPO implementation. In particular:
  - A single LSTM model is used, taking as input the current observation, the previously played action and the previous recurrent state, i.e., `LSTM([o_t, a_t-1], h_t-1)`. The LSTM has an optional pre-MLP and post-MLP; those can be controlled in the relative `algo/ppo_recurrent.yaml` config
  - A feature extractor is used to extract features from the observations, whether they are vectors or images
- Every PPO algorithm now computes the bootstrapped value and adds it to the current reward whenever an environment has been truncated (see the sketch below)
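The truncation handling above follows the usual bootstrapping recipe: when an episode is cut off by a time limit rather than a true terminal state, the critic's estimate of the next state's value is added to the last reward. A minimal sketch with a toy critic, not SheepRL's actual agent or buffer code:

```python
import torch
from torch import nn

critic = nn.Linear(4, 1)  # toy value function V(s)

next_obs = torch.randn(8, 4)                                # next observations of 8 envs
rewards = torch.randn(8)                                    # last rewards
truncated = torch.tensor([1., 0., 0., 1., 0., 0., 0., 0.])  # time-limit cutoffs
gamma = 0.99

with torch.no_grad():
    next_values = critic(next_obs).squeeze(-1)

# Only truncated (not terminated) environments get the bootstrapped value,
# so the return computation does not treat the cutoff as a real episode end.
rewards = rewards + gamma * next_values * truncated
```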
v0.4.0
v0.4.0 Release Notes
In this release we have:
- made the whole framework single-entry-point, i.e. now one can run an experiment just with `python sheeprl.py exp=... env=...`, removing the need to prepend `lightning run model ... sheeprl.py` every time. The Fabric-related configs can be found and changed under the `sheeprl/configs/fabric/` folder (#97)
- unified the `make_env` and `make_dict_env` methods, so there is no longer a distinction between the two. We now assume that the environment has an observation space that is a `gymnasium.spaces.Dict`; if it is not, an exception is raised (#96) (see the sketch after this list)
- implemented `resume_from_checkpoint` for every algorithm (#95)
- added the Crafter environment (#103)
- Fixed some environments, in particular Diambra and DMC
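A rough sketch of the kind of observation-space check described for the unified `make_env` (the function name and error message here are illustrative, not SheepRL's actual code):

```python
import gymnasium as gym

def ensure_dict_obs_space(env: gym.Env) -> gym.Env:
    # Raise if the observation space is not a gymnasium.spaces.Dict.
    if not isinstance(env.observation_space, gym.spaces.Dict):
        raise RuntimeError(
            "Expected a gymnasium.spaces.Dict observation space, "
            f"got {type(env.observation_space).__name__}"
        )
    return env

try:
    ensure_dict_obs_space(gym.make("CartPole-v1"))
except RuntimeError as err:
    print(err)  # CartPole exposes a Box space, so the check fails here
```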
v0.3.2
v0.3.2 Release Notes
In this release we have fixed the time metrics logged by every algorithm. In particular:
- The `Time/sps_env_interaction` metric measures the steps-per-second of the agent's environment interaction, namely the forward pass to obtain the new action given the observation and the execution of the environment's `step` method. This value is local to rank-0 and takes into consideration the `action_repeat` set through Hydra/CLI (see the sketch after this list)
- The `Time/sps_train` metric measures the steps-per-second of the train function, which runs in a distributed manner, considering all the ranks calling the train function
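As a rough illustration of how such a steps-per-second value can be computed (the loop and variable names are ours, and the exact accounting of `action_repeat` is an assumption, not SheepRL's code):

```python
import time

action_repeat = 4   # set through Hydra/CLI in SheepRL
num_envs = 2        # environments on this rank

start = time.perf_counter()
policy_steps = 0
for _ in range(1000):
    # ... policy forward pass + env.step(...) would run here ...
    policy_steps += num_envs
elapsed = time.perf_counter() - start

# Each policy step corresponds to `action_repeat` environment frames.
sps_env_interaction = policy_steps * action_repeat / elapsed
print(f"Time/sps_env_interaction: {sps_env_interaction:.1f}")
```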
v0.3.1
v0.3.1 Release Notes
In this release we have refactored some names inside every algorithm, in particular:
- we have introduced the concept of `policy_step`, which is the number of (distributed) policy steps per environment step, where the environment step does not take into consideration the action repeat, i.e. it is the number of times the policy is called to collect an action given an observation. If one has `n` ranks and `m` environments per rank, then the number of policy steps per environment step is `policy_steps = n * m` (see the sketch below)
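A tiny bookkeeping sketch of the counter described above (the loop is illustrative, not SheepRL's training loop):

```python
world_size = 2   # n ranks
num_envs = 4     # m environments per rank

policy_step = 0
for _ in range(10):                       # 10 environment steps
    # every rank queries the policy once per parallel environment
    policy_step += world_size * num_envs  # policy_steps = n * m per environment step
print(policy_step)  # 80
```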
We have also refactored the Hydra configs, in particular:
- we have introduced the `metric`, `checkpoint` and `buffer` configs, containing the shared hyperparameters for those objects in every algorithm
- the `metric` config has the `metric.log_every` parameter, which controls the logging frequency. Since it's hard for the `policy_step` variable to be exactly divisible by the `metric.log_every` value, logging happens as soon as `policy_step - last_log >= cfg.metric.log_every`, where `last_log = policy_step` is updated every time something is logged (see the sketch after this list)
- the `checkpoint` config has the `every` and `resume_from` parameters. The `every` parameter works like the `metric.log_every` one, while `resume_from` specifies the experiment folder, which must contain the `.hydra` folder, to resume the training from. This is currently only supported by the Dreamer algorithms
- `num_envs` and `clip_reward` have been moved to the `env` config
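A short sketch of the logging condition described in the `metric` bullet (the values are illustrative):

```python
log_every = 5000   # cfg.metric.log_every
last_log = 0
policy_step = 0

for _ in range(100_000):
    policy_step += 8  # e.g. 2 ranks * 4 envs per rank
    # policy_step rarely hits an exact multiple of log_every,
    # so log as soon as the gap since the last log is large enough.
    if policy_step - last_log >= log_every:
        # ... log the metrics here ...
        last_log = policy_step
```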