Bug Fixes and Improvements (VecEnv)
Warning: breaking change when using custom policies
- doc update (fix example of result plotter + improve doc)
- fixed logger issues when stdout lacks a read function
- fixed a bug in common.dataset.Dataset where shuffling was not disabled properly (it affects only PPO1 with recurrent policies)
- fixed output layer name for DDPG q function, used in pop-art normalization and l2 regularization of the critic
- added support for multi env recording to generate_expert_traj (@XMaster96)
- added support for LSTM model recording to generate_expert_traj (@XMaster96)
- GAIL: remove mandatory matplotlib dependency and refactor as subclass of TRPO (@kantneel and @AdamGleave)
- added get_attr(), env_method() and set_attr() methods for all VecEnv. Those methods now all accept an indices keyword to select a subset of envs. set_attr now returns None rather than a list of None. (@kantneel)
- GAIL: gail.dataset.ExpertDataset supports loading from memory rather than file, and gail.dataset.record_expert supports returning in-memory rather than saving to file.
- added support in VecEnvWrapper for accessing attributes of arbitrarily deeply nested instances of VecEnvWrapper and VecEnv. This is allowed as long as the attribute belongs to exactly one of the nested instances, i.e. it must be unambiguous. (@kantneel)
- fixed bug where result plotter would crash on very short runs (@Pastafarianist)
- added option to not trim output of result plotter by number of timesteps (@Pastafarianist)
- clarified the public interface of BasePolicy and ActorCriticPolicy. Breaking change when using custom policies: masks_ph is now called dones_ph.
- support for custom stateful policies.
- fixed episode length recording in trpo_mpi.utils.traj_segment_generator (@GerardMaggiolino)
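
To illustrate the get_attr(), env_method() and set_attr() entry above, here is a minimal, self-contained sketch of the behaviour it describes: an indices keyword selects a subset of envs, and set_attr returns None. ToyEnv and ToyVecEnv are hypothetical stand-ins written for this example; they are not the library's classes, and the real VecEnv implementations may differ in detail.

```python
class ToyEnv:
    """Hypothetical stand-in for a single environment (not a Gym env)."""

    def __init__(self, seed):
        self.seed_value = seed
        self.difficulty = 1

    def describe(self, prefix=""):
        return f"{prefix}env(seed={self.seed_value}, difficulty={self.difficulty})"


class ToyVecEnv:
    """Hypothetical sketch of the get_attr / set_attr / env_method semantics."""

    def __init__(self, envs):
        self.envs = list(envs)

    def _get_indices(self, indices):
        # Assumption for this sketch: None selects every env, an int selects
        # one env, anything else is treated as an iterable of ints.
        if indices is None:
            return list(range(len(self.envs)))
        if isinstance(indices, int):
            return [indices]
        return list(indices)

    def get_attr(self, attr_name, indices=None):
        # One value per selected env, in the order the indices were given.
        return [getattr(self.envs[i], attr_name) for i in self._get_indices(indices)]

    def set_attr(self, attr_name, value, indices=None):
        # Returns None (implicitly), not a list of None.
        for i in self._get_indices(indices):
            setattr(self.envs[i], attr_name, value)

    def env_method(self, method_name, *method_args, indices=None, **method_kwargs):
        # Call the named method on each selected env and collect the results.
        return [
            getattr(self.envs[i], method_name)(*method_args, **method_kwargs)
            for i in self._get_indices(indices)
        ]


venv = ToyVecEnv([ToyEnv(seed) for seed in range(3)])
result = venv.set_attr("difficulty", 5, indices=[0, 2])  # only envs 0 and 2
difficulties = venv.get_attr("difficulty")               # all envs
descriptions = venv.env_method("describe", "x:", indices=[2])
```

In this sketch, omitting indices applies the call to every env, which matches the usual expectation that the keyword narrows the selection rather than being required.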