New Features
added Twin Delayed DDPG (TD3) algorithm, with HER support (a usage sketch follows this list)
added support for continuous action spaces to action_probability, computing the PDF of a Gaussian policy in addition to the existing support for categorical stochastic policies
added a flag to action_probability to return log-probabilities (see the action_probability sketch after this list)
added support for Python lists and NumPy arrays in logger.writekvs (see the logger sketch after this list). (@dwiel)
the info dict returned by VecEnvs now includes a terminal_observation key providing access to the last observation in a trajectory (see the VecEnv sketch below). (@qxcv)
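A minimal TD3 training sketch, assuming the usual Stable Baselines API; the environment and timestep budget below are placeholders:

```python
import gym

from stable_baselines import TD3
from stable_baselines.td3.policies import MlpPolicy

# Placeholder continuous-control task; any Box action-space env works.
env = gym.make("Pendulum-v0")

model = TD3(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)

obs = env.reset()
action, _states = model.predict(obs)
# HER support: on a goal-based (GoalEnv) task, TD3 can be passed as the
# model class to the HER wrapper instead of being used directly.
```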
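An action_probability sketch for continuous actions; PPO2 and Pendulum-v0 are placeholder choices, and the name of the log-probability flag (logp below) is an assumption, so check the API documentation:

```python
import gym
import numpy as np

from stable_baselines import PPO2

env = gym.make("Pendulum-v0")
model = PPO2("MlpPolicy", env)

obs = env.reset()
# Candidate action whose likelihood we query under the current Gaussian policy.
action = np.array([env.action_space.sample()])

# PDF of the Gaussian policy evaluated at the given action.
prob = model.action_probability(obs, actions=action)
# Same query, returning log-probabilities (flag name assumed to be ``logp``).
log_prob = model.action_probability(obs, actions=action, logp=True)
```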
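A small logger sketch: logger.logkv and logger.dumpkvs are the standard user-facing calls that feed writekvs, and how the list/array values are rendered is up to the logger backend:

```python
import numpy as np

from stable_baselines import logger

logger.configure()  # default backend: stdout
# Python lists and NumPy arrays can now be passed as values.
logger.logkv("episode_rewards", [1.0, 2.5, 3.0])
logger.logkv("q_values", np.array([0.1, 0.2]))
logger.dumpkvs()
```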
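And a VecEnv sketch for reading terminal_observation (CartPole-v1 is a placeholder env): because VecEnvs reset automatically at the end of an episode, this key is how you recover the true final observation.

```python
import gym

from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
obs = env.reset()

for _ in range(1000):
    obs, rewards, dones, infos = env.step([env.action_space.sample()])
    if dones[0]:
        # The VecEnv has already reset; the true last observation of the
        # finished episode is stored in the info dict.
        last_obs = infos[0]["terminal_observation"]
        break
```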
Bug Fixes
fixed a bug in traj_segment_generator where episode_starts was wrongly recorded, resulting in a wrong computation of Generalized Advantage Estimation (GAE); this affects TRPO, PPO1 and GAIL (thanks to @miguelrass for spotting the bug)
added missing property n_batch in BasePolicy.
Others
renamed some keys in traj_segment_generator to be more meaningful
retrieve the unnormalized reward when using the Monitor wrapper with TRPO, PPO1 and GAIL, to display it in the logs (mean episode reward)
cleaned up DDPG code (renamed variables)
Documentation
fixed the documentation of the hyperparameter tuning command in the RL Zoo
added an example of how to log additional values with TensorBoard using a callback (see the sketch below)
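A sketch of that pattern, assuming a TensorFlow 1.x setup and that the training loop exposes ``writer`` and ``self`` in the callback locals; the algorithm, tag and logged value are placeholders:

```python
import gym
import numpy as np
import tensorflow as tf

from stable_baselines import SAC

def callback(locals_, globals_):
    self_ = locals_["self"]
    # Placeholder scalar; replace with any value you want to track.
    value = np.random.random()
    summary = tf.Summary(value=[tf.Summary.Value(tag="custom/random_value", simple_value=value)])
    locals_["writer"].add_summary(summary, self_.num_timesteps)
    return True  # returning False would stop training

model = SAC("MlpPolicy", gym.make("Pendulum-v0"), tensorboard_log="/tmp/sac_tb/", verbose=1)
model.learn(total_timesteps=10000, callback=callback)
```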