Reproducible results, automatic `VecEnv` wrapping, env checker and more usability improvements
Breaking Changes:
- The `seed` argument has been moved from the `learn()` method to the model constructor in order to have reproducible results (see the sketch after this list)
- `allow_early_resets` of the `Monitor` wrapper now defaults to `True`
- `make_atari_env` now returns a `DummyVecEnv` by default (instead of a `SubprocVecEnv`); this usually improves performance
- Fix inconsistency of sample type, so that mode/sample function returns tensor of `tf.int64` in `CategoricalProbabilityDistribution`/`MultiCategoricalProbabilityDistribution` (@seheevic)
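A minimal sketch of seeding through the constructor; the exact keyword names match the items above, and the single-threaded session (`n_cpu_tf_sess=1`, introduced in the new features below) is shown because fully deterministic runs on CPU typically require it:

```python
from stable_baselines import PPO2

# Seeding now happens in the constructor instead of `learn()`.
# `n_cpu_tf_sess=1` (new in this release) keeps the Tensorflow session
# single-threaded, which is usually needed for fully deterministic results.
model = PPO2('MlpPolicy', 'CartPole-v1', seed=0, n_cpu_tf_sess=1, verbose=1)
model.learn(total_timesteps=10000)
```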
New Features:
- Add `n_cpu_tf_sess` to model constructor to choose the number of threads used by Tensorflow
- Environments are automatically wrapped in a `DummyVecEnv` if needed when passing them to the model constructor
- Added `stable_baselines.common.make_vec_env` helper to simplify VecEnv creation
- Added `stable_baselines.common.evaluation.evaluate_policy` helper to simplify model evaluation (see the sketches after this list)
- `VecNormalize` changes (see the sketch after this list):
  - Now supports being pickled and unpickled (@AdamGleave).
  - New methods `.normalize_obs(obs)` and `.normalize_reward(rews)` apply normalization to arbitrary observations or rewards without updating statistics (@shwang)
  - `.get_original_reward()` returns the unnormalized rewards from the most recent timestep
  - `.reset()` now collects observation statistics (used to only apply normalization)
- Add parameter `exploration_initial_eps` to DQN. (@jdossgollin)
- Add type checking and PEP 561 compliance. Note: most functions are still not annotated, this will be a gradual process.
- DDPG, TD3 and SAC accept non-symmetric action spaces. (@Antymon)
- Add `check_env` util to check if a custom environment follows the gym interface (@araffin and @justinkterry) (see the sketch after this list)
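A minimal sketch of the new helpers, assuming the keyword names shown here (`n_envs`, `seed`, `n_eval_episodes`); a plain gym environment passed to the constructor is wrapped in a `DummyVecEnv` automatically:

```python
import gym

from stable_baselines import A2C
from stable_baselines.common import make_vec_env
from stable_baselines.common.evaluation import evaluate_policy

# 4 copies of CartPole-v1 running in a DummyVecEnv (the default)
train_env = make_vec_env('CartPole-v1', n_envs=4, seed=0)
model = A2C('MlpPolicy', train_env, verbose=0)
model.learn(total_timesteps=25000)

# A plain gym env could also be passed directly to the constructor;
# it would be wrapped in a DummyVecEnv automatically:
# model = A2C('MlpPolicy', gym.make('CartPole-v1'))

mean_reward, std_reward = evaluate_policy(model, gym.make('CartPole-v1'),
                                          n_eval_episodes=10)
print("mean reward: {:.2f} +/- {:.2f}".format(mean_reward, std_reward))
```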
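A sketch of the env checker on a toy custom environment, assuming `check_env` is importable from `stable_baselines.common.env_checker` and accepts a `warn` flag; the environment itself is purely illustrative:

```python
import numpy as np
import gym
from gym import spaces

from stable_baselines.common.env_checker import check_env


class GoLeftEnv(gym.Env):
    """Toy 1D grid: the agent must reach the left end of the grid."""

    def __init__(self, grid_size=10):
        super(GoLeftEnv, self).__init__()
        self.grid_size = grid_size
        self.agent_pos = grid_size - 1
        self.action_space = spaces.Discrete(2)  # 0 = left, 1 = right
        self.observation_space = spaces.Box(low=0, high=grid_size,
                                            shape=(1,), dtype=np.float32)

    def reset(self):
        self.agent_pos = self.grid_size - 1
        return np.array([self.agent_pos], dtype=np.float32)

    def step(self, action):
        self.agent_pos += -1 if action == 0 else 1
        self.agent_pos = np.clip(self.agent_pos, 0, self.grid_size)
        done = bool(self.agent_pos == 0)
        reward = 1.0 if done else 0.0
        return np.array([self.agent_pos], dtype=np.float32), reward, done, {}

    def render(self, mode='human'):
        print('.' * int(self.agent_pos) + 'x')


# Warns about anything that does not follow the gym interface
# (spaces, reset()/step() return types and shapes, ...)
check_env(GoLeftEnv(), warn=True)
```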
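A sketch of the new `VecNormalize` methods listed above; the environment id and shapes are illustrative:

```python
import gym
import numpy as np

from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

env = VecNormalize(DummyVecEnv([lambda: gym.make('Pendulum-v0')]))

obs = env.reset()  # reset() now also updates the observation statistics
obs, rewards, dones, infos = env.step([env.action_space.sample()])

# Normalize arbitrary observations / rewards without updating the running stats
norm_obs = env.normalize_obs(env.observation_space.sample())
norm_rew = env.normalize_reward(np.array([1.0]))

# Unnormalized reward from the most recent step
original_rew = env.get_original_reward()
```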
Bug Fixes:
- Fix seeding, so it is now possible to have deterministic results on CPU
- Fix a bug in DDPG where the `predict` method with `deterministic=False` would fail
- Fix a bug in TRPO: `mean_losses` was not initialized, causing the logger to crash when there were no gradients (@MarvineGothic)
- Fix a bug in `cmd_util` from API change in recent Gym versions
- Fix a bug in DDPG, TD3 and SAC where warmup and random exploration actions would end up scaled in the replay buffer (@Antymon)
Deprecations:
- `nprocs` (ACKTR) and `num_procs` (ACER) are deprecated in favor of `n_cpu_tf_sess`, which is now common to all algorithms
- `VecNormalize`: `load_running_average` and `save_running_average` are deprecated in favor of using pickle (see the sketch after this list).
Others:
- Add upper bound for Tensorflow version (<2.0.0).
- Refactored tests to remove duplicated code
- Add pull request template
- Replaced redundant code in load_results (@jbulow)
- Minor PEP8 fixes in dqn.py (@justinkterry)
- Add a message to the assert in `PPO2`
- Update replay buffer docstring
- Fix `VecEnv` docstrings
Documentation:
- Add plotting to the Monitor example (@rusu24edward)
- Add Snake Game AI project (@pedrohbtp)
- Add note on the supported Tensorflow versions.
- Remove unnecessary steps required for Windows installation.
- Remove `DummyVecEnv` creation when not needed
- Added `make_vec_env` to the examples to simplify VecEnv creation
- Add QuaRL project (@srivatsankrishnan)
- Add Pwnagotchi project (@evilsocket)
- Fix multiprocessing example (@rusu24edward)
- Fix `result_plotter` example
- Add JNRR19 tutorial (by @edbeeching, @hill-a and @araffin)
- Updated notebooks link
- Fix typo in algos.rst, "containes" to "contains" (@SyllogismRXS)
- Fix outdated source documentation for load_results
- Add PPO_CPP project (@Antymon)
- Add section on C++ portability of Tensorflow models (@Antymon)
- Update custom env documentation to reflect new gym API for the `close()` method (@justinkterry)
- Update custom env documentation to clarify what `step()` and `reset()` return (@justinkterry)
- Add RL tips and tricks for doing RL experiments
- Corrected lots of typos
- Add spell check to documentation if available