Release 2.7.0 (#426)
* Release 2.7.0

* Update changelog.rst

* Update tensorboard.rst
araffin authored Jul 31, 2019
1 parent cfde47e commit 8ceda3b
Showing 5 changed files with 53 additions and 12 deletions.
5 changes: 5 additions & 0 deletions docs/guide/algos.rst
@@ -48,3 +48,8 @@ Actions ``gym.spaces``:
- ``MultiBinary``: A list of possible actions, where at each timestep any of the actions can be used in any combination.

.. _MPI: https://mpi4py.readthedocs.io/en/stable/

.. note::

  Some logging values (like `ep_rewmean`, `eplenmean`) are only available when using a Monitor wrapper.
  See `Issue #339 <https://github.com/hill-a/stable-baselines/issues/339>`_ for more info.
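
Below is a minimal sketch (an illustration, not part of this commit) of wrapping an environment with ``Monitor`` so that episode statistics get recorded; the log path, environment and hyperparameters are placeholder assumptions.

.. code-block:: python

    import os

    import gym

    from stable_baselines import PPO2
    from stable_baselines.bench import Monitor

    log_dir = "/tmp/gym/"
    os.makedirs(log_dir, exist_ok=True)

    # Monitor records episode rewards and lengths, which feed values such as `ep_rewmean`
    env = Monitor(gym.make("CartPole-v1"), os.path.join(log_dir, "cartpole"), allow_early_resets=True)

    model = PPO2("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10000)
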
36 changes: 36 additions & 0 deletions docs/guide/tensorboard.rst
@@ -76,6 +76,42 @@ It will display information such as the model graph, the episode reward, the mod
:width: 400
:alt: graph


Logging More Values
-------------------

Using a callback, you can easily log more values with TensorBoard.
Here is a simple example of how to log both an additional tensor and an arbitrary scalar value:

.. code-block:: python

    import tensorflow as tf
    import numpy as np

    from stable_baselines import SAC

    model = SAC("MlpPolicy", "Pendulum-v0", tensorboard_log="/tmp/sac/", verbose=1)
    # Define a new property to avoid global variable
    model.is_tb_set = False


    def callback(locals_, globals_):
        self_ = locals_['self']
        # Log additional tensor
        if not self_.is_tb_set:
            with self_.graph.as_default():
                tf.summary.scalar('value_target', tf.reduce_mean(self_.value_target))
                self_.summary = tf.summary.merge_all()
            self_.is_tb_set = True
        # Log scalar value (here a random variable)
        value = np.random.random()
        summary = tf.Summary(value=[tf.Summary.Value(tag='random_value', simple_value=value)])
        locals_['writer'].add_summary(summary, self_.num_timesteps)
        return True


    model.learn(50000, callback=callback)

Legacy Integration
-------------------

18 changes: 9 additions & 9 deletions docs/misc/changelog.rst
@@ -6,29 +6,28 @@ Changelog
For download links, please look at `Github release page <https://github.com/hill-a/stable-baselines/releases>`_.


-Pre-Release 2.7.0a0 (WIP)
+Release 2.7.0 (2019-07-31)
--------------------------

-**Twin Delayed DDPG (TD3)**
+**Twin Delayed DDPG (TD3) and GAE bug fix (TRPO, PPO1, GAIL)**

Breaking Changes:
^^^^^^^^^^^^^^^^^

New Features:
^^^^^^^^^^^^^
- added Twin Delayed DDPG (TD3) algorithm, with HER support

-- Add support for continuous action spaces to `action_probability`, computing the PDF of a Gaussian
+- added support for continuous action spaces to `action_probability`, computing the PDF of a Gaussian
  policy in addition to the existing support for categorical stochastic policies.
-- Add flag to `action_probability` to return log-probabilities.
-- Added support for python lists and numpy arrays in ``logger.writekvs``. (@dwiel)
-- The info dicts returned by VecEnvs now include a ``terminal_observation`` key providing access to the last observation in a trajectory. (@qxcv)
+- added flag to `action_probability` to return log-probabilities (see the sketch after this list).
+- added support for python lists and numpy arrays in ``logger.writekvs``. (@dwiel)
+- the info dicts returned by VecEnvs now include a ``terminal_observation`` key providing access to the last observation in a trajectory. (@qxcv)
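
The snippet below is an illustrative sketch, not part of the changelog itself: it instantiates the new TD3 algorithm and calls ``action_probability`` on a continuous (Gaussian) policy, including the new ``logp`` flag. The environment, model choices and step budget are placeholder assumptions.

.. code-block:: python

    import numpy as np

    from stable_baselines import PPO2, TD3

    # TD3 is new in this release (default hyperparameters, tiny budget for the sketch)
    td3_model = TD3("MlpPolicy", "Pendulum-v0", verbose=1)
    td3_model.learn(total_timesteps=5000)

    # `action_probability` now handles continuous action spaces (Gaussian policies,
    # e.g. PPO2 on Pendulum-v0) and can return log-probabilities via `logp=True`
    ppo_model = PPO2("MlpPolicy", "Pendulum-v0")
    obs = ppo_model.env.reset()
    actions = np.array([[0.0]])  # one action per (vectorized) environment
    pdf = ppo_model.action_probability(obs, actions=actions)
    log_pdf = ppo_model.action_probability(obs, actions=actions, logp=True)
    print(pdf, log_pdf)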

Bug Fixes:
^^^^^^^^^^
- fixed a bug in ``traj_segment_generator`` where the ``episode_starts`` was wrongly recorded,
  resulting in a wrong computation of the Generalized Advantage Estimation (GAE); this affects TRPO, PPO1 and GAIL (thanks to @miguelrass for spotting the bug; see the sketch after this list)
-- add missing property `n_batch` in `BasePolicy`.
+- added missing property `n_batch` in `BasePolicy`.
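
For context, here is a minimal numpy sketch of the GAE recursion (an illustration, not the library code): the episode-start flag masks the bootstrap term, which is why a wrongly recorded ``episode_starts`` corrupts the advantage estimates.

.. code-block:: python

    import numpy as np

    def gae(rewards, values, episode_starts, last_value, gamma=0.99, lam=0.95):
        """Generalized Advantage Estimation; episode_starts[t] is True when step t begins a new episode."""
        n_steps = len(rewards)
        values = np.append(values, last_value)
        starts = np.append(episode_starts, False)
        advantages = np.zeros(n_steps)
        last_gae = 0.0
        for step in reversed(range(n_steps)):
            # Do not bootstrap across an episode boundary: this is the term that
            # breaks when the episode starts are recorded incorrectly.
            non_terminal = 1.0 - float(starts[step + 1])
            delta = rewards[step] + gamma * values[step + 1] * non_terminal - values[step]
            last_gae = delta + gamma * lam * non_terminal * last_gae
            advantages[step] = last_gae
        return advantages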

Deprecations:
^^^^^^^^^^^^^
@@ -38,12 +38,13 @@ Others:
- renamed some keys in ``traj_segment_generator`` to be more meaningful
- retrieve unnormalized rewards when using the Monitor wrapper with TRPO, PPO1 and GAIL
  to display them in the logs (mean episode reward)
-- Clean up DDPG code (renamed variables)
+- clean up DDPG code (renamed variables)

Documentation:
^^^^^^^^^^^^^^

- doc fix for the hyperparameter tuning command in the rl zoo
- added an example of how to log additional variables with TensorBoard and a callback



4 changes: 2 additions & 2 deletions setup.py
@@ -118,7 +118,7 @@
] + tf_dependency,
extras_require={
'tests': [
-'pytest==3.5.1',
+'pytest',
'pytest-cov',
'pytest-env',
'pytest-xdist',
@@ -138,7 +138,7 @@
license="MIT",
long_description=long_description,
long_description_content_type='text/markdown',
version="2.7.0a0",
version="2.7.0",
)

# python setup.py sdist
2 changes: 1 addition & 1 deletion stable_baselines/__init__.py
@@ -11,4 +11,4 @@
from stable_baselines.trpo_mpi import TRPO
from stable_baselines.sac import SAC

__version__ = "2.6.1a0"
__version__ = "2.7.0"
