Work in progress.
- Deep reinforcement learning (RL) implementations using TensorFlow Probability, with a focus on agents that use recurrent neural networks (RNNs).
- Compatible with Keras' Functional API.
- Although this may change in the future, the library is currently implemented and optimized for a non-distributed setup (i.e., a single CPU and/or GPU).
  A quick benchmark of the recurrent PPO algorithm on the Atari environments (a single processor + GPU, 32 parallel environments) shows a throughput of roughly 6-12M frames per hour, i.e., approximately 1700-3300 frames per second (FPS).
- Python RL environments (e.g., Gym(nasium) environments such as Classic Control and Atari) can be run on the TF graph, allowing the complete agent-environment interaction loop to run non-eagerly. See Driver.
- Support for hybrid action spaces.
- A PPO algorithm that handles partial observability is implemented (RecurrentPPOAgent). RecurrentPPOAgent uses stateful RNNs to pass hidden states between time steps, allowing the agent to base its decisions on past states as well as the current state (Figure B). This contrasts with typical PPO implementations, wherein the agent makes decisions based on the current state only (Figure A).
  Using hidden states is a clever way to pass experience through time. One limitation of this approach, however, is that for each training iteration the hidden states correspond to incomplete trajectories (chunks of trajectories); this limitation is especially pronounced for longer episodes and for off-policy RL (using experience replay). For further reading, see the R2D2 paper.
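To make the role of the carried hidden state concrete, here is a framework-free sketch in plain Python; it is not RecurrentPPOAgent's actual code. A toy scalar RNN cell with hypothetical weights `W_H` and `W_X` processes one trajectory in chunks of three steps, using the hypothetical helpers `rnn_step` and `run_chunk`. Carrying each chunk's final hidden state into the next chunk reproduces the full-sequence hidden states, whereas starting every chunk from a zero state does not. Resetting the state after an episode ends (`done`) is an assumption about how episode boundaries are handled.

```python
import math
from typing import List, Sequence, Tuple

# Toy scalar RNN cell h_t = tanh(W_H * h_{t-1} + W_X * x_t).
# The weights are arbitrary illustration values (hypothetical), not learned.
W_H, W_X = 0.5, 1.0


def rnn_step(h: float, x: float) -> float:
    """One recurrent update of the hidden state."""
    return math.tanh(W_H * h + W_X * x)


def run_chunk(
    h0: float, xs: Sequence[float], dones: Sequence[bool]
) -> Tuple[List[float], float]:
    """Process one chunk of a trajectory, starting from hidden state h0.

    Returns the hidden state after each step and the state to carry into
    the next chunk (what a stateful RNN keeps between calls).
    """
    h, hs = h0, []
    for x, done in zip(xs, dones):
        h = rnn_step(h, x)
        hs.append(h)
        if done:
            # Assumption: the episode ended, so the next step starts a new
            # episode from a zero hidden state.
            h = 0.0
    return hs, h


xs = [0.1, -0.4, 0.7, 0.2, -0.9, 0.3, 0.5, -0.1]
dones = [False, False, True, False, False, False, False, False]
chunk_len = 3

# Reference: the whole trajectory in a single pass.
full, _ = run_chunk(0.0, xs, dones)

# Stateful: process chunks, carrying the final hidden state forward.
chunked, h = [], 0.0
for i in range(0, len(xs), chunk_len):
    hs, h = run_chunk(h, xs[i:i + chunk_len], dones[i:i + chunk_len])
    chunked.extend(hs)

# Stateless: every chunk starts from a zero hidden state.
stateless = []
for i in range(0, len(xs), chunk_len):
    hs, _ = run_chunk(0.0, xs[i:i + chunk_len], dones[i:i + chunk_len])
    stateless.extend(hs)

print(full == chunked)    # True
print(full == stateless)  # False: the last chunk lost the pre-chunk context
```

Here the second chunk happens to begin right after an episode end, so both variants agree on it; the third chunk starts mid-episode, and only the stateful variant recovers the full-sequence result.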
Modules:
- Agents
- Layers
  - DenseNormal - for continuous actions.
  - DenseCategorical - for categorical actions.
  - DenseBernoulli - for binary actions.
  - StatefulRNN - for passing information between states.
- Distributions
  - BoundedNormal - a bounded normal distribution, inheriting from `TransformedDistribution`.
- Environments
  - Environment - an abstract environment that wraps a gym environment's `reset` and `step` in `tf.numpy_function` and converts its output to a `Timestep`.
  - AsyncEnvironment - allows multiple independent environments to run in parallel. Inherits from `Environment`.
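How BoundedNormal maps an unbounded normal onto a bounded interval is not stated beyond its inheriting from `TransformedDistribution`. One common construction, assumed here and written in plain Python rather than TensorFlow Probability, squashes a normal sample with `tanh` and rescales it to `[low, high]`; the log-density then needs a change-of-variables correction. The class name `TanhBoundedNormal` is hypothetical.

```python
import math
import random


class TanhBoundedNormal:
    """Hypothetical sketch: a normal distribution squashed into [low, high].

    Sampling: u ~ Normal(loc, scale), y = low + half_range * (tanh(u) + 1),
    where half_range = (high - low) / 2.
    """

    def __init__(self, loc: float, scale: float, low: float, high: float):
        self.loc, self.scale = loc, scale
        self.low, self.high = low, high
        self.half_range = (high - low) / 2.0

    def sample(self, rng: random.Random) -> float:
        u = rng.gauss(self.loc, self.scale)
        return self.low + self.half_range * (math.tanh(u) + 1.0)

    def log_prob(self, y: float) -> float:
        # Invert the transformation: y -> z = tanh(u) in (-1, 1) -> u.
        z = (y - self.low) / self.half_range - 1.0
        u = math.atanh(z)
        # Log-density of the underlying normal at u.
        log_normal = (
            -0.5 * ((u - self.loc) / self.scale) ** 2
            - math.log(self.scale)
            - 0.5 * math.log(2.0 * math.pi)
        )
        # Change of variables: subtract log|dy/du| = log(half_range * (1 - tanh(u)^2)).
        log_det = math.log(self.half_range) + math.log(1.0 - z ** 2)
        return log_normal - log_det


rng = random.Random(0)
dist = TanhBoundedNormal(loc=0.0, scale=1.0, low=-1.0, high=1.0)
samples = [dist.sample(rng) for _ in range(5)]
print(all(-1.0 <= s <= 1.0 for s in samples))  # True
print(round(dist.log_prob(0.0), 4))  # -0.9189, i.e. -0.5 * log(2 * pi), since y = 0 maps to u = 0
```

In TensorFlow Probability the same idea could be expressed as a `tfp.distributions.TransformedDistribution` over a `Normal` with the bijector `tfp.bijectors.Chain([tfp.bijectors.Shift(low + half_range), tfp.bijectors.Scale(half_range), tfp.bijectors.Tanh()])`. Note that `log_prob` is undefined exactly at `low` and `high` (and may fail there due to floating-point rounding), so practical implementations usually clip or add a small epsilon.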
For hybrid action spaces, simply combine action layers in the policy network:
```python
from keras import Model
from reinforceable import layers
# ... (`inputs` and the hidden representation `x` are defined here)
action_1 = layers.DenseNormal((2,), [-1., 1.])(x)  # continuous action, dim=2
action_2 = layers.DenseCategorical((10,))(x)       # discrete action, n=10
policy_network = Model(inputs, (action_1, action_2))
# ...
```
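With several action heads, a common convention (assumed here, not confirmed for reinforceable's agents) is to treat the heads as independent given the state, so the joint log-probability used in the PPO probability ratio is the sum of the per-head log-probabilities. The plain-Python sketch below uses hypothetical helper names and, for simplicity, treats the continuous head as an unbounded diagonal normal; the bounds `[-1., 1.]` passed to `DenseNormal` would add a change-of-variables term.

```python
import math
from typing import Sequence


def normal_log_prob(x: Sequence[float], loc: Sequence[float], scale: Sequence[float]) -> float:
    """Log-density of a diagonal normal: sum of per-dimension log-densities."""
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for xi, m, s in zip(x, loc, scale)
    )


def categorical_log_prob(k: int, logits: Sequence[float]) -> float:
    """Log-probability of class k under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[k] - log_z


def hybrid_log_prob(
    cont_action: Sequence[float],
    loc: Sequence[float],
    scale: Sequence[float],
    disc_action: int,
    logits: Sequence[float],
) -> float:
    # Heads are assumed independent given the state, so log-probs add.
    return normal_log_prob(cont_action, loc, scale) + categorical_log_prob(disc_action, logits)


# A 2-dim continuous action and a 10-way discrete action, mirroring the
# shapes in the snippet above (all numbers are made up for illustration).
cont_action, disc_action = [0.3, -0.2], 4
old_lp = hybrid_log_prob(cont_action, [0.0, 0.0], [0.5, 0.5], disc_action, [0.0] * 10)
new_lp = hybrid_log_prob(
    cont_action, [0.1, -0.1], [0.5, 0.5], disc_action, [0.0] * 4 + [0.5] + [0.0] * 5
)

# PPO probability ratio pi_new(a|s) / pi_old(a|s) for the joint action.
ratio = math.exp(new_lp - old_lp)
print(round(ratio, 3))
```

Because the log-probabilities add, the joint ratio is the product of the per-head ratios, so a single clipped PPO objective can be applied to the whole hybrid action.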
Requirements:
- Python (3.10)
- tensorflow (2.13.0)
- tensorflow-probability (0.20.1)
- gymnasium[all] (0.26.2)
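To compare a local setup against the versions pinned in the requirements list, a small check script can read installed distribution versions with `importlib.metadata`. This is a convenience sketch, not part of reinforceable; the names `PINNED`, `installed_version`, and `check_pins` are hypothetical, and whether other versions also work is not stated (platform-specific TensorFlow builds may also use a different distribution name).

```python
import sys
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions pinned in the requirements list (hypothetical constant name).
PINNED = {
    "tensorflow": "2.13.0",
    "tensorflow-probability": "0.20.1",
    "gymnasium": "0.26.2",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Report 'ok', 'missing', or the mismatch for each pinned distribution."""
    report = {}
    for dist, wanted in pins.items():
        found = lookup(dist)
        if found is None:
            report[dist] = "missing"
        elif found != wanted:
            report[dist] = f"found {found}, pinned {wanted}"
        else:
            report[dist] = "ok"
    return report


if __name__ == "__main__":
    py = sys.version.split()[0]
    print("python:", "ok" if sys.version_info[:2] == (3, 10) else f"found {py}, pinned 3.10")
    for dist, status in check_pins(PINNED).items():
        print(f"{dist}: {status}")
```

The `lookup` parameter only exists so the comparison logic can be exercised without the packages installed.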
For Atari environments, the Atari ROMs need to be installed. See here.
To install from source with SSH:
```bash
git clone git@github.com:akensert/reinforceable.git
cd reinforceable
pip install -e .
```
With HTTPS:
```bash
git clone https://github.com/akensert/reinforceable.git
cd reinforceable
pip install -e .
```