Stable Baselines is an open-source Python library that provides implementations of reinforcement learning algorithms, which can be run in the gyms offered by OpenAI. This code is an experiment to see how the PPO (Proximal Policy Optimization), A2C (Advantage Actor-Critic), and DQN (Deep Q-Network) algorithms perform against each other within the CartPole gym. The goal of this gym is to balance a pole on top of a moving cart for as long as possible.
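For context, here is a minimal sketch of a single episode in this environment, using the classic gym API (newer gym/gymnasium releases return extra values from reset() and step()) with a random policy standing in for a trained agent:

```python
import gym

env = gym.make("CartPole-v1")
obs = env.reset()
done = False
total_reward = 0
while not done:
    # Random policy; the trained PPO/A2C/DQN agents replace this choice.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward  # +1 for every step the pole stays upright
env.close()
print(f"Episode reward: {total_reward}")
```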
- Python 3.8.0
Run the following commands to install the dependencies required for the code:
pip install pipenv
pipenv install
Run the following command to train the PPO, A2C, and DQN models:
pipenv run train <opt:epochs>
where 1 <= epochs <= 100 (optional argument, default: 25). Note: each epoch corresponds to 10,000 training timesteps, so the total number of timesteps trained is epochs * 10,000.
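The training script itself is not reproduced here; as a rough sketch of what it might do under these constraints, assuming the stable-baselines3 API (the 'logs' directory, the saved-model file names, and the argument handling are assumptions, not taken from this repo):

```python
import sys

from stable_baselines3 import A2C, DQN, PPO

# Optional CLI argument: number of epochs, in 1..100, default 25.
epochs = int(sys.argv[1]) if len(sys.argv) > 1 else 25
assert 1 <= epochs <= 100, "epochs must be between 1 and 100"
timesteps = epochs * 10_000  # each epoch is 10,000 training timesteps

for name, algo in {"PPO": PPO, "A2C": A2C, "DQN": DQN}.items():
    model = algo("MlpPolicy", "CartPole-v1", tensorboard_log="logs", verbose=1)
    model.learn(total_timesteps=timesteps)
    model.save(f"models/{name}/cartpole")
```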
Run the following command to start TensorBoard, which will allow you to compare and contrast the models. View the 'ep_rew_mean' graph, which shows the mean episode reward; this indicates how well each model is performing in the gym.
pipenv run metrics
Then visit http://localhost:6006/ to view the data.
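Assuming the stable-baselines3 API, the 'ep_rew_mean' curves come from the library's built-in logging: a model writes them to TensorBoard whenever a tensorboard_log directory is passed to its constructor. A minimal sketch (the 'logs' directory is an assumption):

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", tensorboard_log="logs")
# Each learn() call creates a new run directory, e.g. logs/PPO_1, which
# TensorBoard picks up when launched with --logdir logs.
model.learn(total_timesteps=10_000, tb_log_name="PPO")
```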
Run the following command to load a previously trained PPO, A2C, or DQN model:
pipenv run load <algorithm> <modelFileName>.zip
where algorithm is PPO, A2C, or DQN, and <modelFileName>.zip must exist in the 'models/<algorithm>/' directory.
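As a rough sketch, a load script matching this command might look like the following, assuming the stable-baselines3 API and the classic gym API; the argument handling is an assumption:

```python
import sys

import gym
from stable_baselines3 import A2C, DQN, PPO

ALGOS = {"PPO": PPO, "A2C": A2C, "DQN": DQN}
algorithm, model_file = sys.argv[1], sys.argv[2]

# Load the saved model from models/<algorithm>/<modelFileName>.zip.
model = ALGOS[algorithm].load(f"models/{algorithm}/{model_file}")

env = gym.make("CartPole-v1")
obs = env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
env.close()
```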
The graph below shows the mean episode reward for each of the models over 1,000,000 timesteps (100 epochs).
- PPO (orange)
- A2C (blue)
- DQN (red)