This repository contains the code for MDPO, a trust-region algorithm based on principles of Mirror Descent, introduced in the paper Mirror Descent Policy Optimization. It includes two variants: on-policy MDPO and off-policy MDPO.
This implementation uses TensorFlow and builds on the code provided by stable-baselines.
All dependencies are listed in the provided `requirements.txt` file for a Python virtual-env. In particular, you will need to install `stable-baselines`, `tensorflow`, and `mujoco_py`.
- Install stable-baselines:

  ```bash
  pip install stable-baselines[mpi]==2.7.0
  ```

- Download and copy the MuJoCo library and license files into a `.mujoco/` directory. We use `mujoco200` for this project.

- Clone MDPO and copy the `mdpo-on` and `mdpo-off` directories inside this directory.

- Activate the virtual-env using the `requirements.txt` file provided (a setup sketch follows this list):

  ```bash
  source <virtual env path>/bin/activate
  ```
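
For reference, a minimal end-to-end setup might look like the sketch below. The virtual-env path, the location of `requirements.txt`, and the `LD_LIBRARY_PATH` export follow the typical `mujoco_py`/`mujoco200` layout; they are assumptions, not paths fixed by this repository.

```bash
# Assumed paths; adjust to your own machine.
python3 -m venv ~/venvs/mdpo       # create a virtual-env (path is an assumption)
source ~/venvs/mdpo/bin/activate   # activate it
pip install -r requirements.txt    # install the pinned dependencies

# Typical mujoco_py layout for mujoco200 (assumed):
#   ~/.mujoco/mujoco200/   -- unpacked MuJoCo 2.0 binaries
#   ~/.mujoco/mjkey.txt    -- license file
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin
```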
Use the `run_mujoco.py` script for training MDPO.

On-policy MDPO:

```bash
python3 run_mujoco.py --env=Walker2d-v2 --sgd_steps=10
```

Off-policy MDPO:

```bash
python3 run_mujoco.py --env=Walker2d-v2 --num_timesteps=1e6 --sgd_steps=1000 --klcoeff=1.0 --lam=0.2 --tsallis_coeff=1.0
```
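
Since the on-policy and off-policy variants live in the separate `mdpo-on` and `mdpo-off` directories, a typical pair of invocations might look like the sketch below. The assumption that each directory contains its own `run_mujoco.py` is ours; the flags simply mirror the commands above.

```bash
# On-policy variant (assumes run_mujoco.py lives in mdpo-on/)
cd mdpo-on
python3 run_mujoco.py --env=Walker2d-v2 --sgd_steps=10

# Off-policy variant (assumes run_mujoco.py lives in mdpo-off/)
cd ../mdpo-off
python3 run_mujoco.py --env=Walker2d-v2 --num_timesteps=1e6 --sgd_steps=1000 \
    --klcoeff=1.0 --lam=0.2 --tsallis_coeff=1.0
```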
If you use this code, please cite:

```bibtex
@article{tomar2020mirror,
  title={Mirror Descent Policy Optimization},
  author={Tomar, Manan and Shani, Lior and Efroni, Yonathan and Ghavamzadeh, Mohammad},
  journal={arXiv preprint arXiv:2005.09814},
  year={2020}
}
```