This repo contains code for the reinforcement learning (RL) experiments of Model-Agnostic Meta-Learning (MAML). Install the requirements and obtain a MuJoCo license before running the experiments. The repo also includes an implementation of PPO as the meta-optimizer, in place of the TRPO used by the original authors. This work was done as part of the RL Course Project (Monsoon 2020) Project Report.
python --env-name 2DNavigation-v0 --fast-lr 0.1 --maml
python --env-name HalfCheetahVel-v1 --fast-lr 0.1 --maml --meta-lr 0.1 --critic_weight 0.005 --eps_clip 0.2
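The `--eps_clip` and `--critic_weight` flags above suggest a standard PPO clipped objective with a weighted critic term. The following is a minimal sketch of that objective under this assumption, not the repo's actual loss: the function name `ppo_loss` and its arguments are hypothetical, and only the default values 0.2 and 0.005 are taken from the command above.

```python
import math


def ppo_loss(new_logp, old_logp, advantages, values, returns,
             eps_clip=0.2, critic_weight=0.005):
    """Hypothetical PPO loss: clipped surrogate plus weighted value error.

    All arguments are equal-length lists of floats, one entry per sample.
    """
    n = len(advantages)
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps_clip, 1 + eps_clip].
        clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip)
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        surrogate += min(ratio * adv, clipped * adv)
    policy_loss = -surrogate / n
    # Mean squared error between predicted values and empirical returns.
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    return policy_loss + critic_weight * value_loss
```

With a larger `eps_clip` the policy may move further from the old policy in one update, while `critic_weight` controls how strongly the value error contributes relative to the policy term.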
This script tests our meta-trained policies and plots the average return against the number of gradient steps taken for adaptation at test time.
- : Plots average returns against the number of iterations. Use this after downloading the testing curves from TensorBoard in JSON format.
- : Visualizes (in MuJoCo) the performance of trained policies on the HalfCheetah environment and saves a video of the visualization.
- This code is an extension of the repo by Luisa M Zintgraf.
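
The plotting step for curves downloaded from TensorBoard can be sketched roughly as follows. This is an illustrative sketch, not the repo's plotting script: it assumes each exported JSON file is a list of `[wall_time, step, value]` entries (the layout of TensorBoard's scalar JSON download), and the helper names `load_scalars`, `ema_smooth`, and `plot_curves`, as well as the output file name, are hypothetical.

```python
import json


def load_scalars(path):
    """Read a TensorBoard scalar export: a list of [wall_time, step, value]."""
    with open(path) as f:
        rows = json.load(f)
    steps = [int(row[1]) for row in rows]
    values = [float(row[2]) for row in rows]
    return steps, values


def ema_smooth(values, weight=0.6):
    """Exponential moving average; a weight of 0 leaves the curve unchanged."""
    smoothed, last = [], None
    for v in values:
        last = v if last is None else weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


def plot_curves(curves, out_path="avg_returns.png"):
    """Plot {label: json_path} curves on one figure and save it to out_path."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    plt.figure()
    for label, path in curves.items():
        steps, values = load_scalars(path)
        plt.plot(steps, ema_smooth(values), label=label)
    plt.xlabel("Iterations")
    plt.ylabel("Average return")
    plt.legend()
    plt.savefig(out_path)
    plt.close()
```

For example, `plot_curves({"MAML-TRPO": "trpo.json", "MAML-PPO": "ppo.json"})` would overlay two exported curves, where both file names are placeholders.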