Webpage: https://hil-serl.github.io/
HIL-SERL provides a set of libraries, env wrappers, and examples for training RL policies from a combination of demonstrations and human corrections, enabling robotic manipulation tasks with near-perfect success rates. The following sections describe how to use HIL-SERL, illustrated with examples.
Installation
- Setup Conda Environment: create an environment with

      conda create -n hilserl python=3.10
- Install Jax as follows (a quick sanity check appears after this list):
  - For CPU (not recommended):

        pip install --upgrade "jax[cpu]"

  - For GPU:

        pip install --upgrade "jax[cuda12_pip]==0.4.35" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

  - For TPU:

        pip install --upgrade "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

  - See the Jax GitHub page for more details on installing Jax.
- Install the serl_launcher:

      cd serl_launcher
      pip install -e .
      pip install -r requirements.txt
- Install serl_robot_infra: follow the README in serl_robot_infra for installation and basic robot operation instructions, including how to install the impedance-based serl_franka_controllers. After installation, you should be able to run the robot server and interact with the gym franka_env (hardware).
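Before moving on, it can be worth verifying that Jax sees the intended backend. The snippet below is a minimal sanity check, not part of the HIL-SERL codebase:

```python
# Minimal sanity check for the Jax install; run inside the `hilserl`
# conda environment. Not part of HIL-SERL itself.
import jax
import jax.numpy as jnp

print(jax.__version__)
print(jax.devices())  # expect GPU/TPU devices on accelerator installs

# A tiny computation to confirm the backend actually executes.
x = jnp.arange(10.0)
print(jnp.dot(x, x))  # 285.0
```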
HIL-SERL provides a set of common libraries for users to train RL policies for robotic manipulation tasks. An RL experiment runs as two asynchronous nodes, an actor and a learner, both of which interact with the robot gym environment. Data is sent from the actor to the learner over the network using agentlace, and the learner periodically synchronizes the updated policy back to the actor. This design provides flexibility for parallel training and inference.
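To make the data flow concrete, here is a schematic sketch of the actor/learner split in plain Python. It uses in-process queues as stand-ins for the agentlace transport; all names are illustrative, not HIL-SERL or agentlace APIs:

```python
# Schematic illustration of the asynchronous actor/learner design.
# Queues stand in for the agentlace network transport; names here are
# illustrative only.
import threading
import queue

transitions = queue.Queue()      # actor -> learner: environment data
policy_version = {"step": 0}     # learner -> actor: periodically synced params

def actor_loop(num_steps=100):
    for t in range(num_steps):
        synced = policy_version["step"]       # act with the last synced policy
        transitions.put({"obs": t, "action": 0.0, "policy_step": synced})

def learner_loop(num_updates=100):
    for step in range(1, num_updates + 1):
        batch = transitions.get()             # consume actor data
        # ... a gradient update on `batch` would happen here ...
        if step % 10 == 0:
            policy_version["step"] = step     # publish updated params

threads = [threading.Thread(target=actor_loop),
           threading.Thread(target=learner_loop)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

In the real system the two nodes run as separate processes, possibly on different machines, with agentlace handling serialization and transport over the network.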
Code structure

Code Directory | Description
---|---
examples | Scripts for policy training, demonstration data collection, and reward classifier training
serl_launcher | Main code for HIL-SERL
serl_launcher.agents | Agent policies (e.g., SAC, BC)
serl_launcher.wrappers | Gym env wrappers
serl_launcher.data | Replay buffer and data store
serl_launcher.vision | Vision-related models and utils
serl_robot_infra | Robot infra for running with real robots
serl_robot_infra.robot_servers | Flask server for sending commands to the robot via ROS
serl_robot_infra.franka_env | Gym env for the Franka robot
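As a flavor of what lives under serl_launcher.wrappers, below is a generic gymnasium-style action wrapper. It is illustrative only, written against the standard gymnasium API, and is not a class from the repository:

```python
# Illustrative gymnasium-style wrapper; not an actual class from
# serl_launcher.wrappers. Assumes the wrapped env has a Box action space.
import gymnasium as gym
import numpy as np

class RescaleActionWrapper(gym.ActionWrapper):
    """Map policy actions in [-1, 1] to the wrapped env's native range."""

    def __init__(self, env):
        super().__init__(env)
        self._low = env.action_space.low
        self._high = env.action_space.high
        # Expose a normalized [-1, 1] action space to the policy.
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=env.action_space.shape)

    def action(self, action):
        # Affine map from [-1, 1] to [low, high], applied per dimension.
        return self._low + (np.asarray(action) + 1.0) * 0.5 * (self._high - self._low)
```

A wrapper like this would be applied as `env = RescaleActionWrapper(base_env)` before handing the env to the actor.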
We provide a step-by-step guide for running RL policies with HIL-SERL on a Franka robot; check out the Run with Franka Arm guide.
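As a rough preview of what that guide builds toward, the snippet below shows a gym-style interaction loop with the hardware environment. The module import and environment id are placeholders (assumptions); the actual ids are defined in serl_robot_infra.franka_env:

```python
# Hypothetical interaction with the Franka hardware env. The import and
# env id below are placeholders; see the Franka guide for the actual ids.
import gymnasium as gym
import franka_env  # noqa: F401  (assumed to register the Franka envs)

env = gym.make("FrankaEnv-Vision-v0")  # placeholder id
obs, info = env.reset()
for _ in range(50):
    action = env.action_space.sample()  # replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```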
If you use this code for your research, please cite our paper:
    @misc{luo2024hilserl,
      title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
      author={Jianlan Luo and Charles Xu and Jeffrey Wu and Sergey Levine},
      year={2024},
      eprint={2410.21845},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
    }