
# Gazebo Gym Environment for learning force control with peg insertion tasks (Universal Robots)

This repository is based on our public repositories ur3 and ur_openai_gym.

## Installation

Tested on ROS Melodic, Ubuntu 18.04.

Installation via Docker is recommended: follow the wiki instructions.
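
As a rough sketch only (the image name and run flags below are assumptions for illustration; the wiki gives the actual ones), a Docker-based setup generally amounts to pulling the image and running it with GPU and X11 access so Gazebo can render:

```bash
# Hypothetical commands; the real image name and flags are in the wiki.
docker pull cambel/ur3                      # assumed image name
docker run -it --rm --gpus all \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    cambel/ur3
```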

## Usage: Peg-In-Hole tasks

### Training

In one terminal, launch a Gazebo environment for a specific peg:

```bash
roslaunch ur3_gazebo ur_peg.launch peg_shape:=cuboid
```

Valid shapes: `cylinder`, `hexagon`, `cuboid`, `triangular`, `trapezoid`, `star`.

In a different terminal, run the training script:

```bash
rosrun ur3e_rl tf2rl_sac.py -e0
```

The training should then start:

- the robot moves to a given initial position,
- a taskboard is spawned at a random pose within a defined workspace,
- the robot moves towards the goal pose, which is defined with respect to the current pose of the taskboard,
- the training lasts 100,000 time steps.
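
Internally, `tf2rl_sac.py` trains a Soft Actor-Critic (SAC) agent with the tf2rl library. The following is a minimal sketch of tf2rl's standard `SAC` + `Trainer` pattern only, under the assumption that the script follows it; the environment here is a stand-in, not the actual Gazebo environment class:

```python
# Minimal tf2rl SAC training sketch (assumed structure, not the actual tf2rl_sac.py).
import gym
from tf2rl.algos.sac import SAC
from tf2rl.experiments.trainer import Trainer

# tf2rl exposes argparse-based configuration for both the trainer and the algorithm.
parser = Trainer.get_argument()
parser = SAC.get_argument(parser)
args = parser.parse_args()

env = gym.make("Pendulum-v0")       # stand-in; the real script builds the UR3e Gazebo env
test_env = gym.make("Pendulum-v0")

policy = SAC(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.high.size,
    max_action=env.action_space.high[0],
    batch_size=args.batch_size,
    n_warmup=args.n_warmup)

trainer = Trainer(policy, env, args, test_env=test_env)
trainer()  # runs the training loop until args.max_steps is reached
```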

There are three sample environments defined in the training script `tf2rl_sac.py` (see the selection example after this list):

1. A training environment with a single peg, which does not change during the entire training session.
2. A training environment with multiple pegs, where the entire robot is reset each time the peg is changed. **Note:** in this case the simulation tends to be unstable without a powerful CPU/GPU; the `real time factor` degrades over time until the simulation barely moves.
3. A sample configuration for training/testing with a real robot.
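
Assuming the `-e` flag selects among these environments by index (consistent with the `-e0` example above), the multi-peg environment would be trained with:

```bash
# Assumed mapping: 0 = single peg, 1 = multiple pegs, 2 = real robot
rosrun ur3e_rl tf2rl_sac.py -e1
```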

The hyper-parameters of each environment are defined in a YAML config file, which can be found in the package under `ur3e_rl/config`.
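
As a purely illustrative excerpt (the keys below are hypothetical placeholders, not the actual contents of those files), such a config might look like:

```yaml
# Hypothetical hyper-parameter excerpt; see ur3e_rl/config for the real keys.
steps_per_episode: 200     # episode length before reset
max_steps: 100000          # total training length in time steps
batch_size: 256            # SAC minibatch size
n_warmup: 1000             # random-action steps collected before updates begin
```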

After a training session is started, training data will be stored in a folder results/[datetime]_**dir_suffix** under the current directory. More information can be found in the repository of the RL library, tf2rl.
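
tf2rl writes its training logs as TensorBoard event files inside that results folder, so a running session can typically be monitored with:

```bash
tensorboard --logdir results
```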

### Testing

A learned policy can be executed using the helper script `evaluate_policy.py`:

```bash
rosrun ur3e_rl evaluate_policy.py -n 50 results/[datetime]_**dir_suffix**
```

Useful parameters of this script (see the example after this list):

- `--training`: actions are taken as during training, i.e., sampled from a Gaussian distribution, so they retain some randomness. Without this flag, the actions are taken as the mean of the Gaussian distribution, which yields much more stable behavior.
- `-n`: number of test episodes to execute.
- `-s`: store/record observations from the tests.
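
For example, recording observations over ten evaluation episodes of a stored policy:

```bash
rosrun ur3e_rl evaluate_policy.py -n 10 -s results/[datetime]_**dir_suffix**
```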

## References

If you find this code useful, consider citing our work:

```bibtex
@article{beltran2022accelerating,
  author    = {Beltran-Hernandez, Cristian C. and Petit, Damien and Ramirez-Alpizar, Ixchel G. and Harada, Kensuke},
  title     = {Accelerating Robot Learning of Contact-Rich Manipulations: A Curriculum Learning Study},
  doi       = {10.48550/ARXIV.2204.12844},
  url       = {https://arxiv.org/abs/2204.12844},
  publisher = {arXiv},
  year      = {2022},
  copyright = {Creative Commons Attribution Non Commercial No Derivatives 4.0 International}
}
```

arXiv: https://arxiv.org/abs/2204.12844

Supplemental Video: https://youtu.be/_FVQC5OcGjs