This repository contains the source code for the paper Learning Dexterous Grasping with Object-Centric Visual Affordances, accepted at the IEEE International Conference on Robotics and Automation (ICRA), 2021.
If you find this work useful in your research, please consider citing:
@inproceedings{mandikal2021graff,
  title = {Learning Dexterous Grasping with Object-Centric Visual Affordances},
  author = {Mandikal, Priyanka and Grauman, Kristen},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year = {2021}
}
To visualize the provided trained model, set up the environment correctly (see below) and run:
bash scripts/demo.sh
This will save videos of robot trajectories in expts/graff_trained/videos_stability. You can change the object as needed.
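If you want to quickly inspect the saved clips from Python, opencv (installed as part of the setup below) can read them. This is just a convenience sketch; the .mp4 extension and the nested directory layout are assumptions, so adjust the glob pattern to match what the demo actually writes.

# Optional sketch: list the saved demo videos with their frame counts and frame rates.
import glob
import cv2

for path in sorted(glob.glob("expts/graff_trained/videos_stability/**/*.mp4", recursive=True)):
    cap = cv2.VideoCapture(path)
    n_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    fps = cap.get(cv2.CAP_PROP_FPS)
    print(path, f"{n_frames:.0f} frames @ {fps:.0f} fps")
    cap.release()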
GRAFF is a deep RL dexterous robotic grasping policy that uses visual affordance priors learnt from humans for functional grasping. Our proposed model, called GRAFF for Grasp-Affordances, consists of two stages. First, we train a network to predict affordance regions from static images. Second, we train a dynamic grasping policy using the learned affordances. The key upshots of our approach are better grasping, faster learning, and generalization to successfully grasp objects unseen during policy training.
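As a rough mental model of the two stages, the sketch below shows the data flow: an affordance network predicts per-pixel contact regions, those regions are lifted to 3D with depth, and the grasping policy is conditioned on the resulting points. This is an illustrative, self-contained sketch with hypothetical stand-in functions, not the repository's API.

import numpy as np

def predict_affordance(rgb):                      # stand-in for the stage-1 affordance network
    h, w, _ = rgb.shape
    return np.random.rand(h, w)                   # per-pixel contact likelihood

def lift_to_3d(heatmap, depth, k=20):             # pick the top-k affordance pixels and attach depth
    ys, xs = np.unravel_index(np.argsort(heatmap.ravel())[-k:], heatmap.shape)
    return np.stack([xs, ys, depth[ys, xs]], axis=1).astype(np.float32)

def grasping_policy(obs_rgb, affordance_points):  # stand-in for the stage-2 RL grasping policy
    return np.zeros(30)                           # e.g., a vector of hand/arm joint commands

rgb = np.zeros((128, 128, 3), dtype=np.uint8)     # dummy image and depth for illustration only
depth = np.ones((128, 128), dtype=np.float32)
action = grasping_policy(rgb, lift_to_3d(predict_affordance(rgb), depth))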
- Download MuJoCo v2.0 binaries from the official website: https://www.roboti.us/download.html.
- Download the activation key from here: https://www.roboti.us/license.html
- Unzip the downloaded mujoco200 directory into ~/.mujoco/mujoco200, and copy the license key (mjkey.txt) to ~/.mujoco/mjkey.txt. Note that unzipping the MuJoCo binaries produces a directory named mujoco200_linux; rename it to mujoco200 and place it at ~/.mujoco/mujoco200.
- You may need to update your ~/.bashrc by adding the following line and sourcing it:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path/to/.mujoco>/mujoco200/bin/
Clone the baselines repo into another folder outside graff/. Baselines repo: https://github.com/priyankamandikal/baselines_graff
git clone git@github.com:priyankamandikal/baselines_graff.git
- Create a conda environment called 'graff':
conda create -n graff python=3.10
conda activate graff
- Install packages:
conda install -c menpo glew
conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia
conda install tensorflow
conda install -c conda-forge matplotlib
conda install -c conda-forge moviepy
conda install -c conda-forge opencv
conda install -c conda-forge seaborn
conda install -c conda-forge quaternion
pip install open3d trimesh numba pyquaternion nvidia-ml-py3
conda install -c anaconda scikit-image
Note: It is important to install opencv after moviepy so that you don't run into ffmpeg issues.
- Install mujoco_py
conda install patchelf
pip install mujoco_py==2.0.2.13
If you face Cython errors during the mujoco_py installation, you might want to try the following:
pip install "cython<3"
pip install lockfile glfw
pip install mujoco_py==2.0.2.5
If using GPU, after successful installation, do this:
Open the file <path-to-conda-env>/graff/lib/python3.10/site-packages/mujoco_py/builder.py.
In the function load_cython_ext(), change line 74:
From:
else:
builder = LinuxCPUExtensionBuilder
To:
else:
builder = LinuxGPUExtensionBuilder
- Install baselines:
cd <path-to-baselines-repo>/baselines
pip install -e .
Make sure that you are able to import the following packages in Python without any errors:
import torch
import mujoco_py
import baselines
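For example, the short sanity script below checks all three imports in one go and prints a few standard version attributes. If you applied the GPU edit to builder.py above, the printed cymj extension path should typically contain the GPU builder's name; treat that expectation as a rule of thumb rather than a guarantee.

import torch, mujoco_py, baselines

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mujoco_py", mujoco_py.__version__)
print("cymj extension:", mujoco_py.cymj.__file__)   # expect ...linuxgpuextensionbuilder... after the GPU edit
print("baselines imported from", baselines.__file__)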
We train and validate our grasping policy on 16 objects from the ContactDB dataset. We port the object meshes into MuJoCo and convexify them using VHACD. The convexified meshes are present in envs/resources/meshes/contactdb/vhacd.
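If you want to sanity-check the convexified meshes, trimesh (installed above) can load them. The *.stl pattern below is an assumption; adjust it to whatever format and folder layout the repository actually uses.

import glob
import trimesh

for path in sorted(glob.glob("envs/resources/meshes/contactdb/vhacd/*.stl")):
    mesh = trimesh.load(path)
    faces = len(mesh.faces) if hasattr(mesh, "faces") else "scene"   # Scene objects have no .faces
    print(path, faces)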
- Render ContactDB meshes to generate images for training and testing the affordance model
python affordance-pred/render_contactdb_data.py --obj apple cup cell_phone door_knob flashlight hammer knife light_bulb mouse mug pan scissors stapler teapot toothbrush toothpaste
- Train the visual affordance model on ContactDB
python affordance-pred/train.py
- To train the graff model, run:
bash scripts/train/graff.sh
Note that this code trains on the 3D affordance points directly extracted from ContactDB. To use the visual affordances instead, unproject the predicted 2D affordance map into 3D using the depth map, save the 20 maximally diverse keypoints on the object, and train the model on the predicted points (a sketch of this lifting step is shown after the training commands below).
- To train the no prior baseline, run:
bash scripts/train/noprior.sh
- To train the center of mass baseline, run:
bash scripts/train/com.sh
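As referenced in the note above, here is a minimal sketch of how one might lift a predicted 2D affordance map into 3D with the depth map and keep 20 maximally diverse keypoints via farthest point sampling. The pinhole intrinsics (fx, fy, cx, cy), the threshold, and the variable names are assumptions for illustration, not the repository's exact pipeline.

import numpy as np

def unproject(aff, depth, fx, fy, cx, cy, thresh=0.5):
    ys, xs = np.where(aff > thresh)          # pixels predicted as contact regions
    z = depth[ys, xs]
    x = (xs - cx) * z / fx                   # standard pinhole back-projection
    y = (ys - cy) * z / fy
    return np.stack([x, y, z], axis=1)       # (N, 3) points in the camera frame

def farthest_point_sampling(points, k=20):
    idx = [0]                                # start from an arbitrary seed point
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(1, min(k, len(points))):
        idx.append(int(np.argmax(dists)))    # pick the point farthest from the selected set
        dists = np.minimum(dists, np.linalg.norm(points - points[idx[-1]], axis=1))
    return points[idx]

# keypoints = farthest_point_sampling(unproject(aff, depth, fx, fy, cx, cy))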
Follow the steps detailed above to set up the environment and download the pre-trained models.
- Evaluate the affordance model on rendered ContactDB objects
python affordance-pred/evaluate_contactdb.py
- For evaluating the trained grasping policy, run:
bash scripts/eval.sh
The results (metrics and videos) will be saved inside the expts/ directory.
Below are a few sample results from our visual affordance model.
Below are a few sample results from our dexterous grasping policy.
The PPO algorithm has been adapted from https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail