This is the official source code for the paper "SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation" by Fabian Duffhauss et al., accepted to the IEEE Robotics and Automation Letters (RA-L) 2023. The code allows users to reproduce and extend the results reported in the paper. Please cite the paper when reporting, reproducing, or extending the results.
- Install CUDA 10.1
- Create a virtual environment with all required packages:
conda create -n SyMFM6D python=3.6
conda activate SyMFM6D
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
conda install -c anaconda -c conda-forge scikit-learn
conda install pytorch-lightning==1.4.9 torchmetrics==0.6.0 -c conda-forge
conda install matplotlib einops tensorboardx pandas opencv==3.4.2 -c conda-forge
pip install opencv-contrib-python==3.4.2.16
- Compile the RandLA-Net operators:
cd models/RandLA/
sh compile_op.sh
- Download and install normalSpeed:
git clone https://github.com/hfutcgncas/normalSpeed.git
cd normalSpeed/normalSpeed
python3 setup.py install
- The YCB-Video dataset can be downloaded here.
- The MV-YCB SymMovCam dataset can be downloaded here. Using this dataset requires the 3D models of the YCB-Video dataset, which can be downloaded here.
After downloading a dataset, extract the zip file and adjust the paths to the 3D models and the datasets in common.py. For training a new model from scratch, a ResNet-34 pre-trained on ImageNet is required, which can be downloaded here.
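For orientation, the following minimal sketch shows what such path settings could look like; the variable names and folder names are illustrative assumptions, not the exact identifiers used in common.py:

```python
# Illustrative path configuration (names are assumptions; adapt to the actual variables in common.py).
import os

DATA_ROOT = "/path/to/datasets"                                 # base folder for all datasets
YCB_ROOT = os.path.join(DATA_ROOT, "YCB_Video_Dataset")         # extracted YCB-Video dataset
SYMMOVCAM_ROOT = os.path.join(DATA_ROOT, "MV-YCB_SymMovCam")    # extracted MV-YCB SymMovCam dataset
YCB_MODELS_DIR = os.path.join(YCB_ROOT, "models")               # 3D object models of YCB-Video
RESNET34_WEIGHTS = "/path/to/resnet34_imagenet_pretrained.pth"  # pre-trained ResNet-34 weights
```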
Our symmetry-aware training procedure requires the rotational symmetry axes of all objects. We compute them once before training by running the script find_symmetries.py for each object. For objects with multiple symmetry axes, the script needs to be run multiple times with different initial values for the symmetry axis, e.g.
python utils/find_symmetries.py --obj_name 024_bowl --symmtype rotational
We also provide the pre-computed symmetry axes for all objects that we used for the paper in symmetries.txt.
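To illustrate how such a rotational symmetry axis enters the training objective, here is a minimal NumPy sketch, assuming keypoints given in an object frame whose origin lies on the symmetry axis; it is a simplified illustration, not the repository's exact loss implementation:

```python
# Illustrative sketch of a symmetry-aware keypoint distance (simplified, not the exact repository code).
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: rotation matrix for a rotation by `angle` around the vector `axis`."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def symmetry_aware_distance(pred_kps, gt_kps, sym_axis, n_rot_sym=16):
    """Minimum mean keypoint distance over n_rot_sym rotations of the ground truth around sym_axis."""
    angles = 2.0 * np.pi * np.arange(n_rot_sym) / n_rot_sym
    dists = []
    for angle in angles:
        R = rotation_about_axis(sym_axis, angle)
        rotated_gt = gt_kps @ R.T  # rotate all ground-truth keypoints around the symmetry axis
        dists.append(np.mean(np.linalg.norm(pred_kps - rotated_gt, axis=1)))
    return min(dists)
```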
In the following, we give a few examples of how to train models with our SyMFM6D approach on the different datasets.
python run.py --dataset ycb --workers 4 --run_name YcbVideo_1view_training --epochs 40 --batch_size 9 \
--sift_fps_kps 1 --symmetry 1 --n_rot_sym 16
python run.py --dataset ycb --workers 4 --run_name YcbVideo_3views_training --epochs 10 --batch_size 3 \
--sift_fps_kps 1 --symmetry 1 --n_rot_sym 16 --multi_view 1 --set_views 3 --checkpoint single_view_checkpoint.ckpt \
--lr_scheduler reduce --lr 7e-05
python run.py --dataset SymMovCam --workers 4 --run_name MvYcbSymMovCam_1view_training --epochs 60 \
--batch_size 3 --sift_fps_kps 1 --symmetry 1 --n_rot_sym 16
python run.py --dataset SymMovCam --workers 4 --run_name MvYcbSymMovCam_3views_training --epochs 60 \
--batch_size 3 --sift_fps_kps 1 --symmetry 1 --n_rot_sym 16 --multi_view 1 --set_views 3
A model can be evaluated by specifying a checkpoint using --checkpoint <name_of_checkpoint> and by adding the argument --test, e.g.
python run.py --dataset ycb --workers 4 --run_name YcbVideo_3views_evaluation --batch_size 3 --sift_fps_kps 1 \
--multi_view 1 --set_views 3 --checkpoint YcbVideo_3views_checkpoint.ckpt --test
We provide the runtime of our SyMFM6D approach for different numbers of views in the following table:
Number of Views | Network Forward Time | Pose Estimation Time | Total Time |
---|---|---|---|
1 | 46 ms | 14 ms | 60 ms |
2 | 92 ms | 19 ms | 111 ms |
3 | 138 ms | 25 ms | 163 ms |
4 | 184 ms | 30 ms | 214 ms |
5 | 230 ms | 36 ms | 266 ms |
The network forward time includes the 3D keypoint offset prediction, the center point offset prediction, and the prediction of semantic labels. The pose estimation time is the time for applying the mean shift clustering algorithm and the least-squares fitting to compute the 6D pose of a single object. Please note that using the symmetry-aware loss slightly increases the training time, but it does not affect the runtime at inference. All runtimes were measured on a single NVIDIA Tesla V100 GPU with 32 GB of memory.
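As a reference for the pose estimation step described above, the following minimal sketch clusters the voted keypoints with scikit-learn's MeanShift and fits a 6D pose with a standard SVD-based least-squares solution; it is a simplified illustration under these assumptions, not the repository's exact implementation:

```python
# Illustrative sketch: mean shift clustering of keypoint votes and least-squares pose fitting (simplified).
import numpy as np
from sklearn.cluster import MeanShift

def cluster_keypoint(votes, bandwidth=0.05):
    """votes: (N, 3) keypoint votes in camera coordinates -> (3,) center of the largest cluster."""
    ms = MeanShift(bandwidth=bandwidth).fit(votes)
    labels, counts = np.unique(ms.labels_, return_counts=True)
    return ms.cluster_centers_[labels[np.argmax(counts)]]

def fit_pose(kps_model, kps_camera):
    """Least-squares fit of rotation R and translation t mapping model keypoints to camera keypoints."""
    mu_m, mu_c = kps_model.mean(axis=0), kps_camera.mean(axis=0)
    H = (kps_model - mu_m).T @ (kps_camera - mu_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # prevent reflections
    R = Vt.T @ D @ U.T
    t = mu_c - R @ mu_m
    return R, t
```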
In the following, we show the tensor shapes of our network architecture:
Layer | CNN Tensor Shape | PCN Tensor Shape |
---|---|---|
1 | H/4, W/4, 64 | N, 8 |
2 | H/4, W/4, 64 | N/4, 64 |
3 | H/8, W/8, 128 | N/16, 128 |
4 | H/8, W/8, 512 | N/64, 256 |
5 | H/8, W/8, 1024 | N/256, 512 |
6 | H/4, W/4, 256 | N/64, 256 |
7 | H/2, W/2, 64 | N/16, 128 |
8 | H/2, W/2, 64 | N/4, 64 |
The multi-layer perceptrons (MLPs) of our multi-directional fusion modules have the following channel sizes, where ci and cp denote the channel sizes of the image and point features of the respective layer:
MLP | Channel Sizes |
---|---|
MLPi | ci, cp |
MLPfi | 2cp, cp |
MLPp | cp, ci |
MLPfp | 2ci, ci |
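To make these channel sizes concrete, here is a minimal PyTorch sketch of one fusion step using shared MLPs (1x1 convolutions) with the sizes from the table; the gathering of pixel-point correspondences is omitted, and the module structure is an assumption for illustration rather than the repository's exact implementation:

```python
# Illustrative sketch of the fusion MLPs with the channel sizes from the table (simplified).
import torch
import torch.nn as nn

class FusionMLPs(nn.Module):
    def __init__(self, ci, cp):
        super().__init__()
        self.mlp_i = nn.Conv1d(ci, cp, 1)        # MLPi:  ci -> cp (image features to point channel size)
        self.mlp_fi = nn.Conv1d(2 * cp, cp, 1)   # MLPfi: 2cp -> cp (fuse into point features)
        self.mlp_p = nn.Conv1d(cp, ci, 1)        # MLPp:  cp -> ci (point features to image channel size)
        self.mlp_fp = nn.Conv1d(2 * ci, ci, 1)   # MLPfp: 2ci -> ci (fuse into image features)

    def forward(self, img_feat, pcd_feat):
        # img_feat: (B, ci, N), pcd_feat: (B, cp, N); pixel-point correspondences already gathered.
        fused_pcd = self.mlp_fi(torch.cat([self.mlp_i(img_feat), pcd_feat], dim=1))
        fused_img = self.mlp_fp(torch.cat([self.mlp_p(pcd_feat), img_feat], dim=1))
        return fused_img, fused_pcd
```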
The channel sizes of the MLPs in the second stage of our network in Fig. 2 of our paper are as follows:
MLP | Channel Sizes |
---|---|
Keypoint Detection | ci + cp, 128, 128, 128, 3nkps |
Center Point Detection | ci + cp, 128, 128, 128, 3 |
Semantic Segmentation | ci + cp, 128, 128, 128, ncls |
For the output of the network architecture, we set ci = cp = 64. ncls is the number of classes of the respective dataset, e.g. ncls = 22 for YCB-Video and SymMovCam, and nkps is the number of keypoints per object, which we set to eight.
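For illustration, here is a minimal PyTorch sketch of the three second-stage heads as shared MLPs with the channel sizes listed above; the helper function and its structure are assumptions for clarity, not the repository's exact modules:

```python
# Illustrative sketch of the second-stage prediction heads (simplified, assumed structure).
import torch.nn as nn

def shared_mlp(channels):
    """Shared MLP built from 1x1 convolutions with ReLU between layers and no activation after the last."""
    layers = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        layers += [nn.Conv1d(c_in, c_out, 1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers[:-1])

ci, cp, n_kps, n_cls = 64, 64, 8, 22
keypoint_head = shared_mlp([ci + cp, 128, 128, 128, 3 * n_kps])   # 3D keypoint offsets per point
center_head = shared_mlp([ci + cp, 128, 128, 128, 3])             # center point offsets per point
segmentation_head = shared_mlp([ci + cp, 128, 128, 128, n_cls])   # semantic labels per point
```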
The code of SyMFM6D is open-sourced under the AGPL-3.0 license. See the LICENSE file for details. The MV-YCB SymMovCam dataset is open-sourced under the CC-BY-SA-4.0 license.
For a list of other open source components included in SyMFM6D, see the file 3rd-party-licenses.txt.
If you use this work, please cite:
@article{Duffhauss_2023,
author = {Duffhauss, Fabian and Koch, Sebastian and Ziesche, Hanna and Vien, Ngo Anh and Neumann, Gerhard},
title = {SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation},
year = {2023},
journal = {IEEE Robotics and Automation Letters (RA-L)}
}