Yuxue Yang, Lue Fan†, Zhaoxiang Zhang† (†: Corresponding Authors)
[ 📑 Paper ] [ GitHub Repo ] [ 📎 BibTeX ]
A good LiDAR-based detector needs massive semantic labels for difficult semantic learning but only a few accurate labels for geometry estimation.
- MixSup is a practical and universal paradigm for label-efficient LiDAR-based 3D object detection, simultaneously utilizing cheap coarse labels and limited accurate labels.
- MixSup achieves up to 97.31% of fully supervised performance with cheap cluster-level labels and only 10% box-level labels, which has been validated on nuScenes, Waymo Open Dataset, and KITTI.
- MixSup can seamlessly integrate with various 3D detectors, such as SECOND, CenterPoint, PV-RCNN, and FSD.
- PointSAM is a simple and effective method for MixSup to automatically segment cluster-level labels, further reducing the annotation burden.
- PointSAM is on par with the recent fully supervised panoptic segmentation models for thing classes on nuScenes without any 3D annotations!
nuScenes Sample Token | 1ac0914c98b8488cb3521efeba354496 | fd8420396768425eabec9bdddf7e64b6 |
---|---|---|
PointSAM | | |
Ground Truth | | |
Methods | PQ<sup>Th</sup> | SQ<sup>Th</sup> | RQ<sup>Th</sup> |
---|---|---|---|
GP-S3Net | 56.0 | 85.3 | 65.2 |
SMAC-Seg | 65.2 | 87.1 | 74.2 |
Panoptic-PolarNet | 59.2 | 84.1 | 70.3 |
SCAN | 60.6 | 85.7 | 70.2 |
PointSAM (Ours) | 63.7 | 82.6 | 76.9 |
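For reference, PQ, SQ, and RQ follow the standard panoptic quality definition, here restricted to thing classes (the "Th" superscript). The sketch below computes the three metrics for a single class from the IoUs of matched segments. The function name and the toy numbers are illustrative assumptions and are not taken from the PointSAM evaluation code.

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for one class.

    tp_ious: IoU of every matched (true-positive) segment pair, each > 0.5.
    num_fp:  number of unmatched predicted segments.
    num_fn:  number of unmatched ground-truth segments.
    """
    num_tp = len(tp_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # No predictions and no ground truth for this class.
        return 0.0, 0.0, 0.0
    # Segmentation quality: mean IoU over matched pairs.
    sq = sum(tp_ious) / num_tp if num_tp else 0.0
    # Recognition quality: an F1-style detection score.
    rq = num_tp / denom
    return sq * rq, sq, rq


# Toy example with made-up values: three matches, one FP, one FN.
pq, sq, rq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")
```

Per class, PQ equals SQ times RQ. The table reports values averaged over classes, so the PQ column is not exactly the product of the SQ and RQ columns.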
Step 1. Create a conda environment and activate it.
conda create --name MixSup python=3.8 -y
conda activate MixSup
Step 2. Install PyTorch following the official instructions. The code is tested on PyTorch 1.9.1 and CUDA 11.1.
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
Step 3. Install Segment Anything and torch_scatter.
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install https://data.pyg.org/whl/torch-1.9.0%2Bcu111/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
Step 4. Install other dependencies.
pip install -r requirements.txt
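After installation, a quick sanity check can confirm that the packages from Steps 2 and 3 are importable. The snippet below is a minimal sketch and not part of the repository; the helper name `find_missing_modules` is hypothetical, and it only checks that each module can be located, not that its version matches the tested one.

```python
import importlib.util
import sys

# Import names of the packages installed in Steps 2 and 3.
REQUIRED_MODULES = ["torch", "torchvision", "segment_anything", "torch_scatter"]


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    missing = find_missing_modules()
    print("Missing modules:", missing if missing else "none")
```

Run it inside the activated conda environment; any name it lists points to the install step that needs to be repeated.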
Download the nuScenes Full dataset and nuScenes-panoptic (for evaluation) from the official website, then extract and organize the data into the following structure:
PointSAM-for-MixSup
└── data
    └── nuscenes
        ├── maps
        ├── panoptic
        ├── samples
        ├── sweeps
        └── v1.0-trainval
Note: `v1.0-trainval/category.json` and `v1.0-trainval/panoptic.json` in nuScenes-panoptic will replace the original `v1.0-trainval/category.json` and `v1.0-trainval/panoptic.json` of the Full dataset.
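The layout above can be verified before running anything. The following is a minimal sketch, assuming the dataset root is `data/nuscenes` relative to the repository; the helper name `missing_nuscenes_entries` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-directories listed in the expected structure.
EXPECTED_SUBDIRS = ["maps", "panoptic", "samples", "sweeps", "v1.0-trainval"]
# Files that nuScenes-panoptic places in v1.0-trainval.
EXPECTED_FILES = ["v1.0-trainval/category.json", "v1.0-trainval/panoptic.json"]


def missing_nuscenes_entries(root="data/nuscenes"):
    """Return the expected directories and files that are absent under `root`."""
    root = Path(root)
    missing = [d for d in EXPECTED_SUBDIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    print("Missing entries:", missing_nuscenes_entries() or "none")
```

An empty result means every listed directory and both JSON files are present; it does not check that the JSON files are the nuScenes-panoptic versions.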
First download the model checkpoints, then run the following commands to reproduce the results in the paper:
# single-gpu
bash run.sh
# multi-gpu
bash run_dist.sh
Note:
- The default setting for `run_dist.sh` is to use 8 GPUs. If you want to use fewer GPUs, please modify the `NUM_GPUS` argument in `run_dist.sh`.
- You can set `SAMPLE_INDICES` to either `scripts/indices_train.npy` or `scripts/indices_val.npy` to run PointSAM on the train or val split of nuScenes. The default setting is to segment the val split and evaluate the results on the panoptic segmentation task.
- Before running the scripts, please make sure that you have at least 850MB of free space in the `OUT_DIR` folder for the val split and 4GB for the train split.
- `segment3D.py` is the main script for PointSAM. The argument `--for_eval` is used to generate labels in the same format as nuScenes-panoptic for evaluation, which is not necessary for MixSup. If you just want to utilize PointSAM for MixSup, please remove `--for_eval` in `run.sh` or `run_dist.sh`.
- We also provide a script to convert the labels generated by PointSAM between the `.npz` format for nuScenes-panoptic evaluation and the `.bin` format for MixSup.
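The free-space requirement for the output folder can be checked up front. This is a minimal sketch, not part of the repository: the helper name `has_enough_space` is hypothetical, and it assumes decimal units (1MB = 10^6 bytes, 1GB = 10^9 bytes) for the two thresholds.

```python
import shutil

# Free space needed in the output folder, per split (decimal units assumed).
REQUIRED_FREE_BYTES = {"val": 850 * 10**6, "train": 4 * 10**9}


def has_enough_space(out_dir, split):
    """Return True if the filesystem holding `out_dir` has enough free space."""
    free = shutil.disk_usage(out_dir).free
    return free >= REQUIRED_FREE_BYTES[split]


if __name__ == "__main__":
    # "." stands in for your actual output folder.
    for split in ("val", "train"):
        print(split, "OK" if has_enough_space(".", split) else "not enough space")
```

`shutil.disk_usage` requires an existing path, so create the output folder before calling the helper.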
We adopt ViT-H SAM as the segmentation model for PointSAM and utilize nuImages pre-trained HTC to integrate semantics for instance masks.
Click the following links to download the model checkpoints and put them in the `ckpt/` folder to be consistent with the configuration in `configs/cfg_PointSAM.py`.
- `ViT-H SAM`: ViT-H SAM model
- `HTC`: HTC model
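To confirm that a downloaded SAM checkpoint is usable, it can be loaded directly through the `segment_anything` package. The snippet below is a minimal sketch and not how the repository loads the model; the helper name `load_sam_vit_h` is hypothetical, and the checkpoint path you pass must match the file name expected by `configs/cfg_PointSAM.py`.

```python
from pathlib import Path


def load_sam_vit_h(checkpoint_path, device="cuda"):
    """Load a ViT-H SAM model from a local checkpoint file."""
    checkpoint_path = Path(checkpoint_path)
    if not checkpoint_path.is_file():
        raise FileNotFoundError(f"SAM checkpoint not found: {checkpoint_path}")
    # Imported lazily so the path check works even if the package is missing.
    from segment_anything import sam_model_registry

    # "vit_h" selects the ViT-H backbone builder in the SAM registry.
    sam = sam_model_registry["vit_h"](checkpoint=str(checkpoint_path))
    return sam.to(device)
```

Calling the helper with the path of the file you placed in `ckpt/` returns the model on the requested device; pass `device="cpu"` on a machine without a GPU.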
- Publish the code about PointSAM.
- OpenPCDet based MixSup.
- MMDetection3D based MixSup.
Please consider citing our work as follows if it is helpful.
@inproceedings{yang2024mixsup,
title={MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection},
author={Yang, Yuxue and Fan, Lue and Zhang, Zhaoxiang},
booktitle={ICLR},
year={2024},
}
This project is based on the following repositories.