Commit
* add usage of ray_metrics.py
* topk bug fix
* add scal and lovasz loss
* add focal loss
* add bda flip rot
* code clean
* add config for 60ep
* update panoptic config
* Cleaning

Co-authored-by: YANG-CY-163 <[email protected]>
1 parent 550b12a · commit fe9a966 · 17 changed files with 401 additions and 497 deletions.
@@ -2,12 +2,24 @@
This is the official PyTorch implementation for our paper:

> [**Fully Sparse 3D Panoptic Occupancy Prediction**](https://arxiv.org/abs/2312.17118)<br>
> :school: Presented by Nanjing University and Shanghai AI Lab<br>
> :email: Primary contact: Haisong Liu ([email protected])<br>
> :trophy: [CVPR 2024 Autonomous Driving Challenge - Occupancy and Flow](https://opendrivelab.com/challenge2024/#occupancy_and_flow)<br>
> :book: Third-party Chinese walkthroughs: [自动驾驶之心](https://zhuanlan.zhihu.com/p/675811281), [AIming](https://zhuanlan.zhihu.com/p/691549750). Thank you!

## :warning: Important Notes

There is concurrent work titled "SparseOcc: Rethinking Sparse Latent Representation" by Tang et al., which shares the name SparseOcc with ours. If you cite our research, please make sure to reference the correct version (arXiv **2312.17118**, authored by **Liu et al.**):

```
@article{liu2023fully,
  title={Fully sparse 3d panoptic occupancy prediction},
  author={Liu, Haisong and Wang, Haiguang and Chen, Yang and Yang, Zetong and Zeng, Jia and Chen, Li and Wang, Limin},
  journal={arXiv preprint arXiv:2312.17118},
  year={2023}
}
```

## Highlights

**New model** :1st_place_medal:: SparseOcc first reconstructs a sparse 3D representation from visual inputs, then predicts semantic/instance occupancy from that sparse representation via sparse queries.
@@ -18,21 +30,34 @@ This is the official PyTorch implementation for our paper:

![](asserts/rayiou.jpg)

Some FAQs from the community about the evaluation metrics:

1. **Why does training with visible masks result in significant improvements in the old mIoU metric, but not in the new RayIoU metric?** As mentioned in the paper, when the visible mask is used during training, the area behind the surface is not supervised, so the model tends to fill that area with duplicated predictions, leading to a thicker surface. The old metric penalizes inconsistently along the depth axis when the prediction has a thick surface, so this "improvement" mainly exploits a vulnerability of the old metric.
2. **Why can't SparseOcc exploit the vulnerability of the old metric?** Since SparseOcc uses a fully sparse architecture, it always predicts a thin surface. There are therefore two ways to make a fair comparison: (a) use the old metric, but require all methods to predict a thin surface, which means they cannot use the visible mask during training; or (b) use RayIoU, which is more reasonable and compares thick and thin surfaces fairly. Our method achieves SOTA performance in both settings.
3. **Does RayIoU overlook interior reconstruction?** First, we cannot obtain interior occupancy ground truth: the ground truth is derived by voxelizing LiDAR point clouds, and LiDARs can only scan the thin surface of an object. Second, the query rays in RayIoU can originate from any position within the scene (see the figure above), so the metric evaluates overall reconstruction performance, unlike depth estimation. We would like to emphasize that the evaluation logic of RayIoU aligns with the process of ground-truth generation.

If you have other questions, feel free to contact me (Haisong Liu, [email protected]).
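To make the ray-based evaluation concrete, here is a hypothetical, heavily simplified sketch of a RayIoU-style computation. It is not the repository's actual `ray_metrics.py` (which works on real voxel grids and distance thresholds in meters); `first_hit`, `ray_iou`, and the flat-grid encoding below are names and simplifications invented for this illustration. Each query ray is an ordered list of voxel indices; a ray is a true positive when prediction and ground truth agree on the class of the first occupied voxel and their travel distances are within a tolerance.

```python
# Simplified RayIoU-style sketch (hypothetical, not the official implementation).
# grid: flat list of class labels per voxel, 0 = free space.

def first_hit(ray_voxels, grid):
    """Return (class, step index) of the first occupied voxel along a ray, or None."""
    for step, v in enumerate(ray_voxels):
        if grid[v] != 0:
            return grid[v], step
    return None

def ray_iou(rays, pred, gt, num_classes, depth_thr=1):
    """Mean IoU over classes, where each ray contributes one TP/FP/FN decision."""
    tp = [0] * (num_classes + 1)
    fp = [0] * (num_classes + 1)
    fn = [0] * (num_classes + 1)
    for ray in rays:
        p, g = first_hit(ray, pred), first_hit(ray, gt)
        if p is None and g is None:
            continue  # ray hits nothing in either grid
        if p is not None and g is not None and p[0] == g[0] and abs(p[1] - g[1]) <= depth_thr:
            tp[p[0]] += 1  # class matches and depth is within tolerance
        else:
            if p is not None:
                fp[p[0]] += 1
            if g is not None:
                fn[g[0]] += 1
    ious = [tp[c] / (tp[c] + fp[c] + fn[c])
            for c in range(1, num_classes + 1) if tp[c] + fp[c] + fn[c] > 0]
    return sum(ious) / len(ious)
```

Note how a thick predicted surface cannot help here: only the first hit along each ray is scored, so duplicated voxels behind the surface neither add true positives nor hide depth errors.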
## News

* **2024-06-27**: SparseOcc v1.1 is released. This version introduces BEV data augmentation (BDA) and the Lovasz-Softmax loss to further improve performance. Compared with [v1.0](https://github.com/MCG-NJU/SparseOcc/tree/v1.0) (35.0 RayIoU with 48 epochs), SparseOcc v1.1 achieves 36.8 RayIoU with only 24 epochs!
* **2024-05-29**: We add support for the [OpenOcc v2](configs/r50_nuimg_704x256_8f_openocc.py) dataset (without occupancy flow).
* **2024-04-11**: The panoptic version of SparseOcc ([configs/r50_nuimg_704x256_8f_pano.py](configs/r50_nuimg_704x256_8f_pano.py)) is released.
* **2024-04-09**: An updated arXiv version, [https://arxiv.org/abs/2312.17118v3](https://arxiv.org/abs/2312.17118v3), has been released.
* **2024-03-31**: We release the code and pretrained weights.
* **2023-12-30**: We release the paper.
## Model Zoo

These results are from our latest version, v1.1, which outperforms the results reported in the paper. Our implementation also differs slightly from the original paper; if you wish to reproduce the paper exactly, please refer to the [v1.0](https://github.com/MCG-NJU/SparseOcc/tree/v1.0) tag.

| Setting | Epochs | Training Cost | RayIoU | RayPQ | FPS | Weights |
|----------|:--------:|:-------------:|:------:|:-----:|:---:|:-------:|
| [r50_nuimg_704x256_8f](configs/r50_nuimg_704x256_8f.py) | 24 | 15h, ~12GB | 36.8 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_24e_v1.1.pth) |
| [r50_nuimg_704x256_8f_60e](configs/r50_nuimg_704x256_8f_60e.py) | 60 | 37h, ~12GB | 37.7 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_60e_v1.1.pth) |
| [r50_nuimg_704x256_8f_pano](configs/r50_nuimg_704x256_8f_pano.py) | 24 | 15h, ~12GB | 35.9 | 14.0 | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_pano_24e_v1.1.pth) |

* The backbone is pretrained on [nuImages](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth). Download the weights to `pretrain/xxx.pth` before you start training.
* FPS is measured with an Intel(R) Xeon(R) Platinum 8369B CPU and an NVIDIA A100-SXM4-80GB GPU (PyTorch `fp32` backend, including data loading).
* We will release more settings in the future.
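Since the table's FPS figure includes data loading, a simple end-to-end timing loop is what such a number implies. The sketch below is hypothetical (`measure_fps` and `run_iteration` are names invented here, and the repo's actual benchmark script may differ); the key points are warming up before timing and measuring whole iterations rather than just the forward pass.

```python
# Hypothetical end-to-end FPS measurement sketch, not the repo's benchmark code.
import time

def measure_fps(run_iteration, num_warmup=50, num_iters=200):
    """run_iteration() should load one sample and run the model on it."""
    for _ in range(num_warmup):
        run_iteration()  # warm up caches / CUDA kernels before timing
    start = time.perf_counter()
    for _ in range(num_iters):
        run_iteration()
    elapsed = time.perf_counter() - start
    return num_iters / elapsed
```

On a GPU one would additionally call `torch.cuda.synchronize()` before reading the clock, so that queued asynchronous kernels are included in the measured time.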
@@ -48,14 +73,6 @@ conda activate sparseocc
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```

Install other dependencies:

```
@@ -0,0 +1,15 @@
_base_ = ['./r50_nuimg_704x256_8f.py']

lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    by_epoch=True,
    step=[48, 60],
    gamma=0.2
)
total_epochs = 60

# evaluation
eval_config = dict(interval=total_epochs)
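For reference, the learning-rate multiplier this config implies can be written out explicitly. This is a sketch assuming mmcv-style `step` policy semantics (linear warmup from `warmup_ratio` to 1 over `warmup_iters` iterations, then a multiplication by `gamma` at each milestone epoch); `lr_factor` is a name invented for this illustration.

```python
# Sketch of the LR multiplier implied by the config above (mmcv-style "step"
# policy with linear warmup); multiply by the base learning rate to get the LR.

def lr_factor(epoch, it, warmup_iters=500, warmup_ratio=1.0 / 3,
              steps=(48, 60), gamma=0.2):
    if it < warmup_iters:
        # Linearly ramp the factor from warmup_ratio up to 1.0.
        return warmup_ratio + (1.0 - warmup_ratio) * it / warmup_iters
    decays = sum(1 for s in steps if epoch >= s)  # milestones already passed
    return gamma ** decays
```

With `step=[48, 60]` and `gamma=0.2`, the factor is 1.0 after warmup, drops to 0.2 at epoch 48, and to 0.04 at epoch 60 (the final epoch).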