R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [ICCV 2023]

Project Page | Paper | Data

Abstract

Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation. Our approach iterates between geometric estimation that exploits spatial-temporal information from multiple cameras, and monocular depth refinement. We integrate multi-camera feature correlation and dense bundle adjustment operators that yield robust geometric depth and pose estimates. To improve reconstruction where geometric depth is unreliable, e.g. for moving objects or low-textured regions, we introduce learnable scene priors via a depth refinement network. We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments. Consequently, we achieve state-of-the-art dense depth prediction on the DDAD and nuScenes benchmarks.

Getting Started

Clone the repo together with its submodules using the --recurse-submodules flag

git clone --recurse-submodules https://github.com/AronDiSc/r3d3.git
cd r3d3

Create a new Anaconda environment from the provided environment.yaml file

conda env create --file environment.yaml
conda activate r3d3

Compile the extensions (takes about 10 minutes)

python setup.py install

Datasets

The datasets should be placed at data/datasets/<dataset>

DDAD

Download the DDAD dataset and place it at data/datasets/DDAD. We use the masks provided by SurroundDepth. Place them at data/datasets/DDAD/<scene>/occl_mask/<cam>/mask.png. The DDAD directory structure should look as follows:

R3D3
    ├ data
        ├ datasets
            ├ DDAD
                ├ <scene>
                    ├ calibration
                        └ ....json
                    ├ point_cloud
                        └ <cam>
                            └ ....npz
                    ├ occl_mask
                        └ <cam>
                            └ ....png
                    ├ rgb
                        └ <cam>
                            └ ....png
                    
                    └ scene_....json
                └ ...
            └ ...
        └ ...
    └ ...
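
Before running anything expensive, it can help to confirm that every scene folder has the sub-directories shown in the tree above. The following is a minimal sketch, not part of the repository: the function names `missing_scene_dirs` and `check_ddad_layout` are hypothetical, and the check only looks at directory names taken from the tree, not at file contents.

```python
from pathlib import Path
from typing import Dict, List

# Sub-directory names taken from the DDAD tree above.
EXPECTED_SCENE_DIRS = ("calibration", "point_cloud", "occl_mask", "rgb")


def missing_scene_dirs(scene_dir: Path) -> List[str]:
    """Return the expected sub-directories that are absent from one scene folder."""
    return [name for name in EXPECTED_SCENE_DIRS if not (scene_dir / name).is_dir()]


def check_ddad_layout(root: Path = Path("data/datasets/DDAD")) -> Dict[str, List[str]]:
    """Map each incomplete scene folder name to its missing sub-directories."""
    report: Dict[str, List[str]] = {}
    if not root.is_dir():
        return report  # nothing to inspect; the caller decides how to handle this
    for scene_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = missing_scene_dirs(scene_dir)
        if missing:
            report[scene_dir.name] = missing
    return report


if __name__ == "__main__":
    for scene, missing in check_ddad_layout().items():
        print(f"{scene}: missing {', '.join(missing)}")
```

Run from the repository root, the script prints one line per incomplete scene and nothing when all scene folders contain the four sub-directories.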

nuScenes

Download the nuScenes dataset and place it at data/datasets/nuScenes. We use the provided self-occlusion masks. Place them at data/datasets/nuScenes/mask/<cam>.png. The nuScenes directory structure should look as follows:

R3D3
    ├ data
        ├ datasets
            ├ nuScenes
                ├ mask
                    ├ CAM_....png
                ├ samples
                    ├ CAM_...
                        └ ....jpg
                    └ LIDAR_TOP
                        └ ....pcd.bin
                ├ sweeps
                    ├ CAM_...
                        └ ....jpg
                ├ v1.0-trainval
                    └ ...
                └ ...
            └ ...
        └ ...
    └ ...

Models

VKITTI2 Finetuned Feature-Matching

Download the weights for the feature- and context-encoders as well as the GRU from here: r3d3_finetuned.ckpt. Place it at:

R3D3
    ├ data
        ├ models
            ├ r3d3
                └ r3d3_finetuned.ckpt
            └ ...
        └ ...
    └ ...

Completion Network

We provide completion network weights for the DDAD and nuScenes datasets.

| Dataset | Abs Rel | Sq Rel | RMSE | delta < 1.25 | Download |
| --- | --- | --- | --- | --- | --- |
| DDAD | 0.162 | 3.019 | 11.408 | 0.811 | completion_ddad.ckpt |
| nuScenes | 0.253 | 4.759 | 7.150 | 0.729 | completion_nuscenes.ckpt |

Place them at:

R3D3
    ├ data
        ├ models
            ├ completion
                ├ completion_ddad.ckpt
                └ completion_nuscenes.ckpt
            └ ...
        └ ...
    └ ...
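
The columns in the results table above (Abs Rel, Sq Rel, RMSE, delta < 1.25) are the error measures commonly used in depth estimation. The sketch below shows their usual textbook definitions on plain Python sequences. It is an illustration only: the function name `depth_metrics` is hypothetical, treating non-positive ground truth as "no measurement" is an assumption, and the repository's own evaluation may apply extra steps (for example depth caps or scaling) that are not described here.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Compute common depth metrics; predictions must be strictly positive."""
    # Assumption: ground-truth values <= 0 mark pixels without a measurement.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Mean absolute error relative to the true depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean squared error relative to the true depth.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root of the mean squared error, in the unit of the depth values.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose prediction is within a factor of 1.25 of the truth.
    delta_1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "delta_1": delta_1}


if __name__ == "__main__":
    # Toy values for illustration, not data from the benchmarks.
    print(depth_metrics(pred=[2.0, 4.0, 3.0], gt=[1.0, 4.0, 0.0]))
```

Lower is better for the first three measures, higher is better for delta < 1.25. On full depth maps the same formulas would normally be vectorized with NumPy or PyTorch.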

Training

Droid-SLAM Finetuning

We fine-tune the provided droid.pth checkpoint on VKITTI2 using the Droid-SLAM codebase.

Completion Network

1. Generate Training Data

# DDAD
python evaluate.py \
    --config configs/evaluation/dataset_generation/dataset_generation_ddad.yaml \
    --r3d3_weights=data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 384 640 \
    --r3d3_n_warmup=5 \
    --r3d3_optm_window=5 \
    --r3d3_corr_impl=lowmem \
    --r3d3_graph_type=droid_slam \
    --training_data_path=./data/datasets/DDAD 

# nuScenes
python evaluate.py \
    --config configs/evaluation/dataset_generation/dataset_generation_nuscenes.yaml \
    --r3d3_weights=data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 448 768 \
    --r3d3_n_warmup=5 \
    --r3d3_optm_window=5 \
    --r3d3_corr_impl=lowmem \
    --r3d3_graph_type=droid_slam \
    --training_data_path=./data/datasets/nuScenes

2. Completion Network Training

# DDAD
python train.py configs/training/depth_completion/r3d3_completion_ddad_stage_1.yaml
python train.py configs/evaluation/depth_completion/r3d3_completion_ddad_inf_depth.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
python train.py configs/training/depth_completion/r3d3_completion_ddad_stage_2.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt

# nuScenes
python train.py configs/training/depth_completion/r3d3_completion_nuscenes_stage_1.yaml
python train.py configs/evaluation/depth_completion/r3d3_completion_nuscenes_inf_depth.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
python train.py configs/training/depth_completion/r3d3_completion_nuscenes_stage_2.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
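
The DDAD and nuScenes sequences above differ only in the dataset name inside the config file names, and the second and third commands both take the checkpoint written by the first. As a minimal sketch, the three invocations can be generated from those two inputs. The helper name `completion_training_commands` is hypothetical and not part of the repository, and the config paths are copied from the commands above.

```python
from typing import List

# Config locations copied from the commands above.
TRAIN_DIR = "configs/training/depth_completion"
EVAL_DIR = "configs/evaluation/depth_completion"


def completion_training_commands(dataset: str, stage1_ckpt: str) -> List[List[str]]:
    """Build the three train.py invocations for one dataset ("ddad" or "nuscenes")."""
    if dataset not in ("ddad", "nuscenes"):
        raise ValueError(f"unknown dataset: {dataset!r}")
    ckpt_arg = f"--arch.model.checkpoint={stage1_ckpt}"
    return [
        ["python", "train.py", f"{TRAIN_DIR}/r3d3_completion_{dataset}_stage_1.yaml"],
        ["python", "train.py", f"{EVAL_DIR}/r3d3_completion_{dataset}_inf_depth.yaml", ckpt_arg],
        ["python", "train.py", f"{TRAIN_DIR}/r3d3_completion_{dataset}_stage_2.yaml", ckpt_arg],
    ]


if __name__ == "__main__":
    # Hypothetical checkpoint path; substitute the file written by your stage 1 run.
    for cmd in completion_training_commands("ddad", "path/to/stage_1.ckpt"):
        print(" ".join(cmd))
```

The script only prints the commands rather than executing them, because the stage 1 checkpoint path is not known until the first command has finished.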

Evaluation

# DDAD
python evaluate.py \
    --config configs/evaluation/r3d3/r3d3_evaluation_ddad.yaml \
    --r3d3_weights data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 384 640 \
    --r3d3_init_motion_only \
    --r3d3_n_edges_max=84 

# nuScenes
python evaluate.py \
    --config configs/evaluation/r3d3/r3d3_evaluation_nuscenes.yaml \
    --r3d3_weights data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 448 768 \
    --r3d3_init_motion_only \
    --r3d3_dt_inter=0 \
    --r3d3_n_edges_max=72

Citation

If you find the code helpful in your research or work, please cite the following paper.

@inproceedings{r3d3,
  title={R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras},
  author={Schmied, Aron and Fischer, Tobias and Danelljan, Martin and Pollefeys, Marc and Yu, Fisher},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2023}
}

Acknowledgements

This repository is based on Droid-SLAM.
The implementation of the completion network is based on Monodepth2.
The vidar framework is used for training, evaluation and logging results.
