`.gitignore` (new file)
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# I/O
input/
output/
weights/
`README.md` (new file)
# Monocular Visual-Inertial Depth Estimation

This repository contains code and models for our paper:

> Monocular Visual-Inertial Depth Estimation
> Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun

## Introduction

![Methodology Diagram](figures/methodology_diagram.png)

We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry (VIO) to produce dense depth estimates with metric scale. Our approach consists of three stages: (1) input processing, where RGB and IMU data feed into monocular depth estimation alongside visual-inertial odometry; (2) global scale and shift alignment, where monocular depth estimates are fitted to sparse depth from VIO in a least-squares manner; and (3) learning-based dense scale alignment, where globally-aligned depth is locally realigned using a dense scale map regressed by the ScaleMapLearner (SML). The images at the bottom of the diagram above illustrate a VOID sample being processed through our pipeline; from left to right: the input RGB, ground-truth depth, sparse depth from VIO, globally-aligned depth, scale map scaffolding, the dense scale map regressed by SML, and the final depth output.
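To make stage (2) concrete, the sketch below shows one way to recover the global scale and shift via a closed-form least-squares fit against the sparse VIO depth. It is a minimal illustration under assumed conventions (alignment in depth space, zeros marking empty sparse pixels), not the repository's exact implementation:

```python
import numpy as np

def globally_align(pred_depth, sparse_depth):
    """Fit scale s and shift t so that s * pred_depth + t best matches
    the sparse metric depth from VIO, in the least-squares sense.

    pred_depth:   (H, W) relative depth from the monocular predictor
    sparse_depth: (H, W) metric depth from VIO, 0 where no point exists
    """
    mask = sparse_depth > 0                          # valid sparse points only
    A = np.stack([pred_depth[mask],
                  np.ones(mask.sum())], axis=1)      # design matrix [d, 1]
    b = sparse_depth[mask]
    (s, t), *_ = np.linalg.lstsq(A, b, rcond=None)   # closed-form fit
    return s * pred_depth + t                        # globally-aligned depth
```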
![Teaser Figure](figures/teaser_figure.png)
## Setup

1) Set up dependencies:

```shell
conda env create -f environment.yaml
conda activate vi-depth
```
2) Pick one or more ScaleMapLearner (SML) models and download the corresponding weights to the `weights` folder.

| Depth Predictor | SML on VOID 150 | SML on VOID 500 | SML on VOID 1500 |
| :--- | :----: | :----: | :----: |
| DPT-BEiT-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.1500.ckpt) |
| DPT-SwinV2-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.1500.ckpt) |
| DPT-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.1500.ckpt) |
| DPT-Hybrid | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.ckpt)* | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.1500.ckpt) |
| DPT-SwinV2-Tiny | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.1500.ckpt) |
| DPT-LeViT | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.1500.ckpt) |
| MiDaS-small | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.1500.ckpt) |

*Also available with pretraining on TartanAir: [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.pretrained.ckpt)
## Inference

1) Place inputs into the `input` folder. An input image and a corresponding sparse metric depth map are expected:

```bash
input
├── image                    # RGB image
│   ├── <timestamp>.png
│   └── ...
└── sparse_depth             # sparse metric depth map
    ├── <timestamp>.png      # as 16-bit PNG
    └── ...
```

The `load_sparse_depth` function in `run.py` may need to be modified depending on the format in which sparse depth is stored. By default, the depth storage method [used in the VOID dataset](https://github.com/alexklwong/void-dataset/blob/master/src/data_utils.py) is assumed; a minimal loader in that style is sketched below.
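The sketch below follows the VOID data utilities linked above, where depth is stored as a 16-bit PNG holding meters multiplied by 256; treat the exact scale factor as an assumption to verify against your own data.

```python
import numpy as np
import imageio

def load_sparse_depth(path):
    """Load a sparse depth map stored VOID-style as a 16-bit PNG.

    Pixel values encode depth in meters multiplied by 256;
    zero pixels mean 'no sparse point here'.
    """
    raw = imageio.imread(path).astype(np.float32)
    depth = raw / 256.0           # back to meters
    depth[raw == 0] = 0.0         # keep empty pixels at exactly zero
    return depth
```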
2) Run the `run.py` script as follows:

```bash
DEPTH_PREDICTOR="dpt_beit_large_512"
NSAMPLES=150
SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"
python run.py -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH --save-output
```
3) The `--save-output` flag enables saving outputs to the `output` folder. By default, the following outputs will be saved per sample:

```bash
output
├── ga_depth                 # metric depth map after global alignment
│   ├── <timestamp>.pfm      # as PFM
│   ├── <timestamp>.png      # as 16-bit PNG
│   └── ...
└── sml_depth                # metric depth map output by SML
    ├── <timestamp>.pfm      # as PFM
    ├── <timestamp>.png      # as 16-bit PNG
    └── ...
```
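Downstream, the saved 16-bit PNGs can be decoded back to metric depth. A small sketch, assuming the same factor-of-256 encoding as the sparse inputs (worth verifying against the save code in `run.py`); the timestamp in the path is a placeholder:

```python
import numpy as np
import imageio

png_path = "output/sml_depth/<timestamp>.png"   # placeholder path

raw = imageio.imread(png_path).astype(np.float32)
depth_m = raw / 256.0                           # assumed meters * 256 encoding
valid = depth_m > 0
print(f"depth range: {depth_m[valid].min():.2f} to {depth_m[valid].max():.2f} m")
```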
## Evaluation

Models provided in this repo were trained on the VOID dataset.

1) Download the VOID dataset following [the instructions in the VOID dataset repo](https://github.com/alexklwong/void-dataset#downloading-void).
2) To evaluate on VOID test sets, run the `evaluate.py` script as follows:

```bash
DATASET_PATH="/path/to/void_release/"
DEPTH_PREDICTOR="dpt_beit_large_512"
NSAMPLES=150
SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"
python evaluate.py -ds $DATASET_PATH -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH
```
Results for the example shown above (depth error metrics in mm):

```
Averaging metrics for globally-aligned depth over 800 samples
Averaging metrics for SML-aligned depth over 800 samples
+---------+----------+----------+
| metric  | GA Only  |  GA+SML  |
+---------+----------+----------+
|  RMSE   |  191.36  |  142.85  |
|   MAE   |  115.84  |  76.95   |
| AbsRel  |  0.069   |  0.046   |
|  iRMSE  |  72.70   |  57.13   |
|  iMAE   |  49.32   |  34.25   |
| iAbsRel |  0.071   |  0.048   |
+---------+----------+----------+
```
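These metrics follow standard depth-evaluation definitions; the sketch below computes them under assumed conventions (depths in meters converted to mm, and the `i*` variants on inverse depth in 1/km). It is an illustration, not the repository's `evaluate.py`.

```python
import numpy as np

def compute_metrics(pred_m, gt_m):
    """Standard depth error metrics, evaluated where ground truth is valid.

    pred_m, gt_m: depth maps in meters. RMSE/MAE are reported in mm;
    iRMSE/iMAE use inverse depth in 1/km (assumed convention here).
    """
    mask = gt_m > 0
    pred = pred_m[mask] * 1000.0          # meters -> millimeters
    gt = gt_m[mask] * 1000.0
    ipred = 1.0e6 / pred                  # 1/mm -> 1/km (multiply by 1e6)
    igt = 1.0e6 / gt
    return {
        "RMSE":    np.sqrt(np.mean((pred - gt) ** 2)),
        "MAE":     np.mean(np.abs(pred - gt)),
        "AbsRel":  np.mean(np.abs(pred - gt) / gt),
        "iRMSE":   np.sqrt(np.mean((ipred - igt) ** 2)),
        "iMAE":    np.mean(np.abs(ipred - igt)),
        "iAbsRel": np.mean(np.abs(ipred - igt) / igt),
    }
```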
To evaluate on VOID test sets at different densities (void_150, void_500, void_1500), change the `NSAMPLES` argument above accordingly.
## Citation

If you reference our work, please consider citing the following:

```bib
@inproceedings{wofk2023videpth,
    author    = {Wofk, Diana and Ranftl, Ren\'{e} and M{\"u}ller, Matthias and Koltun, Vladlen},
    title     = {{Monocular Visual-Inertial Depth Estimation}},
    booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
    year      = {2023}
}
```
## Acknowledgements

Our work builds on and uses code from [MiDaS](https://github.com/isl-org/MiDaS), [timm](https://github.com/rwightman/pytorch-image-models), and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/). We'd like to thank the authors for making these libraries and frameworks available.
`environment.yaml` (new file)
name: vi-depth
channels:
  - pytorch
  - defaults
dependencies:
  - nvidia::cudatoolkit=11.7
  - python=3.10.8
  - pytorch::pytorch=1.13.0
  - torchvision=0.14.0
  - pip=22.3.1
  - numpy=1.23.4
  - pip:
      - opencv-python==4.6.0.66
      - scipy==1.10.1
      - timm==0.6.12
      - pytorch-lightning==1.9.0
      - imageio==2.25.0
      - prettytable==3.6.0
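After creating the environment, a quick check (a suggestion, not part of the repo) confirms the pinned versions and that CUDA is visible:

```python
# Environment sanity check (illustrative, not part of the repo).
import torch
import torchvision
import pytorch_lightning

print(torch.__version__)               # expect 1.13.0
print(torchvision.__version__)         # expect 0.14.0
print(pytorch_lightning.__version__)   # expect 1.9.0
print(torch.cuda.is_available())       # True if cudatoolkit 11.7 is usable
```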