Commit 09ce49e: init
qiruiw committed Dec 16, 2023 (0 parents)
Showing 90 changed files with 92,762 additions and 0 deletions.
155 changes: 155 additions & 0 deletions .gitignore
@@ -0,0 +1,155 @@
# custom
.DS_Store
.vscode

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
# dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
cub-1.10.0
pytorch3d

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# dataset
data/pix3d/*.json
data/pix3d/data
data/pix3d/*.h5

data/moos/moos*

data/scan2cad/metadata

# output
preprocess/pix3d/checkpoints
preprocess/moos/example
output
lightning_logs
scripts/cedar/*.out
*.out
temp
demo
runs
154 changes: 154 additions & 0 deletions README.md
@@ -0,0 +1,154 @@
# Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects

Qirui Wu, [Daniel Ritchie](https://dritchie.github.io/), [Manolis Savva](https://msavva.github.io/), [Angel Xuan Chang](http://angelxuanchang.github.io/)

[[Paper](https://github.com/3dlg-hcvc/generalizing_shape_retrieval), [Project Page](https://github.com/3dlg-hcvc/generalizing_shape_retrieval), [Dataset](https://github.com/3dlg-hcvc/generalizing_shape_retrieval)]

<!-- ![](docs/images/teaser.png) -->
<p><img src="docs/images/teaser.png" width="65%"></p>

Official repository of the paper [Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects](https://github.com/3dlg-hcvc/generalizing_shape_retrieval). We systematically study the generalization of single-view 3D shape retrieval along three different axes: the presence of object occlusions and truncations, generalization to unseen 3D shape data, and generalization to unseen objects in the input images.


## Setup
The environment is tested with Python 3.8, PyTorch 2.0, CUDA 11.7, PyTorch3D 0.7.3, and Lightning 2.0.1.

```bash
conda create -n gcmic python=3.8
conda activate gcmic
pip3 install torch torchvision
pip install -r requirements.txt
conda install -c fvcore -c iopath -c bottler -c conda-forge fvcore iopath nvidiacub
pip install "git+https://github.com/facebookresearch/[email protected]"
```


## Data

### MOOS

<p><img src="docs/images/moos_generation.png" width="100%"></p>

Multi-Object Occlusion Scenes (MOOS) is generated by a heuristic algorithm that iteratively places newly sampled 3D shapes from [3D-FUTURE](https://tianchi.aliyun.com/specials/promotion/alibaba-3d-future) into the existing layout. Download the MOOS **raw** and **preprocessed** data with the following command and extract them to `./data/moos`.
```sh
cd data/moos && sh download.sh
```
The data files should be organized as follows:
```shell
gcmic
├── data
│   ├── moos
│   │   ├── scenes                          # raw image data
│   │   │   ├── <scene_name>
│   │   │   │   ├── rgb
│   │   │   │   │   ├── rgb_<view_id>.rgb.png
│   │   │   │   ├── instances
│   │   │   │   │   ├── instances_<view_id>.rgb.png
│   │   │   │   ├── objects
│   │   │   │   │   ├── <obj_id>_<view_id>.rgb.png
│   │   │   │   │   ├── <obj_id>_<view_id>.mask.png
│   │   │   │   ├── depth
│   │   │   │   ├── normal
│   │   │   │   ├── layout2d.png            # top-down view
│   │   │   │   ├── scene.json              # scene metadata
│   │   ├── moos_annotation.txt
│   │   ├── moos_annotation_all.txt
│   │   ├── moos_annotation_no_occ.txt      # annotation file containing object queries w/o occlusions
│   │   ├── moos_annotation_occ.txt         # annotation file containing object queries w/ occlusions
│   │   ├── moos_1k.h5                      # image queries
│   │   ├── moos_mv.h5                      # multiviews for each shape
│   │   ├── moos_obj.h5                     # pointcloud for each shape
│   │   ├── lfd_200.h5                      # 200-view LFD for each shape
│   │   ├── moos_pose.json                  # object pose info for rendering
│   │   ├── ...
```
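
To sanity-check the preprocessed HDF5 files above, here is a minimal inspection sketch assuming `h5py`; the file paths follow the tree, and dataset keys are discovered at runtime rather than assumed.
```python
import h5py

# Walk each preprocessed file and print dataset names with their shapes.
# Paths follow the tree above; no key names are assumed.
for path in ["data/moos/moos_1k.h5", "data/moos/moos_mv.h5", "data/moos/moos_obj.h5"]:
    with h5py.File(path, "r") as f:
        print(f"== {path} ==")
        f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```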

Please refer to `./preprocess/moos/gen_dataset_hdf5.py`, `./preprocess/3dfuture/get_all_lfd.py`, and `./preprocess/moos/extract_pose_json.py` for how the preprocessed data is prepared. To render your own shape multiviews and LFDs, download the 3D shapes from [3D-FUTURE](https://tianchi.aliyun.com/specials/promotion/alibaba-3d-future) and put the data under `./data/3dfuture`.

We generate 10K scenes with the script `./preprocess/moos/render_scenes.py`. Each scene can also be reconstructed from the metadata in its `scene.json` (run `./preprocess/moos/reconstruct_scenes.py`). More demos of generating random scenes are available in `./notebook`; a sketch of the placement loop follows below.
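
For intuition, the sketch below shows the kind of iterative placement loop described above: sample shapes, try random floor positions, and reject placements that collide with the current layout. The footprint fields, overlap test, and function names are illustrative assumptions, not the repo's actual API.
```python
import random

def overlaps(bbox, placed):
    """Axis-aligned overlap test between floor-plane bounding boxes."""
    x0, y0, x1, y1 = bbox
    return any(not (x1 <= a0 or a1 <= x0 or y1 <= b0 or b1 <= y0)
               for a0, b0, a1, b1 in placed)

def place_shapes(shapes, num_objects=4, floor_size=10.0, max_tries=100):
    """Iteratively place sampled shapes at random non-overlapping positions."""
    placed, layout = [], []
    for shape in random.sample(shapes, num_objects):
        w, d = shape["width"], shape["depth"]  # floor footprint (illustrative fields)
        for _ in range(max_tries):
            x = random.uniform(0, floor_size - w)
            y = random.uniform(0, floor_size - d)
            bbox = (x, y, x + w, y + d)
            if not overlaps(bbox, placed):     # keep only collision-free placements
                placed.append(bbox)
                layout.append({"shape": shape, "position": (x, y)})
                break
    return layout
```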

### Pix3D

Download the Pix3D raw data [here](http://pix3d.csail.mit.edu/) and the preprocessed data with the following command, then extract and place them under `./data/pix3d`.
```sh
cd data/pix3d && sh download.sh
```
Please refer to [details](./data/README.md#pix3d) for the Pix3D data structure.

### Scan2CAD

Download the ScanNet25K images and CAD annotations from [ROCA data](https://github.com/cangumeli/ROCA#downloading-processed-data-recommended) and the preprocessed data with the following command, then extract and place them under `./data/scan2cad`.
```sh
cd data/scan2cad && sh download.sh
```
Please refer to [details](./data/README.md#scan2cad) for the Scan2CAD data structure. Download the ShapeNet 3D shapes [here](https://shapenet.org/) if you want to render your own shape multiviews and LFDs.


## Train

Train a CMIC model on the ALL set of MOOS.
```sh
python train.py -t train -e cmic_moos --data_conf conf/dataset/moos.yaml --model_conf conf/model/cmic.yaml --epochs 50 --batch_size 64 --num_views 12 --verbose False --annotation_file moos_annotation_all.txt --use_crop --use_1k_img
```

Train a CMIC model on the ALL set of Pix3D using Mask2Former predicted object masks.
```sh
python train.py -t train -e cmic_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --epochs 500 --batch_size 64 --num_views 12 --verbose False --annotation_file pix3d_annotation_all.txt --mask_source m2f_mask --val_check_interval 1 --use_crop
```

Train a CMIC model on Scan2CAD.
```sh
python train.py -t train -e cmic_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --epochs 500 --batch_size 64 --num_views 12 --verbose False --annotation_file scan2cad_annotation.txt --val_check_interval 1 --num_sanity_val_steps 100 --use_crop --use_480p_img --center_in_image
```


## Fine-tune

Fine-tune `cmic_moos` on Pix3D.
```sh
python train.py -t finetune -e cmic_moos_ft_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --epochs 5 --batch_size 64 --num_views 12 --verbose False --annotation_file pix3d_annotation_all.txt --mask_source m2f_mask --ckpt_path ./output/moos/cmic/cmic_moos/train/model.ckpt --val_check_interval 1 --use_crop
```

Fine-tune `cmic_moos` on Scan2CAD.
```sh
python train.py -t finetune -e cmic_moos_ft_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --epochs 5 --batch_size 64 --num_views 12 --verbose False --annotation_file scan2cad_annotation.txt --ckpt_path ./output/moos/cmic/cmic_moos/train/model.ckpt --val_check_interval 1 --use_crop --num_sanity_val_steps 400 --use_480p_img --center_in_image
```


## Evaluation

We first embed all shape multiviews from different datasets (MOOS, Pix3D, and Scan2CAD) using the specified pretrained shape encoder.
```sh
python test.py -t embed_shape -e <model_name> --data_conf conf/dataset/<dataset>.yaml --model_conf conf/model/cmic.yaml --batch_size 48 --num_views 12 --verbose False --ckpt model.ckpt
```
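
With the shape embeddings precomputed, retrieval reduces to nearest-neighbor search in the joint embedding space. Below is a minimal sketch, assuming L2-normalized features and cosine similarity; the tensor names are illustrative, not the repo's actual API.
```python
import torch
import torch.nn.functional as F

def retrieve(query_feats, shape_feats, k=5):
    """Rank shapes by cosine similarity to each image query.

    query_feats: (Q, D) image-query embeddings
    shape_feats: (S, D) shape embeddings from the embed_shape step
    """
    q = F.normalize(query_feats, dim=-1)
    s = F.normalize(shape_feats, dim=-1)
    sim = q @ s.T                           # (Q, S) cosine-similarity matrix
    values, indices = sim.topk(k, dim=-1)   # top-k candidate shapes per query
    return indices, values
```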

Evaluate on `all|seen|unseen` objects of different **MOOS** sets `all|no_occ|occ`.
```sh
python test.py -t test -e cmic_moos --data_conf conf/dataset/moos.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --ckpt model.ckpt --annotation_file moos_annotation_<all|no_occ|occ>.txt --offline_evaluation --test_objects <all|seen|unseen> --use_crop --use_1k_img
```

Evaluate on `all|seen|unseen` objects of different **Pix3D** sets `all|easy|hard`.
```sh
python test.py -t test -e cmic_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --mask_source m2f_mask --ckpt model.ckpt --annotation_file pix3d_annotation_<all|easy|hard>.txt --offline_evaluation --test_objects <all|seen|unseen> --use_crop
```

Evaluate on the **Scan2CAD** dataset.
```sh
python test.py -t test -e cmic_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --ckpt model.ckpt --annotation_file scan2cad_annotation.txt --offline_evaluation --use_crop --use_480p_img
```

**Note**
- Add flags `--not_eval_acc --shape_feats_source <dataset>` to test on unseen 3D shapes.
- Add flag `--save_eval_vis` to save retrieved 3D shape renderings and visualizations.


## BibTeX
```
@article{wu2023generalizing,
  author = {Wu, Qirui and Ritchie, Daniel and Savva, Manolis and Chang, Angel X.},
  title = {{Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects}},
  year = {2023},
  journal = {arXiv preprint arXiv:xxx}
}
```

42 changes: 42 additions & 0 deletions conf/dataset/moos.yaml
@@ -0,0 +1,42 @@
data:
  name: moos

  module: gcmic.dataset.moos
  classname: MOOS
  loader: moos_loader
  task: train
  split:

  raw_path: ${DATA_PATH.moos.raw}
  preprocessed_path: ${DATA_PATH.moos.preprocessed}
  h5_path: ${DATA_PATH.moos.preprocessed}/moos.h5
  mv_path: ${DATA_PATH.moos.preprocessed}/moos_mv.h5
  obj_path: ${DATA_PATH.moos.preprocessed}/moos_obj.h5
  lfd_path: ${DATA_PATH.future3d.preprocessed}/lfd_200.h5
  pose_path: ${DATA_PATH.moos.preprocessed}/moos_pose.json
  annotation_file: moos_annotation_all.txt

  img_source: image
  mask_source: mask
  use_crop: False
  use_color_transfer: False
  batch_size: 64
  num_workers: 8

  cat_list: [chair, bed, sofa, table]
  cat_choice: [chair]

  input_dim: 224

  multiview:
    mv_dirname: neutral_multiviews_12
    mv_num: 12
    mv_dim: [224, 224]
    mv_opt: crop

  tour: 2
  # random_model: False
  test_only_occlusion: False
  test_objects: all

  unique_data_sampler: False
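
The `${DATA_PATH...}` entries are interpolation placeholders resolved when the config is loaded. Below is a minimal sketch of resolving them, assuming an OmegaConf-style loader; the `DATA_PATH` values are illustrative, not the repo's actual definitions.
```python
from omegaconf import OmegaConf

# Illustrative base paths; the repo presumably defines DATA_PATH elsewhere.
paths = OmegaConf.create({
    "DATA_PATH": {
        "moos": {"raw": "data/moos/scenes", "preprocessed": "data/moos"},
        "future3d": {"preprocessed": "data/3dfuture"},
    }
})
cfg = OmegaConf.merge(paths, OmegaConf.load("conf/dataset/moos.yaml"))
print(cfg.data.h5_path)  # interpolations resolve on access: data/moos/moos.h5
```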
43 changes: 43 additions & 0 deletions conf/dataset/pix3d.yaml
@@ -0,0 +1,43 @@
data:
  name: pix3d

  module: gcmic.dataset.pix3d
  classname: Pix3D
  loader: pix3d_loader
  task: train
  split:

  raw_path: ${DATA_PATH.pix3d.raw}
  preprocessed_path: ${DATA_PATH.pix3d.preprocessed}
  h5_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_224.h5
  mv_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_mv.h5
  obj_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_obj.h5
  lfd_path: ${DATA_PATH.pix3d.preprocessed}/lfd_200.h5
  raw_img_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_img_path.txt
  pose_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_pose.json
  annotation_file: pix3d_annotation_all.txt

  img_source: image
  mask_source: mask
  use_crop: False
  use_color_transfer: False
  batch_size: 64
  num_workers: 8

  cat_list: [chair, bed, desk, sofa, bookcase, table, wardrobe, tool, misc]
  cat_choice: [chair]

  input_dim: 224

  multiview:
    mv_dirname: neutral_multiviews_12
    mv_num: 12
    mv_dim: [224, 224]
    mv_opt: crop

  tour: 2
  # random_model: False
  test_only_occlusion: False
  test_objects: all

  unique_data_sampler: False