**`.gitignore`**
```
# custom
.DS_Store
.vscode

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
# dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
cub-1.10.0
pytorch3d

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# dataset
data/pix3d/*.json
data/pix3d/data
data/pix3d/*.h5

data/moos/moos*

data/scan2cad/metadata

# output
preprocess/pix3d/checkpoints
preprocess/moos/example
output
lightning_logs
scripts/cedar/*.out
*.out
temp
demo
runs
```
**`README.md`**
# Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects

Qirui Wu, [Daniel Ritchie](https://dritchie.github.io/), [Manolis Savva](https://msavva.github.io/), [Angel Xuan Chang](http://angelxuanchang.github.io/)

[[Paper](https://github.com/3dlg-hcvc/generalizing_shape_retrieval), [Project Page](https://github.com/3dlg-hcvc/generalizing_shape_retrieval), [Dataset](https://github.com/3dlg-hcvc/generalizing_shape_retrieval)]

<p><img src="docs/images/teaser.png" width="65%"></p>

Official repository of the paper [Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects](https://github.com/3dlg-hcvc/generalizing_shape_retrieval). We systematically study the generalization of single-view 3D shape retrieval along three axes: the presence of object occlusions and truncations, generalization to unseen 3D shape data, and generalization to unseen objects in the input images.

## Setup
The environment is tested with Python 3.8, PyTorch 2.0, CUDA 11.7, PyTorch3D 0.7.3, and Lightning 2.0.1.

```bash
conda create -n gcmic python=3.8
conda activate gcmic
pip3 install torch torchvision
pip install -r requirements.txt
conda install -c fvcore -c iopath -c bottler -c conda-forge fvcore iopath nvidiacub
pip install "git+https://github.com/facebookresearch/[email protected]"
```

## Data

### MOOS

<p><img src="docs/images/moos_generation.png" width="100%"></p>

Multi-Object Occlusion Scenes (MOOS) is generated using a heuristic algorithm that iteratively places newly sampled 3D shapes from [3D-FUTURE](https://tianchi.aliyun.com/specials/promotion/alibaba-3d-future) into the existing layout. Download MOOS **raw** and **preprocessed** data with the following command and extract/place them at `./data/moos`.
```sh
cd data/moos && sh download.sh
```
The data files should be organized as follows:
```shell
gcmic
├── data
│   ├── moos
│   │   ├── scenes                      # raw image data
│   │   │   ├── <scene_name>
│   │   │   │   ├── rgb
│   │   │   │   │   ├── rgb_<view_id>.rgb.png
│   │   │   │   ├── instances
│   │   │   │   │   ├── instances_<view_id>.rgb.png
│   │   │   │   ├── objects
│   │   │   │   │   ├── <obj_id>_<view_id>.rgb.png
│   │   │   │   │   ├── <obj_id>_<view_id>.mask.png
│   │   │   │   ├── depth
│   │   │   │   ├── normal
│   │   │   │   ├── layout2d.png        # top-down view
│   │   │   │   ├── scene.json          # scene metadata
│   │   ├── moos_annotation.txt
│   │   ├── moos_annotation_all.txt
│   │   ├── moos_annotation_no_occ.txt  # annotation file containing object queries w/o occlusions
│   │   ├── moos_annotation_occ.txt     # annotation file containing object queries w/ occlusions
│   │   ├── moos_1k.h5                  # image queries
│   │   ├── moos_mv.h5                  # multiviews for each shape
│   │   ├── moos_obj.h5                 # pointcloud for each shape
│   │   ├── lfd_200.h5                  # 200-view LFD for each shape
│   │   ├── moos_pose.json              # object pose info for rendering
│   │   ├── ...
```
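After downloading, a quick check against the tree above can catch an incomplete extraction. Below is a minimal sketch; the file list comes from the layout shown, but the helper itself is illustrative and not part of the repository.

```python
# Sanity-check a MOOS download against the expected layout above.
# The helper and its behavior are illustrative, not part of the repo.
from pathlib import Path

EXPECTED_MOOS_FILES = [
    "moos_annotation.txt",
    "moos_annotation_all.txt",
    "moos_annotation_no_occ.txt",
    "moos_annotation_occ.txt",
    "moos_1k.h5",
    "moos_mv.h5",
    "moos_obj.h5",
    "lfd_200.h5",
    "moos_pose.json",
]

def missing_moos_files(root):
    """Return the expected MOOS files/dirs missing under `root`."""
    root = Path(root)
    missing = [] if (root / "scenes").is_dir() else ["scenes/"]
    missing += [f for f in EXPECTED_MOOS_FILES if not (root / f).is_file()]
    return missing

if __name__ == "__main__":
    print(missing_moos_files("data/moos") or "layout looks complete")
```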

Please refer to `./preprocess/moos/gen_dataset_hdf5.py`, `./preprocess/3dfuture/get_all_lfd.py`, and `./preprocess/moos/extract_pose_json.py` for how the preprocessed data are prepared. Please refer to [3D-FUTURE](https://tianchi.aliyun.com/specials/promotion/alibaba-3d-future) for downloading the 3D shapes if you want to render your own shape multiviews and LFDs, and put the 3D-FUTURE data under `./data/3dfuture`.

We generate 10K scenes with the script `./preprocess/moos/render_scenes.py`. Note that each scene can be reconstructed by reading the metadata in its `scene.json` (run `./preprocess/moos/reconstruct_scenes.py`). You can explore more demos of generating random scenes in `./notebook`.
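Reconstruction therefore amounts to reading `scene.json` back and re-placing each shape at its stored pose. A hypothetical sketch of the read step follows; the actual schema is defined by the preprocessing scripts, and the keys used here (`objects`, `model_id`, `position`, `rotation_deg`) are illustrative assumptions only.

```python
# Illustrative only: the real scene.json schema comes from
# preprocess/moos/render_scenes.py; these keys are assumptions.
import json

sample_scene_json = """
{
  "objects": [
    {"model_id": "3dfuture-0001", "category": "chair",
     "position": [0.5, 0.0, 1.2], "rotation_deg": 90.0}
  ]
}
"""

scene = json.loads(sample_scene_json)
for obj in scene["objects"]:
    # Re-placing each shape at its stored pose reconstructs the layout.
    print(obj["model_id"], obj["category"], obj["position"], obj["rotation_deg"])
```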

### Pix3D

Download the Pix3D raw data [here](http://pix3d.csail.mit.edu/) and the preprocessed data with the following command, then extract/place them at `./data/pix3d`.
```sh
cd data/pix3d && sh download.sh
```
Please refer to [details](./data/README.md#pix3d) for the Pix3D data structure.

### Scan2CAD

Download the ScanNet25K images and CAD annotations from [ROCA data](https://github.com/cangumeli/ROCA#downloading-processed-data-recommended) and the preprocessed data with the following command, then extract/place them at `./data/scan2cad`.
```sh
cd data/scan2cad && sh download.sh
```
Please refer to [details](./data/README.md#scan2cad) for the Scan2CAD data structure. Download ShapeNet 3D shapes [here](https://shapenet.org/) if you want to render your own shape multiviews and LFDs.

## Train

Train a CMIC model on the ALL set of MOOS.
```sh
python train.py -t train -e cmic_moos --data_conf conf/dataset/moos.yaml --model_conf conf/model/cmic.yaml --epochs 50 --batch_size 64 --num_views 12 --verbose False --annotation_file moos_annotation_all.txt --use_crop --use_1k_img
```

Train a CMIC model on the ALL set of Pix3D using Mask2Former-predicted object masks.
```sh
python train.py -t train -e cmic_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --epochs 500 --batch_size 64 --num_views 12 --verbose False --annotation_file pix3d_annotation_all.txt --mask_source m2f_mask --val_check_interval 1 --use_crop
```

Train a CMIC model on Scan2CAD.
```sh
python train.py -t train -e cmic_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --epochs 500 --batch_size 64 --num_views 12 --verbose False --annotation_file scan2cad_annotation.txt --val_check_interval 1 --num_sanity_val_steps 100 --use_crop --use_480p_img --center_in_image
```

## Fine-tune

Fine-tune `cmic_moos` on Pix3D.
```sh
python train.py -t finetune -e cmic_moos_ft_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --epochs 5 --batch_size 64 --num_views 12 --verbose False --annotation_file pix3d_annotation_all.txt --mask_source m2f_mask --ckpt_path ./output/moos/cmic/cmic_moos/train/model.ckpt --val_check_interval 1 --use_crop
```

Fine-tune `cmic_moos` on Scan2CAD.
```sh
python train.py -t finetune -e cmic_moos_ft_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --epochs 5 --batch_size 64 --num_views 12 --verbose False --annotation_file scan2cad_annotation.txt --ckpt_path ./output/moos/cmic/cmic_moos/train/model.ckpt --val_check_interval 1 --use_crop --num_sanity_val_steps 400 --use_480p_img --center_in_image
```

## Evaluation

We first embed all shape multiviews from the different datasets (MOOS, Pix3D, and Scan2CAD) using the specified pretrained shape encoder.
```sh
python test.py -t embed_shape -e <model_name> --data_conf conf/dataset/<dataset>.yaml --model_conf conf/model/cmic.yaml --batch_size 48 --num_views 12 --verbose False --ckpt model.ckpt
```

Evaluate on `all|seen|unseen` objects of the different **MOOS** sets `all|no_occ|occ`.
```sh
python test.py -t test -e cmic_moos --data_conf conf/dataset/moos.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --ckpt model.ckpt --annotation_file moos_annotation_<all|no_occ|occ>.txt --offline_evaluation --test_objects <all|seen|unseen> --use_crop --use_1k_img
```

Evaluate on `all|seen|unseen` objects of the different **Pix3D** sets `all|easy|hard`.
```sh
python test.py -t test -e cmic_pix3d --data_conf conf/dataset/pix3d.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --mask_source m2f_mask --ckpt model.ckpt --annotation_file pix3d_annotation_<all|easy|hard>.txt --offline_evaluation --test_objects <all|seen|unseen> --use_crop
```

Evaluate on the **Scan2CAD** dataset.
```sh
python test.py -t test -e cmic_scan2cad --data_conf conf/dataset/scan2cad.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --ckpt model.ckpt --annotation_file scan2cad_annotation.txt --offline_evaluation --use_crop --use_480p_img
```

**Note**
- Add the flags `--not_eval_acc --shape_feats_source <dataset>` to test on unseen 3D shapes.
- Add the flag `--save_eval_vis` to save retrieved 3D shape renderings and visualizations.
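To sweep every MOOS evaluation combination, the MOOS test command above can be wrapped in a small loop. A sketch, with flags copied from that command; the loop and the `echo` are ours (drop the `echo` to actually run each command):

```shell
# Print one test command per (set, objects) combination; remove `echo` to execute.
moos_eval_cmds() {
  for split in all no_occ occ; do
    for objects in all seen unseen; do
      echo "python test.py -t test -e cmic_moos --data_conf conf/dataset/moos.yaml --model_conf conf/model/cmic.yaml --verbose False --batch_size 48 --ckpt model.ckpt --annotation_file moos_annotation_${split}.txt --offline_evaluation --test_objects ${objects} --use_crop --use_1k_img"
    done
  done
}
moos_eval_cmds
```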

## Bibtex
```
@article{wu2023generalizing,
    author = {Wu, Qirui and Ritchie, Daniel and Savva, Manolis and Chang, Angel X.},
    title = {{Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects}},
    year = {2023},
    journal = {arXiv preprint arXiv:xxx}
}
```
**`conf/dataset/moos.yaml`**
```yaml
data:
  name: moos

  module: gcmic.dataset.moos
  classname: MOOS
  loader: moos_loader
  task: train
  split:

  raw_path: ${DATA_PATH.moos.raw}
  preprocessed_path: ${DATA_PATH.moos.preprocessed}
  h5_path: ${DATA_PATH.moos.preprocessed}/moos.h5
  mv_path: ${DATA_PATH.moos.preprocessed}/moos_mv.h5
  obj_path: ${DATA_PATH.moos.preprocessed}/moos_obj.h5
  lfd_path: ${DATA_PATH.future3d.preprocessed}/lfd_200.h5
  pose_path: ${DATA_PATH.moos.preprocessed}/moos_pose.json
  annotation_file: moos_annotation_all.txt

  img_source: image
  mask_source: mask
  use_crop: False
  use_color_transfer: False
  batch_size: 64
  num_workers: 8

  cat_list: [chair, bed, sofa, table]
  cat_choice: [chair]

  input_dim: 224

  multiview:
    mv_dirname: neutral_multiviews_12
    mv_num: 12
    mv_dim: [224, 224]
    mv_opt: crop

  tour: 2
  # random_model: False
  test_only_occlusion: False
  test_objects: all

  unique_data_sampler: False
```
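The `${DATA_PATH.…}` values are interpolation placeholders resolved against a path registry when the config is loaded. A minimal stdlib sketch of that substitution follows; the `DATA_PATH` contents here are made-up examples, and the repo presumably resolves these through its own config loader rather than this helper.

```python
# Resolve ${DATA_PATH.<group>.<key>} placeholders like those in the config
# above. The registry values below are illustrative assumptions.
import re

DATA_PATH = {
    "moos": {"raw": "data/moos/scenes", "preprocessed": "data/moos"},
    "future3d": {"preprocessed": "data/3dfuture"},
}

def resolve(value, registry=DATA_PATH):
    """Substitute ${DATA_PATH.a.b} with registry["a"]["b"]."""
    def _sub(match):
        _, group, key = match.group(1).split(".")
        return registry[group][key]
    return re.sub(r"\$\{(DATA_PATH\.\w+\.\w+)\}", _sub, value)

print(resolve("${DATA_PATH.moos.preprocessed}/moos_mv.h5"))  # data/moos/moos_mv.h5
```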
**`conf/dataset/pix3d.yaml`**
```yaml
data:
  name: pix3d

  module: gcmic.dataset.pix3d
  classname: Pix3D
  loader: pix3d_loader
  task: train
  split:

  raw_path: ${DATA_PATH.pix3d.raw}
  preprocessed_path: ${DATA_PATH.pix3d.preprocessed}
  h5_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_224.h5
  mv_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_mv.h5
  obj_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_obj.h5
  lfd_path: ${DATA_PATH.pix3d.preprocessed}/lfd_200.h5
  raw_img_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_img_path.txt
  pose_path: ${DATA_PATH.pix3d.preprocessed}/pix3d_pose.json
  annotation_file: pix3d_annotation_all.txt

  img_source: image
  mask_source: mask
  use_crop: False
  use_color_transfer: False
  batch_size: 64
  num_workers: 8

  cat_list: [chair, bed, desk, sofa, bookcase, table, wardrobe, tool, misc]
  cat_choice: [chair]

  input_dim: 224

  multiview:
    mv_dirname: neutral_multiviews_12
    mv_num: 12
    mv_dim: [224, 224]
    mv_opt: crop

  tour: 2
  # random_model: False
  test_only_occlusion: False
  test_objects: all

  unique_data_sampler: False
```