MORE

Official implementation of the ECCV 2022 paper "MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes".

This paper aims to progressively encode first- and multi-order spatial relations within graphs for 3D dense captioning, a recently proposed task.
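
One common way to read "first- and multi-order relations" on a graph over the objects of a scene: a first-order relation links two directly connected objects, and an n-th order relation links objects that are n hops apart. The sketch below builds a symmetric k-nearest-neighbour graph over object centers and groups object pairs by hop distance. It is a generic illustration under our own assumptions (a kNN graph, Euclidean center distance, and the hypothetical helpers knn_graph and relation_orders), not the SLGC or OTAG modules of this repository.

```python
import math
from collections import deque


def knn_graph(centers, k):
    """Symmetric k-nearest-neighbour graph over object centers (hypothetical helper).

    Returns a list of neighbour sets, one per object.
    """
    n = len(centers)
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        # Rank every other object by Euclidean distance to object i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(centers[i], centers[j]))
        for j in others[:k]:
            nbrs[i].add(j)  # add the edge in both directions so the graph is undirected
            nbrs[j].add(i)
    return nbrs


def relation_orders(nbrs, max_order):
    """Group object pairs (i, j) with i < j by shortest hop distance (hypothetical helper).

    Order 1 = direct edge, order 2 = connected through one intermediate object, and so on.
    """
    orders = {o: set() for o in range(1, max_order + 1)}
    for src in range(len(nbrs)):
        # Breadth-first search from src, stopping at max_order hops.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == max_order:
                continue
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst, d in dist.items():
            if src < dst:  # skip src itself and count each unordered pair once
                orders[d].add((src, dst))
    return orders


if __name__ == "__main__":
    # Four objects on a line; with k=1 they form the chain 0-1-2-3.
    graph = knn_graph([(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)], k=1)
    print(relation_orders(graph, max_order=3))
```

Breadth-first search is used instead of adjacency-matrix powers so that each pair is assigned exactly one order, its shortest hop distance.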

Data and Setup steps

Please follow the data and setup preparation steps of Scan2Cap.

Training and Evaluation

We provide multi-GPU training scripts to speed up training. To train MORE with different painted point features (e.g., xyz, xyz+normal, xyz+normal+rgb, xyz+normal+multiview), run:

CUDA_VISIBLE_DEVICES=$gpu_ids python -m torch.distributed.launch --nproc_per_node $num_of_gpus --master_port $port_id scripts/train_ddp.py --use_color --use_normal --num_graph_steps 1 --graph_module SLGC --graph_mode spatial_layout_conv --stamp MORE_normal_rgb --decoder_module OTAG --lr 1e-4

The above command uses xyz+normal+rgb as the input point features. For other point-feature settings, add or remove the corresponding flags (--use_normal, --use_color, --use_rgb). When using multiview features, also set --num_graph_steps to 2:

CUDA_VISIBLE_DEVICES=$gpu_ids python -m torch.distributed.launch --nproc_per_node $num_of_gpus --master_port $port_id scripts/train_ddp.py --use_multiview --use_normal --num_graph_steps 2 --graph_module SLGC --graph_mode spatial_layout_conv --stamp MORE_normal_multiview --decoder_module OTAG --lr 1e-4
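
To avoid mismatched flags when switching feature settings, the commands can be assembled from a single table. The sketch below is a hypothetical helper, not part of the repository: the FEATURE_FLAGS mapping (including the assumption that plain xyz needs no extra flag) and the default port 12345 are our own choices, while every flag value is copied from the two commands above. CUDA_VISIBLE_DEVICES is still set in the shell.

```python
import shlex

# Hypothetical mapping from the point-feature settings listed above to CLI flags.
# Plain "xyz" is assumed to need no extra flag.
FEATURE_FLAGS = {
    "xyz": [],
    "xyz+normal": ["--use_normal"],
    "xyz+normal+rgb": ["--use_color", "--use_normal"],
    "xyz+normal+multiview": ["--use_multiview", "--use_normal"],
}


def train_command(features, stamp, num_gpus, port=12345):
    """Assemble the DDP training command as an argument list (hypothetical helper)."""
    flags = FEATURE_FLAGS[features]
    # Multiview features use two graph steps; the other settings use one.
    graph_steps = 2 if "--use_multiview" in flags else 1
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(num_gpus), "--master_port", str(port),
        "scripts/train_ddp.py", *flags,
        "--num_graph_steps", str(graph_steps),
        "--graph_module", "SLGC", "--graph_mode", "spatial_layout_conv",
        "--stamp", stamp, "--decoder_module", "OTAG", "--lr", "1e-4",
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for the multiview setting on 2 GPUs.
    print(shlex.join(train_command("xyz+normal+multiview", "MORE_normal_multiview", num_gpus=2)))
```

Returning a list rather than a string lets the result be passed directly to subprocess.run, while shlex.join gives a quoted string for pasting into a shell.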

To evaluate captioning performance, run:

python scripts/eval.py --folder $ckpt_folder_name --use_color --use_normal --num_graph_steps 1 --num_locals 10 --eval_caption --min_iou 0.5 --graph_module SLGC --graph_mode spatial_layout_conv --decoder_module OTAG

Note that the point-feature and model arguments must match those used for training.
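
One way to keep the evaluation arguments in sync with training is to derive them from the training command. The sketch below copies the input-feature and model flags that appear in the commands above and drops training-only ones (--stamp, --lr and the launcher options). The function name and the split into shared and training-only flags are our assumptions, not part of the repository.

```python
import shlex

# Flags describing the input features and the model; assumed to be required at both
# training and evaluation time (names taken from the commands above).
SHARED_SWITCHES = {"--use_color", "--use_normal", "--use_multiview", "--use_rgb"}
SHARED_OPTIONS = {"--num_graph_steps", "--graph_module", "--graph_mode", "--decoder_module"}


def eval_command_from_train(train_cmd, ckpt_folder, num_locals=10, min_iou=0.5):
    """Build an eval.py argument list reusing the model flags of a training command (hypothetical helper)."""
    tokens = shlex.split(train_cmd)
    shared = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in SHARED_SWITCHES:
            shared.append(tok)  # boolean switch, no value
        elif tok in SHARED_OPTIONS:
            shared += [tok, tokens[i + 1]]  # option followed by its value
            i += 1
        i += 1
    return ["python", "scripts/eval.py", "--folder", ckpt_folder, *shared,
            "--num_locals", str(num_locals), "--eval_caption", "--min_iou", str(min_iou)]
```

Feeding it the xyz+normal+rgb training command yields the same flags as the evaluation command above (argparse does not care about their order).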

If you run into any problems, feel free to open an issue.

Performances

model	CIDEr@0.5IoU	BLEU-4@0.5IoU	METEOR@0.5IoU	ROUGE-L@0.5IoU
MORE_rgb	38.98	23.01	21.65	44.33
MORE_mul	40.94	22.93	21.66	44.42
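
Caption metrics in 3D dense captioning are usually reported under Scan2Cap's m@kIoU protocol: for each ground-truth object, the caption score counts only if the matched predicted box has IoU >= k with the ground-truth box (otherwise it counts as 0), and the result is averaged over all ground-truth objects. The evaluation command above uses --min_iou 0.5, i.e. k = 0.5. A minimal sketch, assuming axis-aligned boxes and per-object caption scores that have already been computed (computing CIDEr, BLEU-4, etc. is out of scope); the function names are hypothetical.

```python
def aabb_iou(a, b):
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for d in range(3):
        overlap = min(a[d + 3], b[d + 3]) - max(a[d], b[d])
        if overlap <= 0:
            return 0.0  # no overlap along this axis, so no intersection at all
        inter *= overlap

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(a) + volume(b) - inter)


def metric_at_iou(caption_scores, ious, k=0.5):
    """m@kIoU: mean caption score over ground-truth objects, zeroing those with IoU < k."""
    if len(caption_scores) != len(ious):
        raise ValueError("need one IoU per ground-truth object")
    return sum(s for s, u in zip(caption_scores, ious) if u >= k) / len(caption_scores)
```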

Since the caption metrics fluctuate across epochs, we recommend saving a checkpoint at every epoch and testing with the checkpoint that achieves the best validation performance.
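
The checkpoint-selection advice amounts to an argmax over per-epoch validation scores. A sketch, assuming the scores have already been collected into a dict keyed by checkpoint file name; the dict layout, the "CIDEr" key and the helper name are hypothetical.

```python
def best_checkpoint(val_scores, metric="CIDEr"):
    """Return the checkpoint whose validation score for `metric` is highest.

    `val_scores` maps checkpoint name -> {metric name: value} (hypothetical layout).
    Ties go to the checkpoint that appears first in the dict.
    """
    if not val_scores:
        raise ValueError("no validation scores recorded")
    return max(val_scores, key=lambda ckpt: val_scores[ckpt][metric])
```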

Citation

If you find our project helpful, please cite our paper:

@inproceedings{jiao2022more,
  title={{MORE}: Multi-order relation mining for dense captioning in {3D} scenes},
  author={Jiao, Yang and Chen, Shaoxiang and Jie, Zequn and Chen, Jingjing and Ma, Lin and Jiang, Yu-Gang},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXXV},
  pages={528--545},
  year={2022},
  organization={Springer}
}

Acknowledgement

We sincerely thank the authors of Scan2Cap for open-sourcing their data and code. Part of the code in this project is borrowed from Scan2Cap.
