Compact Image Captioning (CoCa) is an open-source image captioning project released by the Center of Image and Signal Processing Lab (CISiP Lab), Universiti Malaya. The project aims to promote Green Computer Vision and reduce carbon footprint, as well as to make computer vision research (in this case, image captioning research) accessible to universities, research labs, and individual practitioners with limited financial resources.
Please refer to the documentation.
- Captioning models built using PyTorch
  - Up-Down LSTM
  - Object Relation Transformer
  - A Compact Object Relation Transformer (ACORT)
- Unstructured weight pruning (see the magnitude-pruning sketch after this list)
  - Supermask Pruning (SMP, end-to-end pruning)
  - Gradual magnitude pruning
  - Lottery ticket
  - One-shot magnitude pruning (paper 1, paper 2)
  - Single-shot Network Pruning (SNIP)
- Self-Critical Sequence Training (SCST) (see the SCST sketch after this list)
  - Random sampling + Greedy search baseline: vanilla SCST
  - Beam search sampling + Greedy search baseline: à la Up-Down
  - Random sampling + Sample mean baseline: arxiv paper
  - Beam search sampling + Sample mean baseline: à la M2 Transformer
  - Optimise CIDEr and/or BLEU scores with custom weightage
  - Based on ruotianluo/self-critical.pytorch
- Multiple captions per image during teacher-forcing training (see the sketch after this list)
  - Reduces training time: the encoder runs once, and the decoder is optimised on multiple training captions
- Incremental decoding (Transformer with attention cache; see the sketch after this list)
- Data caching during training (see the sketch after this list)
  - Training examples will be cached in memory to reduce disk I/O
  - With sufficient memory, the entire training set can be loaded from memory after the first epoch
  - Memory usage can be controlled via the `cache_min_free_ram` flag
- `coco_caption` in Python 3
  - Based on salaniz/pycocoevalcap
- Tokenizer based on `sentencepiece` (see the radix-encoding sketch after this list)
  - Word
  - Radix encoding
  - (untested) Unigram, BPE, Character
- Datasets
  - MS-COCO
  - (contributions welcome) Flickr8k, Flickr30k, InstaPIC-1.1M
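
The pruning methods above share a common core idea: zero out the lowest-magnitude weights. Below is a minimal sketch of one-shot magnitude pruning in PyTorch, in the class-uniform style (every weight tensor pruned to the same sparsity); the function name and the rule of skipping 1-D parameters are illustrative assumptions, not this project's actual implementation.

```python
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float = 0.95) -> torch.nn.Module:
    """Zero out the smallest-magnitude weights in every weight tensor (sketch)."""
    with torch.no_grad():
        for param in model.parameters():
            if param.dim() < 2:           # skip biases and norm parameters
                continue
            k = int(param.numel() * sparsity)
            if k == 0:
                continue
            # Threshold at the k-th smallest absolute value in this tensor
            threshold = param.abs().flatten().kthvalue(k).values
            mask = (param.abs() > threshold).to(param.dtype)
            param.mul_(mask)              # pruned weights become exact zeros
    return model
```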
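For SCST, here is a minimal sketch of the vanilla variant (random sampling + greedy search baseline): REINFORCE with the greedy caption's score as the baseline. `model.sample` and `cider_reward` are hypothetical stand-ins for the project's decoding and scoring utilities.

```python
import torch

def scst_loss(model, images, gt_captions, cider_reward, pad_id=0):
    """Vanilla SCST: REINFORCE with a greedy-decoding baseline (sketch)."""
    # Sample captions stochastically, keeping per-token log-probabilities
    sample_ids, log_probs = model.sample(images, greedy=False)  # (B, T), (B, T)
    with torch.no_grad():
        greedy_ids, _ = model.sample(images, greedy=True)       # baseline decode

    # Advantage = CIDEr of the sample minus CIDEr of the greedy baseline
    advantage = cider_reward(sample_ids, gt_captions) - cider_reward(greedy_ids, gt_captions)

    # Push up log-probs of sampled tokens when the sample beats the baseline
    mask = (sample_ids != pad_id).float()
    loss = -(advantage.unsqueeze(1) * log_probs * mask).sum() / mask.sum()
    return loss
```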
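The multiple-captions-per-image trick can be sketched as follows: the encoder runs once per image, and its features are tiled across the K reference captions before the teacher-forced decoder pass. `encoder` and `decoder` here are hypothetical callables, not this project's actual modules.

```python
import torch

def xe_loss_multi_caption(encoder, decoder, images, captions_per_image):
    """Cross-entropy training with K reference captions per image (sketch).

    captions_per_image: LongTensor of shape (B, K, T).
    """
    B, K, T = captions_per_image.shape
    feats = encoder(images)                        # encoder runs once: (B, N, D)
    feats = feats.repeat_interleave(K, dim=0)      # tile features to (B*K, N, D)
    captions = captions_per_image.view(B * K, T)   # flatten the caption axis
    logits = decoder(feats, captions[:, :-1])      # teacher forcing
    return torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), captions[:, 1:].reshape(-1))
```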
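Incremental decoding with an attention cache can be illustrated with a single-head self-attention layer that projects only the newest token and reuses cached keys/values for the prefix. This is a simplified sketch, not the project's Transformer code.

```python
import torch

class CachedSelfAttention(torch.nn.Module):
    """Single-head self-attention with a key/value cache for step-wise decoding (sketch)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q = torch.nn.Linear(d_model, d_model)
        self.k = torch.nn.Linear(d_model, d_model)
        self.v = torch.nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5
        self.cache_k = None  # (B, T_prefix, D)
        self.cache_v = None

    def forward(self, x_step):  # x_step: (B, 1, D), one new token per call
        k_new, v_new = self.k(x_step), self.v(x_step)
        if self.cache_k is None:
            self.cache_k, self.cache_v = k_new, v_new
        else:
            # Only the new token is projected; the prefix is reused from the cache
            self.cache_k = torch.cat([self.cache_k, k_new], dim=1)
            self.cache_v = torch.cat([self.cache_v, v_new], dim=1)
        attn = torch.softmax(self.q(x_step) @ self.cache_k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ self.cache_v  # (B, 1, D)
```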
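The data-caching behaviour can be sketched as a wrapper `Dataset` that memoizes items in memory while free system RAM stays above a threshold, in the spirit of the `cache_min_free_ram` flag mentioned above; the class and the psutil-based check are illustrative assumptions.

```python
import psutil
from torch.utils.data import Dataset

class CachingDataset(Dataset):
    """Wraps another Dataset and keeps already-read items in memory (sketch)."""
    def __init__(self, base: Dataset, cache_min_free_ram: float = 0.2):
        self.base = base
        self.cache = {}                       # note: one cache per DataLoader worker
        self.cache_min_free_ram = cache_min_free_ram

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        if idx in self.cache:
            return self.cache[idx]            # served from memory, no disk I/O
        item = self.base[idx]
        vm = psutil.virtual_memory()
        # Only grow the cache while enough system RAM remains free
        if vm.available / vm.total > self.cache_min_free_ram:
            self.cache[idx] = item
        return item
```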
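Radix encoding decomposes each word ID into base-R digits, so the output vocabulary shrinks from |V| symbols to R symbols at the cost of a longer token sequence. A minimal sketch follows; the radix and digit count are assumed defaults for illustration.

```python
def radix_encode(word_id: int, radix: int = 256, n_digits: int = 2):
    """Decompose a word ID into base-`radix` digits, most-significant first (sketch)."""
    digits = []
    for _ in range(n_digits):
        digits.append(word_id % radix)
        word_id //= radix
    return digits[::-1]

def radix_decode(digits, radix: int = 256):
    """Recombine base-`radix` digits back into the original word ID."""
    word_id = 0
    for d in digits:
        word_id = word_id * radix + d
    return word_id

assert radix_decode(radix_encode(9999, 256, 2), 256) == 9999
```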
The checkpoints are available at this repo.
Soft-attention models implemented in TensorFlow 1.9 are available at this repo.
| Sparsity | NNZ | Dense Baseline | SMP | Lottery ticket (class-blind) | Lottery ticket (class-uniform) | Lottery ticket (gradual) | Gradual pruning | Hard pruning (class-blind) | Hard pruning (class-distribution) | Hard pruning (class-uniform) | SNIP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.950 | 2.7 M | 111.3 | 112.5 | - | 107.7 | 109.5 | 109.7 | - | 110.0 | 110.2 | 38.2 |
| 0.975 | 1.3 M | 111.3 | 110.6 | - | 103.8 | 106.6 | 107.0 | - | 105.9 | 105.4 | 34.7 |
| 0.988 | 0.7 M | 111.3 | 109.0 | - | 99.3 | 102.2 | 103.4 | - | 101.3 | 100.5 | 32.6 |
| 0.991 | 0.5 M | 111.3 | 107.8 | | | | | | | | |
| Sparsity | NNZ | Dense Baseline | SMP | Lottery ticket (gradual) | Gradual pruning | Hard pruning (class-blind) | Hard pruning (class-distribution) | Hard pruning (class-uniform) | SNIP |
|---|---|---|---|---|---|---|---|---|---|
| 0.950 | 2.8 M | 114.7 | 113.7 | 115.7 | 115.3 | 4.1 | 112.5 | 113.0 | 47.2 |
| 0.975 | 1.4 M | 114.7 | 113.7 | 112.9 | 113.2 | 0.7 | 106.6 | 106.9 | 44.0 |
| 0.988 | 0.7 M | 114.7 | 110.7 | 109.8 | 110.0 | 0.9 | 96.9 | 59.8 | 37.3 |
| 0.991 | 0.5 M | 114.7 | 109.3 | 107.1 | 107.0 | | | | |
- SCST, Up-Down: ruotianluo/self-critical.pytorch
- Object Relation Transformer: yahoo/object_relation_transformer
- `coco_caption` in Python 3: salaniz/pycocoevalcap
If you find this work useful for your research, please cite:

@article{tan2021end,
  title={End-to-End Supermask Pruning: Learning to Prune Image Captioning Models},
  author={Tan, Jia Huei and Chan, Chee Seng and Chuah, Joon Huang},
  journal={Pattern Recognition},
  pages={108366},
  year={2021},
  publisher={Elsevier},
  doi={10.1016/j.patcog.2021.108366}
}
We welcome contributions to improve this project, in particular on other datasets such as Flickr8k, Flickr30k, and InstaPIC-1.1M. Please file your suggestions by creating new issues, or send us a pull request with your changes, improvements, features, or fixes.
The project is open source under the BSD-3-Clause license (see the LICENSE file).
©2021 Universiti Malaya.