This repository contains the implementation of Recurrent Fusion Network for Image Captioning.
- Python 3.6
- PyTorch 0.3.1
- Java
All scripts for feature extraction are included in data/feature_extraction. Please generate flipped and cropped images for data augmentation, download the pre-trained models, and extract features with the scripts. All extracted features should be put in the data directory.
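The flip-and-crop augmentation mentioned above can be illustrated with a minimal sketch. It is not the code in data/feature_extraction; those scripts remain the reference. The choice of four corner crops plus a centre crop, the crop size, and the function names `hflip`, `crop` and `augment` are all assumptions made for illustration. Images are represented as plain nested lists so the example has no dependencies; in practice an image library would be used.

```python
def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def crop(image, top, left, height, width):
    """Cut a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def augment(image, crop_h, crop_w):
    """Return corner and centre crops, each followed by its flipped copy.

    Assumption: four corners plus centre; the repository's scripts may
    use a different set of crops.
    """
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop size exceeds image size")
    offsets = [
        (0, 0),                                # top-left
        (0, w - crop_w),                       # top-right
        (h - crop_h, 0),                       # bottom-left
        (h - crop_h, w - crop_w),              # bottom-right
        ((h - crop_h) // 2, (w - crop_w) // 2),  # centre
    ]
    views = []
    for top, left in offsets:
        patch = crop(image, top, left, crop_h, crop_w)
        views.append(patch)
        views.append(hflip(patch))
    return views


if __name__ == "__main__":
    toy = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    for view in augment(toy, 2, 2):
        print(view)
```

Each augmented view would then be passed through the feature extractors in the same way as the original image.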
bash train_recurrent_fusion_model.sh
bash train_recurrent_fusion_model_rl.sh
Evaluate with eval_single.sh and eval_ensemble.sh to obtain metric scores for a single model and an ensemble of multiple models, respectively.
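For readers unfamiliar with ensembling caption models, one common approach is to average the next-word probability distributions of several models at each decoding step. The sketch below shows that idea only; whether eval_ensemble.sh combines models this way is an assumption, and `ensemble_step` is a hypothetical helper, not a function from this repository.

```python
def ensemble_step(distributions):
    """Average per-model next-word distributions for one decoding step.

    distributions: one probability list per model, all over the same vocabulary.
    Returns (index of the most likely word, averaged distribution).
    """
    if not distributions:
        raise ValueError("at least one model distribution is required")
    n_models = len(distributions)
    vocab_size = len(distributions[0])
    if any(len(d) != vocab_size for d in distributions):
        raise ValueError("all models must share the same vocabulary size")
    averaged = [
        sum(d[i] for d in distributions) / n_models for i in range(vocab_size)
    ]
    best = max(range(vocab_size), key=averaged.__getitem__)
    return best, averaged


if __name__ == "__main__":
    # Two hypothetical models scoring a three-word vocabulary.
    model_a = [0.6, 0.3, 0.1]
    model_b = [0.1, 0.6, 0.3]
    word, probs = ensemble_step([model_a, model_b])
    print(word, probs)
```

In this toy case the first model alone would pick word 0, while the averaged distribution favours word 1, which is the kind of disagreement an ensemble is meant to smooth out.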
If you find this repo useful, please consider citing:
@InProceedings{Jiang_2018_ECCV,
author = {Jiang, Wenhao and Ma, Lin and Jiang, Yu-Gang and Liu, Wei and Zhang, Tong},
title = {Recurrent Fusion Network for Image Captioning},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}
Our code is based on Ruotian Luo's implementation and was reorganized by Zhiming Ma.