A TensorFlow implementation of ConditionalLearnToPayAttention.
We design a conditional attention mechanism for sequential visual tasks such as multiple object recognition and image captioning.
SVHN is obtained from house numbers in Google Street View images. The dataset can be downloaded here (format 1).
MS COCO is a dataset built by Microsoft that covers detection, segmentation, keypoints, and other tasks. The dataset can be downloaded here.
python 3.6
tensorflow 1.4.0
numpy 1.15.0
matplotlib 2.0.0
skimage 0.15.0
We generate images with bounding boxes and resize them to 64×64.
We then apply the data augmentation of Goodfellow et al. (2013), cropping a 54×54 pixel image from a random location within the 64×64 pixel image.
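A minimal sketch of this augmentation, assuming a standard `tf.random_crop` pipeline (the repository's actual preprocessing code may differ in details):

```python
import tensorflow as tf

def augment(image, label):
    """Randomly crop a 54x54 patch from the 64x64 training image.

    Illustrative sketch of the Goodfellow et al. (2013)-style augmentation.
    """
    image = tf.random_crop(image, size=[54, 54, 3])
    return image, label

def center_crop(image):
    """At eval time, take the central 54x54 crop instead of a random one."""
    return tf.image.resize_image_with_crop_or_pad(image, 54, 54)
```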
To verify the generality of the model, we also directly resize the original SVHN images without bounding boxes, and the results outperform the method in Goodfellow et al. (2013).
We also use multi-scale attention features to improve performance; the training procedure is the same for every attention scale.
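For orientation, a rough sketch of soft attention over one conv feature map and how two scales could be combined; the names and the dot-product compatibility function are illustrative assumptions, not the repository's actual code:

```python
import tensorflow as tf

def soft_attention(features, query):
    """Compute a soft-attention context vector over one feature map.

    features: [B, H, W, C] conv features; query: [B, C] global descriptor.
    Both scales are assumed projected to the same channel dim C.
    """
    b = tf.shape(features)[0]
    c = features.get_shape().as_list()[-1]
    flat = tf.reshape(features, [b, -1, c])               # [B, H*W, C]
    scores = tf.matmul(flat, tf.expand_dims(query, -1))   # [B, H*W, 1]
    alpha = tf.nn.softmax(scores, dim=1)                  # attention weights
    return tf.reduce_sum(alpha * flat, axis=1)            # [B, C] context

# Hypothetical multi-scale use: attend at two conv scales, then concatenate
# the per-scale context vectors before the classifier:
# context = tf.concat([soft_attention(conv4, g), soft_attention(conv5, g)], axis=1)
```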
Run `python convert_to_tfrecords.py` to generate three TFRecords files (train, val, test) with bounding boxes.
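For reference, a sketch of how such records are typically written with `tf.python_io.TFRecordWriter`; the feature keys below are illustrative and may not match `convert_to_tfrecords.py` exactly:

```python
import tensorflow as tf

def _int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_example(writer, image_bytes, digits, bbox):
    """Serialize one image with its digit labels and bounding box."""
    example = tf.train.Example(features=tf.train.Features(feature={
        'image': _bytes_feature(image_bytes),  # encoded 64x64 image
        'label': _int64_feature(digits),       # digit sequence, e.g. [1, 9]
        'bbox':  _int64_feature(bbox),         # [left, top, width, height]
    }))
    writer.write(example.SerializeToString())

# One writer per split: train.tfrecords, val.tfrecords, test.tfrecords.
writer = tf.python_io.TFRecordWriter('train.tfrecords')
```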
Run `python main.py`.
Only the data preprocessing changes; the structure and training of the model stay the same, so we just need to run `python convert_to_tfrecords.py` again to generate the new weakly labeled data.
Run `python convert_to_tfrecords.py` to generate three TFRecords files (train, val, test) without bounding boxes.
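A minimal sketch of parsing these weakly labeled records in an input pipeline, assuming illustrative key names and up to five padded digits per image (see `convert_to_tfrecords.py` for the actual layout):

```python
import tensorflow as tf

def parse_example(serialized):
    """Decode one weakly labeled record (note: no 'bbox' feature)."""
    features = tf.parse_single_example(serialized, features={
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([5], tf.int64),  # up to 5 digits, padded
    })
    image = tf.image.decode_png(features['image'], channels=3)
    image = tf.image.resize_images(image, [64, 64])  # resize only, no bbox crop
    return image, features['label']
```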
The image caption code mainly follows this author's implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.
Download the COCO train2014 and val2014 data. Put the COCO train2014 images in the folder `train/images`, and put the file `captions_train2014.json` in the folder `train`. Similarly, put the COCO val2014 images in the folder `val/images`, and put the file `captions_val2014.json` in the folder `val`.
Furthermore, download the pretrained VGG16 net here if you want to use it to initialize the CNN part.
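A minimal sketch of how such `.npy` weights are commonly loaded into TensorFlow variables; the `{layer_name: [weights, biases]}` file layout and the variable names are assumptions, not necessarily what `main.py` does:

```python
import numpy as np
import tensorflow as tf

def load_cnn(session, cnn_model_file):
    """Assign pretrained VGG16 weights to already-built model variables."""
    # encoding='latin1' lets Python 3 read the Python 2 pickle inside the .npy
    data = np.load(cnn_model_file, encoding='latin1').item()
    for layer_name, params in data.items():
        weights, biases = params  # assumed [kernel, bias] pair per layer
        with tf.variable_scope(layer_name, reuse=True):
            session.run(tf.get_variable('weights').assign(weights))
            session.run(tf.get_variable('biases').assign(biases))
```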
Run `python main.py --phase=train --load=False --load_cnn=True --cnn_model_file='./vgg16_no_fc.npy' --train_cnn=True --beam_size=3` to train the caption model.
Run `python main.py --phase=eval --load=True --model_file='./models/xxxx.npy' --load_cnn=False --train_cnn=False --beam_size=3` to evaluate it.
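For orientation, plausible `tf.app.flags` definitions matching the commands above; these are illustrative only, and the actual definitions live in `main.py`:

```python
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('phase', 'train', 'train or eval')
tf.app.flags.DEFINE_boolean('load', False, 'resume from a saved model file')
tf.app.flags.DEFINE_string('model_file', None, 'saved model to load for eval')
tf.app.flags.DEFINE_boolean('load_cnn', False, 'initialize CNN from .npy weights')
tf.app.flags.DEFINE_string('cnn_model_file', './vgg16_no_fc.npy', 'pretrained CNN weights')
tf.app.flags.DEFINE_boolean('train_cnn', False, 'whether to fine-tune the CNN part')
tf.app.flags.DEFINE_integer('beam_size', 3, 'beam width for caption decoding')
```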
On cropped SVHN, the soft attention model reaches a recognition accuracy of 97.15%, compared with 96.04% for the baseline CNN model (here).
On weakly labeled SVHN (without bounding boxes), the soft attention model reaches 80.45% recognition accuracy, compared with 70.58% for the baseline CNN model.
All qualitative and quantitative results are exported to `svhn.log`; you can print other results to the logs if you are interested.
You can also view results in TensorBoard: run `tensorboard --logdir=logs`.
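As a self-contained illustration of how scalar curves end up in the `logs` directory (the actual summary tags in this repository may differ):

```python
import tensorflow as tf

# Stand-in tensor for a training metric; in the real model this would be
# the loss or accuracy op.
loss = tf.placeholder(tf.float32, name='loss_value')
tf.summary.scalar('loss', loss)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter('logs', sess.graph)
    for step in range(3):  # stand-in for the training loop
        summary = sess.run(merged, feed_dict={loss: 1.0 / (step + 1)})
        writer.add_summary(summary, global_step=step)
    writer.close()
```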
The image caption model was trained on the COCO train2014 data. It achieves the following scores on the COCO val2014 data (beam size = 3):
BLEU-1 = 70.9
BLEU-2 = 54.1
BLEU-3 = 40.5
BLEU-4 = 30.3
METEOR = 23.9
CIDEr = 89.5
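If you want to reproduce metrics like these, they are typically computed with the coco-caption toolkit; a minimal sketch, assuming `pycocoevalcap` is installed and the captions are already tokenized:

```python
from pycocoevalcap.bleu.bleu import Bleu

# Toy example: map each image id to a list of caption strings.
gts = {0: ['a man riding a horse on a beach']}     # reference captions
res = {0: ['a man rides a horse near the ocean']}  # generated captions

bleu_scores, _ = Bleu(4).compute_score(gts, res)   # BLEU-1 through BLEU-4
for n, score in enumerate(bleu_scores, start=1):
    print('BLEU-%d = %.3f' % (n, score))
```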
You can also view results in TensorBoard: run `tensorboard --logdir=summary`.
Attention maps from the conditional attention model trained on SVHN with/without bounding boxes, as well as visualizations of image captions, can be found in our paper.