A neural network to generate captions for an image using CNN and RNN with BEAM Search.
- Requirements
- Training parameters and results
- Generated Captions on Test Images
- Procedure to Train Model
- Procedure to Test on new images
- Configurations (config.py)
- Frequently encountered problems
- References
Recommended System Requirements to train model.
- A good CPU and a GPU with at least 8GB memory
- At least 8GB of RAM
- Active internet connection so that Keras can download the InceptionV3/VGG16 model weights
Python libraries and the versions used while building and testing this project:
- Python - 3.6.7
- Numpy - 1.16.4
- Tensorflow - 1.13.1
- Keras - 2.2.4
- nltk - 3.2.5
- PIL - 4.3.0
- Matplotlib - 3.0.3
- tqdm - 4.28.1
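To compare an existing environment against the versions listed above, a small check like the following may help. It is a minimal sketch, not part of the project: the helper names find_mismatches and installed_versions are hypothetical, "Pillow" is used because that is the PyPI distribution that provides PIL, and the lookup relies on importlib.metadata, which needs Python 3.8 or newer (so it would not run on the 3.6.7 interpreter listed above).

```python
# Versions copied from the list above; "Pillow" is the distribution that provides PIL.
REQUIRED = {
    "numpy": "1.16.4",
    "tensorflow": "1.13.1",
    "Keras": "2.2.4",
    "nltk": "3.2.5",
    "Pillow": "4.3.0",
    "matplotlib": "3.0.3",
    "tqdm": "4.28.1",
}


def find_mismatches(required, installed):
    """Return {name: (required_version, installed_version_or_None)} for every
    package whose installed version differs from the required one."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    from importlib import metadata  # Python 3.8+ only

    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = find_mismatches(REQUIRED, installed_versions(REQUIRED))
    for name, (wanted, found) in report.items():
        print(f"{name}: required {wanted}, found {found}")
```

Newer versions may well work; the check only reports where an environment differs from the one the project was tested with.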
Flickr8k Dataset: Dataset Request Form
If the link above is unavailable, you can try these direct download links:
Link Credit: Jason Brownlee
Important: After downloading the dataset, put the required files in the train_val_data folder.
- batch_size=64 took ~14GB GPU memory in case of InceptionV3 + AlternativeRNN and VGG16 + AlternativeRNN
- batch_size=64 took ~8GB GPU memory in case of InceptionV3 + RNN and VGG16 + RNN
- If you're low on memory, use Google Colab or reduce the batch size
- In case of BEAM Search, loss and val_loss are the same as in case of argmax since the model is the same
Model & Config | Argmax | BEAM Search |
---|---|---|
InceptionV3 + AlternativeRNN | (Lower the better) (Higher the better) | BLEU Scores on Validation data (Higher the better) |
InceptionV3 + RNN | (Lower the better) (Higher the better) | BLEU Scores on Validation data (Higher the better) |
VGG16 + AlternativeRNN | (Lower the better) (Higher the better) | BLEU Scores on Validation data (Higher the better) |
VGG16 + RNN | (Lower the better) (Higher the better) | BLEU Scores on Validation data (Higher the better) |
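To make the BLEU column easier to interpret, here is a minimal sketch of sentence-level BLEU-1 (clipped unigram precision multiplied by a brevity penalty). It is a simplified, standard-library-only illustration and not the scoring code behind the table; nltk, which is listed in the requirements, ships fuller implementations in nltk.translate.bleu_score. The function name bleu1 and the example captions are hypothetical.

```python
import math
from collections import Counter


def bleu1(references, candidate):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    references: non-empty list of tokenised reference captions (lists of words)
    candidate:  tokenised generated caption (list of words)
    """
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    # For each word, the highest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    # A candidate word is credited at most as often as it appears in a reference.
    clipped = sum(min(count, max_ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(candidate)
    # The brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(ref) for ref in references),
                  key=lambda n: (abs(n - len(candidate)), n))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * precision


if __name__ == "__main__":
    refs = [["a", "dog", "runs", "on", "grass"], ["the", "dog", "is", "running"]]
    print(bleu1(refs, ["a", "dog", "is", "running", "on", "grass"]))
```

Scores range from 0 to 1, and a higher score means the generated caption shares more words with the reference captions, which is why the table marks BLEU as "higher the better".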
Model used - VGG16 + AlternativeRNN ,
Image | Caption |
---|---|
Photo Credits: Brad Bradmore and Sincerely Media on Unsplash.
- Clone the repository to preserve directory structure.
git clone https://github.com/saharshy29/altML.git
- Put the required dataset files in train_val_data folder (files mentioned in readme there).
- Review config.py for paths and other configurations (explained below).
- Run train_val.py.
- Clone the repository to preserve directory structure.
git clone https://github.com/saharshy29/altML.git
- Train the model to generate required files in model_data folder (steps given above) OR use the previously trained model weights for VGG16 + AlternativeRNN with Beam Search, k=3.
- Put the test images in test_data folder.
- Review config.py for paths and other configurations (explained below).
- Run test.py.
config
- images_path :- Folder path containing Flickr dataset images
- train_data_path :- .txt file path containing image IDs for training
- val_data_path :- .txt file path containing image IDs for validation
- captions_path :- .txt file path containing captions
- tokenizer_path :- Path for saving the tokenizer
- model_data_path :- Path for saving files related to the model
- model_load_path :- Path for loading the trained model
- num_of_epochs :- Number of epochs
- max_length :- Maximum length of captions. Set this manually after training the model; it is required by test.py
- batch_size :- Batch size for training (larger values consume more GPU & CPU memory)
- beam_search_k :- BEAM search parameter that tells the algorithm how many candidate captions to keep at each decoding step
- test_data_path :- Folder path containing images for testing/inference
- model_type :- CNN model type to use -> inceptionv3 or vgg16
- random_seed :- Random seed for reproducibility of results
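As an illustration of how a value such as random_seed is typically applied, the sketch below seeds the random number generators that happen to be installed. It is an assumption about usage, not the project's own code: the function name set_seeds is hypothetical, and the NumPy and TensorFlow calls are skipped when those libraries are missing.

```python
import os
import random


def set_seeds(seed):
    """Seed every random number generator that is available."""
    # PYTHONHASHSEED only affects Python processes started after it is set.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import tensorflow as tf
        if hasattr(tf, "set_random_seed"):   # TensorFlow 1.x API
            tf.set_random_seed(seed)
        else:                                # TensorFlow 2.x API
            tf.random.set_seed(seed)
    except ImportError:
        pass


if __name__ == "__main__":
    set_seeds(42)
    print(random.random())
```

Even with all seeds fixed, some GPU operations are not deterministic, so small run-to-run differences can remain.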
rnnConfig
- embedding_size :- Embedding size used in Decoder(RNN) Model
- LSTM_units :- Number of LSTM units in Decoder(RNN) Model
- dense_units :- Number of Dense units in Decoder(RNN) Model
- dropout :- Dropout probability used in Dropout layer in Decoder(RNN) Model
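The two groups of settings above could be laid out as plain dictionaries, as in the sketch below. The dictionary layout, every path, and every numeric value are hypothetical placeholders, except batch_size=64, beam_search_k=3 and the two model types, which appear elsewhere in this README. The validate helper is likewise an illustrative addition and not part of config.py.

```python
# Hypothetical example values; check config.py in the repository for the real ones.
config = {
    "images_path": "train_val_data/images/",          # placeholder path
    "train_data_path": "train_val_data/train_ids.txt",  # placeholder path
    "val_data_path": "train_val_data/val_ids.txt",      # placeholder path
    "captions_path": "train_val_data/captions.txt",     # placeholder path
    "tokenizer_path": "model_data/tokenizer.pkl",       # placeholder path
    "model_data_path": "model_data/",
    "model_load_path": "model_data/model.hdf5",         # placeholder path
    "num_of_epochs": 20,        # placeholder
    "max_length": 40,           # placeholder; set manually after training
    "batch_size": 64,
    "beam_search_k": 3,
    "test_data_path": "test_data/",
    "model_type": "vgg16",      # or "inceptionv3"
    "random_seed": 1035,        # placeholder
}

rnnConfig = {
    "embedding_size": 300,  # placeholder
    "LSTM_units": 256,      # placeholder
    "dense_units": 256,     # placeholder
    "dropout": 0.3,         # placeholder
}


def validate(config, rnn_config):
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    if config["model_type"] not in ("inceptionv3", "vgg16"):
        problems.append("model_type must be 'inceptionv3' or 'vgg16'")
    for key in ("num_of_epochs", "max_length", "batch_size", "beam_search_k"):
        if not (isinstance(config[key], int) and config[key] > 0):
            problems.append(f"{key} must be a positive integer")
    if not 0.0 <= rnn_config["dropout"] < 1.0:
        problems.append("dropout must be in [0, 1)")
    return problems


if __name__ == "__main__":
    print(validate(config, rnnConfig) or "config looks fine")
```

Running a check like this before train_val.py or test.py catches a mistyped model_type or a zero batch size before any model weights are downloaded.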
- Out of memory issue:
  - Try reducing batch_size
- Results differ every time I run the script:
  - Due to the stochastic nature of these algorithms, results may differ slightly on every run, even though a random seed is set to make them reproducible.
- Results aren't very great using beam search compared to argmax:
  - Try a higher k in BEAM search using the beam_search_k parameter in config. Note that a higher k will usually improve results, but it will also increase inference time significantly.
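To show what the k in beam search controls, here is a minimal sketch that runs on a toy next-word table instead of the trained decoder. Everything in it is hypothetical (the startseq/endseq marker tokens, TOY_MODEL, and the function names), and it is not the repository's implementation. With k=1 the search reduces to argmax (greedy) decoding.

```python
import math

START, END = "startseq", "endseq"  # hypothetical marker tokens

# Toy stand-in for the trained decoder: previous word -> next-word probabilities.
TOY_MODEL = {
    "startseq": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"endseq": 1.0},
    "cat": {"endseq": 1.0},
}


def next_word_probs(sequence):
    """Return next-word probabilities given the words generated so far."""
    return TOY_MODEL[sequence[-1]]


def beam_search(k, max_length):
    """Return (best word sequence, its probability) using beam width k."""
    # Each beam entry is (log probability, word sequence).
    beams = [(0.0, [START])]
    for _ in range(max_length):
        candidates = []
        for log_prob, seq in beams:
            if seq[-1] == END:
                # Finished captions carry over unchanged.
                candidates.append((log_prob, seq))
                continue
            for word, prob in next_word_probs(seq).items():
                candidates.append((log_prob + math.log(prob), seq + [word]))
        # Keep only the k most probable sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if all(seq[-1] == END for _, seq in beams):
            break
    best_log_prob, best_seq = beams[0]
    return best_seq, math.exp(best_log_prob)


if __name__ == "__main__":
    for k in (1, 2):
        seq, prob = beam_search(k, max_length=5)
        print(k, " ".join(seq), round(prob, 2))
```

In this toy table, k=1 commits to "a" because it is the most likely first word and ends with a caption of probability 0.33, while k=2 also keeps "the" alive and finds "the dog" at 0.36. The work per step grows with k because each kept sequence needs its own decoder call, which is where the extra inference time comes from.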
- Show and Tell: A Neural Image Caption Generator - Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
- Where to put the Image in an Image Caption Generator - Marc Tanti, Albert Gatt, Kenneth P. Camilleri
- How to Develop a Deep Learning Photo Caption Generator from Scratch
- Machine learning code based on Image-Caption-Generator by @dabasajay