
image caption generator using a CNN-LSTM with soft attention


Derrc/image-caption-generation


Image Caption Generator

  • Implementation of CNN-LSTM with Soft Attention using Encoder/Decoder architecture
  • Uses ResNet-34 as the CNN encoder and an LSTM with an attention network as the decoder
  • Trained on the Flickr8k dataset
  • Hosted locally using Flask/Python/PyTorch for the backend and React/TypeScript for the frontend
  • Small project that I hope to expand on in the future
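
For readers new to soft attention, the core arithmetic of one decoding step can be sketched in a few lines of dependency-free Python. This is an illustration only, not the code from this repository: the function names `softmax` and `soft_attention` are hypothetical, and dot-product scoring is assumed for brevity, whereas a trained model would typically score regions with learned layers in PyTorch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, hidden):
    """Return (context, weights) for a single decoding step.

    features: list of region vectors from the CNN encoder
    hidden:   current decoder hidden state, same length as each region vector
    Dot-product scoring is an assumption made to keep the sketch short.
    """
    # Score each image region against the decoder state.
    scores = [sum(f * h for f, h in zip(region, hidden)) for region in features]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the attention-weighted average of the regions.
    dim = len(features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, features))
        for d in range(dim)
    ]
    return context, weights


# Toy example: three image regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
context, weights = soft_attention(features, hidden)
```

Because every region receives a non-zero weight, the context vector is a smooth blend of all regions, which is what makes this form of attention "soft" and trainable with ordinary backpropagation.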

Small Demo

8mb.video-HPV-WJaR72Ef.mp4

Results

  • All images in the demo and in the /results folder come from the test dataset, so the model never saw them during training
  • It was amazing to see how accurate the model with attention could be on unseen images, even though it is not state of the art by today's standards
  • Before implementing the attention network, I tested a model with only a CNN and an LSTM; it became proficient at recognizing dogs but not at describing colors or complex relations between objects

Future Work

  • Train on larger datasets (Flickr30k, MSCOCO)
  • Implement state-of-the-art models
  • Enhance React UI
