Image Caption Generator

Implementation of CNN-LSTM with Soft Attention using Encoder/Decoder architecture
Uses ResNet-34 as the CNN Encoder, LSTM with Attention Network as Decoder
Trained on Flikr8k dataset
Hosted locally using Flask/Python/Pytorch for the backend and React/Typescript for the frontend
Small project that I hope to expand on in the future

Small Demo

8mb.video-HPV-WJaR72Ef.mp4

All images in demo and in /results folder were never seen by the model (taken from test dataset)
It was amazing to see the accuracy that the model with attention could achieve on unseen images, even if it's not a state-of-the-art model compared to now
Before implementing the Attention network, I tested the model with just a CNN and LSTM, which became proficient at recognizing dogs but not colors or complex relations between objects

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
backend		backend
frontend/app		frontend/app
results		results
README.md		README.md
image_captioning.ipynb		image_captioning.ipynb