A Django App for image captioning (PyTorch)
Generating a caption may take a few seconds, so please be patient :)
看图说话 ("Describe the Picture"): img-captioning.herokuapp.com
-
Download project
git clone https://github.com/umnooob/img_captioning.git
cd ./img_captioning
-
Set up environment
conda create -n <env_name>
pip install -r requirements
-
Copy all static files from STATICFILES_DIRS to STATIC_ROOT
python manage.py collectstatic
-
Start Django project
python manage.py runserver
-
The Image Captioning app is now available at http://127.0.0.1:8000/
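For the collectstatic step above, the two settings involved might look like the following. This is a minimal sketch, not the project's actual configuration: the BASE_DIR placeholder path and the static/ and staticfiles/ directory names are assumptions.

```python
from pathlib import Path

# Hypothetical project root; a real settings module usually derives this
# from __file__ instead of hard-coding a path.
BASE_DIR = Path("/path/to/img_captioning")

# URL prefix under which static files are served.
STATIC_URL = "/static/"

# Source directories that collectstatic scans (assumed location).
STATICFILES_DIRS = [str(BASE_DIR / "static")]

# Destination that collectstatic copies into (assumed location).
# Django requires this to differ from every entry in STATICFILES_DIRS.
STATIC_ROOT = str(BASE_DIR / "staticfiles")
```

Keeping STATIC_ROOT separate from the source directories lets the collected output be deleted and regenerated at any time without touching the originals.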
You can refer to this blog for detailed instructions. Because our deployment exceeds Heroku's maximum slug size of 500 MB, we use Docker-based deployment.
If your deployment does not exceed 500 MB but your model parameter file exceeds 200 MB, you can use Git LFS and this Heroku Buildpack for a simpler Git-based deployment.
You can find more information in the Dockerfile. Since I'm new to Docker, the Docker image may be redundant and relatively large. PRs are welcome.
-
Build the image and start a container. <container_name> can be any name you choose.
docker build -t web:latest .
docker run -d --name <container_name> -e "PORT=8765" -e "DEBUG=1" -p 8007:8765 web:latest
-
The app is then available at http://localhost:8007
-
Stop and remove the running container
docker stop <container_name>
docker rm <container_name>
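While the container from the steps above is still running, it can take a moment before the app answers requests. The helper below is a small sketch for waiting on it from Python. The function name wait_until_up and its parameters are hypothetical and not part of this project. It uses only the standard library.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, attempts=10, delay=1.0, timeout=5.0):
    """Poll `url` until the server sends any HTTP response, or give up.

    Returns True as soon as the server responds, False after `attempts` tries.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status.
            return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

For the container started above, wait_until_up("http://localhost:8007") should return True once the app is serving on the mapped port.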
-
Sign up for a Heroku account, then install the Heroku CLI.
-
Create a new app on Heroku
-
Set the Django secret key on Heroku
heroku config:set DJANGO_SECRET_KEY=<SOME_SECRET_VALUE> -a <your_app_name>
-
Add the Heroku URL to ALLOWED_HOSTS in ./pytorch_django/setting.py
ALLOWED_HOSTS = ['<your_app_name>.herokuapp.com']
-
Log in, build the Docker image, push it, and release it (pushing the image may take several minutes)
heroku login -i
heroku container:login
docker build -t registry.heroku.com/<your_app_name>/web .
docker push registry.heroku.com/<your_app_name>/web
heroku container:release -a <your_app_name> web
-
Finally, you can view your app running on Heroku at https://<your_app_name>.herokuapp.com
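The steps above pass configuration through environment variables: DJANGO_SECRET_KEY via heroku config:set and DEBUG via docker run -e. One way a Django settings module could read them is sketched below. The project's own setting.py may do this differently. The helpers env_flag and hosts_from_env, the DJANGO_ALLOWED_HOSTS variable, and the development fallback key are all hypothetical.

```python
import os


def env_flag(name, default="0"):
    """Interpret an environment variable such as DEBUG=1 as a boolean."""
    return os.environ.get(name, default).strip().lower() in {"1", "true", "yes"}


def hosts_from_env(name):
    """Split a comma-separated environment variable into a list of hosts."""
    raw = os.environ.get(name, "")
    return [host.strip() for host in raw.split(",") if host.strip()]


# Set with `heroku config:set DJANGO_SECRET_KEY=...`.
# The fallback is for local development only (assumption, not from the project).
SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "insecure-dev-key")

# Matches the -e "DEBUG=1" flag passed to `docker run`; off unless set.
DEBUG = env_flag("DEBUG")

# DJANGO_ALLOWED_HOSTS is a hypothetical variable that would avoid
# hard-coding '<your_app_name>.herokuapp.com' in the settings file.
ALLOWED_HOSTS = ["localhost", "127.0.0.1"] + hosts_from_env("DJANGO_ALLOWED_HOSTS")
```

Reading these values from the environment keeps the secret key out of the repository and lets the same image run locally and on Heroku.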
Paper: "Show and Tell: A Neural Image Caption Generator" by Vinyals et al. (CVPR 2015)
A ResNet-152 encodes a 224×224 RGB image into a 256-dimensional embedding, and an LSTM then decodes that embedding into a caption. The original model was trained on the MS COCO dataset.
You can modify the models by changing image/image_captioning/models.py as well as image_captioning.py. Model parameters can be found in static/*.
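To illustrate the decoding half of the encoder-decoder design described above, here is a framework-free sketch of greedy decoding, one common way to turn per-step word scores into a caption. It is not taken from this repository, which may decode differently (for example with beam search). The toy_step function, the five-word vocabulary, and the fixed caption are stand-ins for the trained LSTM and its vocabulary.

```python
def greedy_decode(step, start_state, start_token, end_token, max_len=20):
    """Generate token ids one at a time, always taking the highest score.

    `step(token, state)` must return (scores, new_state), where `scores`
    is a list with one score per vocabulary entry.
    """
    tokens, token, state = [], start_token, start_state
    for _ in range(max_len):
        scores, state = step(token, state)
        token = max(range(len(scores)), key=scores.__getitem__)
        if token == end_token:
            break
        tokens.append(token)
    return tokens


# Toy stand-in for the LSTM: the state is a position counter and the
# highest score always goes to the next word of a fixed caption.
VOCAB = ["<start>", "<end>", "a", "dog", "runs"]
CAPTION = [2, 3, 4, 1]  # ids for "a dog runs <end>"


def toy_step(token, state):
    scores = [0.0] * len(VOCAB)
    scores[CAPTION[state]] = 1.0
    return scores, state + 1


ids = greedy_decode(toy_step, start_state=0, start_token=0, end_token=1)
print(" ".join(VOCAB[i] for i in ids))  # a dog runs
```

In a real model, the step function would run one LSTM step and project the hidden state onto the vocabulary, with the image embedding supplying the initial input or state.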