Serve a production-ready and scalable Keras-based deep learning model image classification using FastAPI, Redis and Docker Swarm. Based off this series of blog posts.
Make sure you have a modern version of docker
(>1.13.0)and docker-compose
installed.
Simply run docker-compose up
to spin up all the services on your local machine.
- Test the
/predict
endpoint by passing in the includeddoge.jpg
as parameterimg_file
:
curl -X POST -F [email protected] http://localhost/predict
You should see the predictions returned as a JSON response.
Deploying this on Docker Swarm allows us to scale the model server to multiple hosts.
This assumes that you have a Swarm instance set up (e.g. on the cloud). Otherwise, to test this in a local environment, put your Docker engine in swarm mode with docker swarm init
.
- Deploy the stack on the swarm:
docker stack deploy -c docker-compose.yml mldeploy
- Check that it's running with
docker stack services mldeploy
. Note that the model server is unreplicated at this time. You may scale up the model worker by:
docker service scale mldeploy_modelserver=X
Where X
is the number of workers you want.
We can use locust and the included locustfile.py
to load test our service. Run the following command to spin up 20
concurrent users immediately:
locust --host=http://localhost --no-web -c 20 -r 20
The --no-web
flag runs locust in CLI mode. You may also want to use locust's web interface with all its pretty graphs, if so, just run local --host=http://localhost
.