Hello everyone, in this project I will show you how to deploy a machine learning model on Google Kubernetes Engine (also known as GKE).
Machine Learning (ML) has become an integral part of almost every industry today. After developing a powerful ML model, the next step is to make it accessible and scalable. This is where Docker, a containerization platform, and Kubernetes, a container orchestration platform, come in. By leveraging Docker and Google Kubernetes Engine, we can seamlessly deploy and manage a variety of containerized applications, including our machine learning models.
Before we dive into the deployment process, let's get familiar with some key terms:
1. Docker
Docker is a containerization platform that allows us to package our applications and their dependencies into a standardized unit, known as a container. This ensures consistency across different environments and facilitates seamless deployment.
2. Kubernetes
Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It simplifies the process of running applications in production and efficiently manages containerized workloads.
3. Google Cloud Platform (GCP)
Google Cloud Platform provides a suite of cloud computing services, and GKE is a managed Kubernetes service offered by GCP. GKE abstracts the complexities of managing Kubernetes clusters, allowing us to focus on deploying and running our applications.
Throughout this project, we'll explore the seamless integration of Docker, Kubernetes, and GCP to deploy and manage a machine learning model, unlocking the potential for scalable and efficient deployment.
The main objective of this project is to show how to deploy a machine learning model on GKE, so we won't build an ML model or web application; instead, we will use a ready-made ML model and web application.
You can access the machine learning model's and web application's code from this link: https://github.com/erkansirin78/flask-iris-classification
This repo contains everything about the model.
This is a simple Iris Flower Classification Model Deployment project as Flask App. [1]
The model is very simple: we enter four input variables and the model tells us which species of iris the flower is. We also have a simple front end that turns this into a web application.
Here are images from the application:
Now that you understand the Model and the Web Application Front End, we can get to the point.
As you know, if we want to deploy our applications on Kubernetes, we must first containerize them. To do this, we need to create a Dockerfile.
Let's explain the Dockerfile.
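Below is a sketch of the Dockerfile, reconstructed from the line-by-line explanation that follows; the exact file in the repo may differ slightly:

```dockerfile
# Base image: the official Python 3.6 image with the "slim" (minimal) tag
FROM python:3.6-slim

# Copy the dependency list into the image
COPY requirements.txt .

# Upgrade pip to the latest version
RUN pip install --upgrade pip

# Install the Python packages the application requires
RUN pip install -r requirements.txt

# Copy the whole project into /opt/ and make it the working directory
COPY . /opt/
WORKDIR /opt/

# The Flask app inside the container listens on port 8080
EXPOSE 8080

# Set FLASK_APP and start the Flask server,
# reachable from all IP addresses on port 8080
ENV FLASK_APP=app.py
CMD ["flask", "run", "--host=0.0.0.0", "--port=8080"]
```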
This line sets the base image. The container will be built on the official Python image tagged "slim", which contains Python 3.6; the "slim" tag indicates a smaller, minimal version of the image.
This line copies the requirements.txt file from your local machine to the Docker container. This file contains the dependencies for the Python application.
This line upgrades pip in the image to the latest version.
This line installs the dependencies listed in requirements.txt, i.e. the Python packages the application requires.
This line copies all files (.) of the local project to the /opt/ directory of the Docker container.
This line sets the working directory to /opt/. Subsequent commands will be executed in this directory.
This line documents the port on which the application in the Docker container will listen, port 8080. (EXPOSE is informational; it doesn't publish the port by itself.)
This line specifies the command to run when the Docker container starts. The FLASK_APP environment variable is set to app.py and then the flask run command is executed. --host=0.0.0.0 makes the Flask app reachable from all IP addresses, and --port=8080 makes it listen on port 8080.
Now that we have created our Dockerfile, we can move on to Kubernetes.
Now, we will continue with GCP. When you open the GCP console, you will be greeted by a screen like this. You must create a new project to continue (GCP offers a $300 trial for new members, valid for 90 days).
Then you must activate the Cloud Shell at the top right of the screen.
At the bottom, the console where we will perform our operations opens. Now let's assign our project ID to a variable named "PROJECT_ID".
If you don't know your project ID, you can see it by clicking Dashboard in the menu on the left side.
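For example (a minimal sketch; replace the placeholder with your own project ID):

```bash
# Either set it by hand ("your-project-id" is a placeholder)...
export PROJECT_ID="your-project-id"

# ...or read it from the active gcloud configuration
export PROJECT_ID=$(gcloud config get-value project)
```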
Then bring everything your application needs into this shell, either from your local computer or from a GitHub repo (recommended).
I will pull the project from the GitHub repo with the "git clone" command.
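The repo URL is the one linked above:

```bash
git clone https://github.com/erkansirin78/flask-iris-classification.git
```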
We cloned it successfully. Let's look at the contents of the repo.
What is Artifact Registry? Artifact Registry is a Google Cloud Platform service where developers can store and manage Docker images, Maven packages, npm packages and other artifacts. We will store our Docker images in this registry.
For more information about Artifact Registry, see the documentation: https://cloud.google.com/artifact-registry/docs/docker/store-docker-container-images
Now that we know what Artifact Registry is, we need to enable the Artifact Registry API and then create a repository.
There are two ways to enable APIs: we can enable them with gcloud, or through the web interface.
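With gcloud, enabling the API looks like this:

```bash
gcloud services enable artifactregistry.googleapis.com
```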
Let's check whether the API is enabled.
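One simple way to check is to search the list of enabled services:

```bash
gcloud services list --enabled | grep artifactregistry
```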
Yes, it's enabled.
Now we can create an Artifact Registry repository.
As with APIs, we can create an Artifact Registry repository through the interface or with gcloud. In this project, I will create it the more hands-on way, i.e. from the console with gcloud; you can create it through the interface if you prefer.
When creating an Artifact Registry repository, Google asks us for three mandatory pieces of information: 1) the name of the repository, in this case my-repo, 2) the format, in this case Docker, and 3) the location, in this case us-west1.
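Putting those three pieces of information together:

```bash
gcloud artifacts repositories create my-repo \
    --repository-format=docker \
    --location=us-west1
```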
To see details about our Artifact Registry repository:
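The describe command prints the repository's configuration:

```bash
gcloud artifacts repositories describe my-repo --location=us-west1
```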
As you can see, we have created an Artifact Registry repository.
Now I will go into the project directory I pulled from GitHub and create a Docker image with Docker commands. The best practice is to build the image directly with "docker build -t" and tag it with the repository address (e.g. us-west1-docker.pkg.dev/${PROJECT_ID}/REPO_NAME/IMAGE_NAME:TAG).
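A sketch of the build step; the image name "flask-iris" and the tag "v1" are my own choices here, any name and tag will do:

```bash
cd flask-iris-classification

# Build the image and tag it with the full Artifact Registry address
docker build -t us-west1-docker.pkg.dev/${PROJECT_ID}/my-repo/flask-iris:v1 .
```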
Let's see all of our images.
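The local images are listed with:

```bash
docker images
```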
Then we can push our image to the Artifact Registry.
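Before pushing, Docker needs to be able to authenticate to the registry; the first command below configures that and only needs to run once per registry host:

```bash
# Let Docker authenticate to the us-west1 Artifact Registry host
gcloud auth configure-docker us-west1-docker.pkg.dev

# Push the image (same name and tag as in the build step above)
docker push us-west1-docker.pkg.dev/${PROJECT_ID}/my-repo/flask-iris:v1
```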
Let's check whether the image exists in the registry.
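We can list the images stored in the repository:

```bash
gcloud artifacts docker images list us-west1-docker.pkg.dev/${PROJECT_ID}/my-repo
```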
First, let's enable the API for Kubernetes.
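GKE is covered by the container API:

```bash
gcloud services enable container.googleapis.com
```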
Then we will create a cluster. We can create an Autopilot cluster using the Google Cloud CLI or the Google Cloud console. I will continue with gcloud.
But first, what is Autopilot? Autopilot is a GKE mode of operation in which Google manages the cluster's underlying infrastructure: nodes are provisioned and scaled automatically based on your workloads, so you focus on your applications rather than on the machines they run on.
Let's create a cluster. When creating an Autopilot cluster, there are two mandatory pieces of information we need to enter: 1) the cluster name, in this example "my-cluster", and 2) the location, in this example us-west1.
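With gcloud, an Autopilot cluster is created with "create-auto":

```bash
gcloud container clusters create-auto my-cluster --region=us-west1
```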
Let's see the cluster.
Let's see it in more detail.
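The following two commands list the clusters in the project and print the full details of ours:

```bash
gcloud container clusters list

gcloud container clusters describe my-cluster --region=us-west1
```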
Yes, we have created a cluster. Now we can connect to it; to connect to the cluster, we need to fetch its credentials.
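Fetching the credentials:

```bash
gcloud container clusters get-credentials my-cluster --region=us-west1
```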
This command configures kubectl to use the cluster you created.
Now that we can run kubectl commands in Cloud Shell, let's see our nodes.
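Nodes are listed with:

```bash
kubectl get nodes
```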
For now, we don't have any nodes because we created an Autopilot cluster and we haven't deployed any applications yet, so no nodes have been allocated to us.
Now we have two different options for deploying the machine learning model on Kubernetes: 1) we can prepare a YAML file, or 2) we can deploy our model with kubectl commands. In this project, I will deploy with a YAML file, as it is more common in real-life use cases and is a best practice.
As you remember, we have already turned our model into a container image and pushed it to Artifact Registry for storage. If you want to run an application imperatively on your Kubernetes cluster, you will need to keep the image you created in Artifact Registry. But if you want to run an application with a YAML file, you can also pull and run your application from a public DockerHub repo, as I will show in the YAML file shortly.
Now, we will first create a Deployment Object.
For more information about Kubernetes Objects: https://kubernetes.io/docs/home/
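Here is the Deployment YAML (flask-iris-deployment.yaml), reconstructed from the field-by-field explanation below; the resource requests and limits are example values, since the original figures aren't reproduced here:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-iris-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: flask-iris-model
  template:
    metadata:
      labels:
        app: flask-iris-model
    spec:
      containers:
      - name: flask-iris-model
        image: erkansirin78/flask-iris-classification:2021-3
        ports:
        - containerPort: 8080
        # Example requests/limits; the original file's values may differ
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"
```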
Let's explain this YAML file.
apiVersion: apps/v1:
This line specifies the Kubernetes API version used. In this particular YAML file, version "apps/v1" is used, which means that application level objects like Deployment are used.
kind: Deployment:
This line specifies the type of object defined in the YAML file. In this case, a Deployment object is defined.
metadata:
This section contains the object's metadata. For example, the name of the object is set to "flask-iris-deployment".
spec:
This section specifies the properties of the Deployment object. It contains two main subsections:
replicas: 5:
Specifies the number of pods to deploy. In this example, 5 replicas will be used.
selector and template:
These sections define how to select and configure the pods that the Deployment manages. A label-based selector picks out a specific group of pods, and the template defines the pod configuration: the containers it runs and its other properties.
containers:
This section specifies the containers to run inside the pod. In this example, a container named "flask-iris-model" is defined.
image: erkansirin78/flask-iris-classification:2021-3:
This is the name and tag of the Docker image. The image contains the Flask-based Iris classification model.
ports:
Specifies which port the container will listen on. In this example, port 8080 is used.
resources:
Specifies the container's resource requests and limits. Requests and limits are set for memory and CPU resources.
Now that we have created our Deployment object, we can run our application. You can use this YAML file by uploading it to Cloud Shell from a GitHub repo or from your local computer.
Now let's create the deployment with the command "kubectl apply -f flask-iris-deployment.yaml" and use "kubectl get all" to see all our resources on Kubernetes and their current status.
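```bash
kubectl apply -f flask-iris-deployment.yaml
kubectl get all
```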
As you can see, we have created a Deployment; it manages a ReplicaSet, and that ReplicaSet in turn manages 5 pods. But the pods are still being created.
After a few minutes...
Let's see if our pods are running with the "kubectl get pods" command.
As you can see, they are in the "Running" state.
Let's see our nodes. Now that we have running applications, the Autopilot cluster has automatically provisioned 2 nodes for us.
Our applications are running but we need to create a Service object to access them.
Let's write a Service YAML.
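Here is the Service YAML (flask-iris-service.yaml), reconstructed from the field-by-field explanation below:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: flask-iris-service
spec:
  type: LoadBalancer
  selector:
    app: flask-iris-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```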
apiVersion: v1:
This line specifies the Kubernetes API version used. In this particular YAML file, version "v1" is used, which represents the older version of Service, a base Kubernetes object.
kind: Service:
This line specifies the type of the object defined in the YAML file. In this case, a Service object is defined.
metadata:
This section contains the meta information of the object. For example, the name of the service is "flask-iris-service".
spec:
This section specifies the properties of the Service object. It contains three main subsections:
type: LoadBalancer:
Specifies the type of the service. In this case, the LoadBalancer type is used, which makes the service reachable from outside the cluster via an external IP.
selector:
This section is used to select the pods that this service will route traffic to. In this example, a specific group of pods is selected using a label-based selector. The label is app: flask-iris-model.
ports:
This section specifies which ports the service will listen on and which container ports it forwards them to.
protocol: TCP:
Specifies the communication protocol used (TCP).
port: 80:
Specifies the port on which the service is exposed to the outside world.
targetPort: 8080:
Specifies the container port on the selected pods to which traffic is forwarded. This means the application running inside the pods is reached through this port; in this example, the Flask application listens on port 8080.
Let's create our Service object with the "kubectl apply -f flask-iris-service.yaml" command.
Let's see our services.
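The external IP can take a minute or two to be assigned; until then the column shows <pending>:

```bash
kubectl get service flask-iris-service
```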
We can now access our application through the service we created, "flask-iris-service"; to do so, we open the EXTERNAL-IP address shown in the output in a browser.
Now let's double the number of pods and see how our nodes change.
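One way to do this (the YAML could also be edited and re-applied) is the "kubectl scale" command; 5 replicas doubled is 10:

```bash
kubectl scale deployment flask-iris-deployment --replicas=10
```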
After a few minutes, all our pods were up and running; now let's look at our nodes again.
As you can see, since we are using Autopilot Cluster, when we scale our pods, the number of nodes automatically increases.
Now we can delete all the resources (the Service and the Deployment) we created.
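Since both objects were created from YAML files, they can be deleted the same way:

```bash
kubectl delete -f flask-iris-service.yaml
kubectl delete -f flask-iris-deployment.yaml
```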
Yes, in this article I showed you how to deploy a machine learning model on GKE. I will be writing a more detailed article about Kubernetes and Docker in the future.
Now if you are all done, you can delete your Google Cloud project.
Thanks for reading.