This repository contains all information about the project progress for team 5 of the REMLA course, including PRs per person in `ACTIVITY.md` and progress for each assignment in `review.md`. The latter file lists everything that has been implemented per assignment according to the rubric.
First, the steps to run the application and view the Prometheus/Grafana dashboards are listed; then the project contents are explained in more detail. An overview of all files in the repository can be found at the end.
To run the application using Docker, log in to the GitHub package registry and compose the containers using the commands below. The `compose.yml` file includes a port mapping for both the `model-service` and the `app`, as well as an environment variable holding the URL at which the model can be queried. A volume mapping for the model and/or training data could easily be added (see here), but this is currently not done since the data is downloaded from Google Drive.
docker login https://ghcr.io
docker compose up
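For orientation, the referenced `compose.yml` could look roughly like the sketch below; the image names, ports, and the environment variable name are illustrative assumptions, not the repository's actual values:

```yaml
# Hypothetical sketch of compose.yml — names and ports are assumptions.
services:
  app:
    image: ghcr.io/remla24-team5/app:latest
    ports:
      - "8080:8080"                                # port mapping for the app
    environment:
      - MODEL_SERVICE_URL=http://model-service:8081  # where the model can be queried
  model-service:
    image: ghcr.io/remla24-team5/model-service:latest
    ports:
      - "8081:8081"                                # port mapping for the model-service
```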
To run a Kubernetes cluster in a configurable number of VMs, simply run the following command:
vagrant up
The number of worker nodes can be specified at the top of the Vagrantfile (line 5). If you then want to interact with the cluster, run the following command on the host (from the main directory of this repository):
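As a hedged illustration, the top of such a Vagrantfile could look as follows; the variable name and the worker IP scheme are assumptions (only the controller IP 192.168.56.10 is stated elsewhere in this README):

```ruby
# Hypothetical sketch — the actual Vagrantfile may use different names.
NUM_WORKERS = 2  # line 5: configurable number of worker nodes

Vagrant.configure("2") do |config|
  config.vm.define "controller" do |ctrl|
    ctrl.vm.network "private_network", ip: "192.168.56.10"
  end
  (1..NUM_WORKERS).each do |i|
    config.vm.define "node-#{i}" do |node|
      node.vm.network "private_network", ip: "192.168.56.#{10 + i}"
    end
  end
end
```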
export KUBECONFIG=./playbooks/k3s.yaml
or this command when in the cluster (i.e. when ssh-ed into a node):
export KUBECONFIG=/vagrant/playbooks/k3s.yaml
Now you can use the `kubectl` command as usual. The following commands can be used to investigate the cluster:
- `kubectl get nodes` lists all running nodes; this should show the control plane and the number of worker nodes you configured
- `kubectl get pods` lists all running pods in the default namespace; to see all pods, include the `--all-namespaces` flag
- `kubectl get services` lists all services; this will be used in the section below
These are the most basic ones, a lot more information can be extracted from the cluster. For an overview of all commands, please visit this page.
The application is monitored through many metrics. To see these metrics we need to open Prometheus. First export the KUBECONFIG (if not done already) and then list the services:
export KUBECONFIG=./playbooks/k3s.yaml
kubectl get services
Look for the Prometheus port, which you can use together with the IP of the controller node (192.168.56.10) to open Prometheus in a browser.
Inside Prometheus you can query app-specific metrics such as `requests_to_model_total`. Selecting this metric in the Prometheus page allows you to see its value progress over time, as can be seen in the figure below (note there are two models; this will be explained in the Istio section).
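Assuming `requests_to_model_total` is a Prometheus counter, its growth is usually easier to read as a rate than as the raw value; the `version` label used below to distinguish the two models is an assumption:

```promql
# Per-second request rate over the last 5 minutes,
# split by model version (label name assumed)
sum by (version) (rate(requests_to_model_total[5m]))
```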
Prometheus is also set up with an Alertmanager, so that an alert is raised when more than 15 requests are received within the last two minutes. The message that pops up looks as follows:
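Such an alert could be expressed as a PrometheusRule roughly like the sketch below; the rule name, labels, and annotations are assumptions, not the exact contents of `monitoring/prometheus_rule.yml`:

```yaml
# Hypothetical PrometheusRule sketch for the alert described above.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: request-rate-alert
spec:
  groups:
    - name: app-alerts
      rules:
        - alert: HighRequestRate
          # fires when the counter grew by more than 15 in the last 2 minutes
          expr: increase(requests_to_model_total[2m]) > 15
          labels:
            severity: warning
          annotations:
            summary: More than 15 requests received in the last two minutes
```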
The reported metrics can be visualized through Grafana. To open the Grafana dashboard you first need to export the KUBECONFIG (if not done already) and list the services:
export KUBECONFIG=./playbooks/k3s.yaml
kubectl get services
Look for the Grafana port, which you can use together with the IP of the controller node (192.168.56.10) to open Grafana in a browser.
You will then be redirected to a login page, where the username and password are both `admin`. Inside Grafana you can create new dashboards or look at our Custom Metrics Dashboard, which is loaded from a JSON file. A part of the dashboard looks as follows:
Notice that the metrics appear twice for every category; this is because two versions of the app are running.
With the use of Istio we run two versions of the app by default. The difference between these versions is the color of the buttons shown when the user is asked whether they think a URL is phishing. Metrics for both app versions are collected and can be seen when visiting the Custom Metrics Dashboard. Using headers, we were able to stabilize the subset of requests that are redirected to the new service: the usual split between the two versions is 90/10, but with headers we can consistently send test users to our test version of the app.
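The 90/10 split with a header override for test users can be sketched as an Istio VirtualService like the following; the host, subset, and header names are assumptions, not the exact contents of `kubernetes/istio.yml`:

```yaml
# Hypothetical VirtualService sketch — names are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
    - app
  http:
    # Test users (identified by a header) always get the new version.
    - match:
        - headers:
            x-user-group:
              exact: test
      route:
        - destination:
            host: app
            subset: v2
    # Everyone else gets the 90/10 split.
    - route:
        - destination:
            host: app
            subset: v1
          weight: 90
        - destination:
            host: app
            subset: v2
          weight: 10
```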
Additionally, a shadow launch for the `model-service` is implemented, in which two different versions of the trained model are used: all requests sent to the original model are also sent to the newly trained model. The Custom Metrics Dashboard for Shadow Launch can be used to compare the models' performance and assess whether the new model can be used. A part of the dashboard for this looks as follows:
In the figure you can clearly see the two different versions of the model (v1 and v2).
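This kind of shadow launch maps naturally onto Istio's request mirroring; the sketch below shows what such a rule could look like and is an assumption, not the exact contents of `kubernetes/istio-shadow-launch.yml`:

```yaml
# Hypothetical mirroring sketch — host and subset names are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: model-service
spec:
  hosts:
    - model-service
  http:
    - route:
        - destination:
            host: model-service
            subset: v1          # original model serves the real response
      mirror:
        host: model-service
        subset: v2              # new model receives a fire-and-forget copy
      mirrorPercentage:
        value: 100.0            # mirror every request
```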
Moreover, we have also implemented user-based rate limiting. As we can see in the images below, if we send more than 10 requests authorized as user `Bob`, we get `TOO_MANY_REQUESTS` for all subsequent requests in that minute.
However, this does not prevent user `Nick` from sending up to 10 requests per minute; after that, he is limited as well. We also have a global rate limit, which is currently set to 100 requests per minute in total.
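To illustrate the behaviour described above (not the actual Envoy/Istio configuration used in the cluster), a minimal fixed-window rate limiter with the same limits can be sketched in Python:

```python
from collections import defaultdict
import time

class RateLimiter:
    """Fixed-window rate limiter with a per-user cap and a global cap.

    The limits (10 requests/minute per user, 100/minute in total) mirror
    the values described above; the class itself is an illustrative
    sketch, not the deployed rate-limiting configuration.
    """

    def __init__(self, per_user=10, global_limit=100, window=60):
        self.per_user = per_user
        self.global_limit = global_limit
        self.window = window
        self.user_counts = defaultdict(int)
        self.global_count = 0
        self.window_start = time.monotonic()

    def allow(self, user):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # New window: reset all counters.
            self.user_counts.clear()
            self.global_count = 0
            self.window_start = now
        if self.global_count >= self.global_limit:
            return False  # would answer 429 TOO_MANY_REQUESTS (global limit)
        if self.user_counts[user] >= self.per_user:
            return False  # would answer 429 TOO_MANY_REQUESTS (per-user limit)
        self.user_counts[user] += 1
        self.global_count += 1
        return True

limiter = RateLimiter()
bob = [limiter.allow("Bob") for _ in range(12)]   # requests 11 and 12 are rejected
nick = limiter.allow("Nick")                      # Nick is still allowed
```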
The project concerns the training and deployment of a Phishing URL detector as a web application. The project consists of multiple repositories, each with their own focus, working together to create the full application. The architecture looks as follows:
This repository is the entrypoint to the project and contains all information needed. Below, the contents of the other repositories are briefly mentioned. Additionally, a report has been written that covers all details in much more depth; it can be found in the `/assets` folder of this repository.
The `model-training` repository contains the following:
- the pipeline to train a model for phishing URL detection
- the trained model, stored in an accessible location (here)
- a preprocessing step that is imported through `lib-ml`
- a GitHub workflow that checks the code quality with two linters (Pylint and Bandit), failing the build if the scores are not perfect
- a testing pipeline that checks whether all parts work as intended using pytest
The `model-service` repository contains the following:
- a queryable environment that fetches the model trained in `model-training`; it also uses `lib-ml` for preprocessing of the input
- an embedded ML model served using Flask
- a GitHub workflow that automatically versions and releases the image in the GitHub container registry
The `lib-ml` repository contains the following:
- the preprocessing logic needed before training or using models
- a GitHub workflow that automatically versions and releases the library in a package registry
The `app` repository contains the following:
- an `app-frontend` that contains the frontend of the application
- an `app-service` that queries the `model-service`
- a GitHub workflow that automatically versions and releases the image in a package registry
The `lib-version` repository contains the following:
- a VersionUtil class that can be asked for its version
- a GitHub workflow that automatically versions and releases the library in a package registry
The entire file structure of the repository including explanations per file can be found below.
├── assets -> folder containing supporting asset files
│ ├── REMLA24-team5-presentation.pdf -> presentation pdf
│ ├── REMLA24-team5-report.pdf -> report pdf
│ ├── alertmanager.jpeg -> image showing AlertManager
│ ├── app-versions-2.jpeg -> image showing both app versions in Grafana
│ ├── architecture.png -> image showing general project architecture
│ ├── grafana-port.png -> image showing Grafana port to use
│ ├── model-versions.jpeg -> image showing both model versions in Grafana
│ ├── prometheus-port.png -> image showing Prometheus port to use
│ └── prometheus-requests.jpeg -> image showing requests on Prometheus dashboard
├── kubernetes -> folder containing all Kubernetes deployment files
│ ├── app.yml -> deployment file for the app
│ ├── environment.yml -> ConfigMap to store the model-service URL
│ ├── grafana-value.yml -> contains default settings for the Grafana Dashboard
│ ├── ingress.yml -> deployment file for an ingress
│ ├── istio-shadow-launch.yml -> defines Istio shadow launch for two model versions
│ ├── istio.yml -> defines Istio objects for different app versions
│ ├── model-service-mirrors.yml -> deployment when having two model versions
│ ├── model-service.yml -> deployment file for single model-service
│ └── prometheus-value.yml -> contains default settings for Prometheus
├── monitoring -> folder containing monitoring specific yaml files
│ ├── monitoring-mirrors.yml -> used to start a ServiceMonitor when having mirrored models
│ ├── monitoring.yml -> used to start a ServiceMonitor
│ └── prometheus_rule.yml -> defines a PrometheusRule to send alerts
├── playbooks -> folder containing playbooks for provisioning
│ ├── grafana-config.yaml -> defines the custom dashboards
│ └── setup_k8s_cluster.yml -> playbook that sets up the whole Kubernetes cluster
├── volume -> volume folder containing necessary files
│ ├── model-v1.joblib -> version one of the model
│ ├── model-v2.joblib -> version two of the model
│ ├── test.txt -> file for model testing
│ ├── train.txt -> file for model training
│ └── val.txt -> file for model validation
├── .gitignore -> contains files which Git should ignore
├── ACTIVITY.md -> lists team members' activity per assignment
├── README.md -> general README of the repository
├── Vagrantfile -> used to define and start VMs
├── compose.yml -> Docker compose file to start the application
└── review.md -> file containing progress for each rubric