InfAdapter: An Adaptation Mechanism for ML Inference Services

Abstract: The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic request workloads, requiring their computing resources to be adjusted accordingly. Failing to right-size computing resources results in either latency service-level objective (SLO) violations or wasted computing resources. Adapting to dynamic workloads while considering all three pillars of accuracy, latency, and resource cost is challenging. In response to these challenges, we propose InfAdapter, which proactively selects a set of ML model variants with their resource allocations to meet the latency SLO while maximizing an objective function composed of accuracy and cost. InfAdapter decreases SLO violations and costs by up to 65% and 33%, respectively, compared to a popular industry autoscaler (Kubernetes Vertical Pod Autoscaler).

InfAdapter Structure

[Figure: InfAdapter structure]

Instructions

  1. Create a Kubernetes cluster

    1. Create a K8s cluster using MicroK8s: Get started
    2. Add another node to the K8s cluster: Create a MicroK8s cluster
  2. Set up Prometheus monitoring inside the cluster: Setup Monitoring

  3. Create a namespace called mehran: kubectl create ns mehran

  4. Build ResNet models for TensorFlow Serving: instructions here

  5. Configure an NFS server to store and serve the models: instructions here

  6. Export a cluster node's IP: export CLUSTER_NODE_IP=NODE_IP

  7. Export the NFS server IP: export NFS_SERVER=NFS_SERVER_IP (if not set, CLUSTER_NODE_IP from the previous step is used)

  8. Install Python requirements: pip install -e .

  9. Cache Docker images (run and wait for the "OK" message): python auto_tuner/cache_images.py (a consolidated sketch of the command-line steps follows this list)

    ...
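
For reference, the command-line steps above (namespace creation, IP exports, Python setup, and image caching) are collected below as a minimal shell sketch; NODE_IP and NFS_SERVER_IP are placeholders for your own addresses, and the commands assume they are run from the repository root.

# Run from the InfAdapter repository root.
kubectl create ns mehran                # namespace used by InfAdapter (step 3)
export CLUSTER_NODE_IP=NODE_IP          # replace NODE_IP with the IP of a cluster node (step 6)
export NFS_SERVER=NFS_SERVER_IP         # replace with the NFS server IP; if unset, CLUSTER_NODE_IP is used (step 7)
pip install -e .                        # install Python requirements (step 8)
python auto_tuner/cache_images.py       # cache Docker images; wait for the "OK" message (step 9)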

Technology Stack

  • Python
  • Kubernetes
  • TensorFlow Serving
  • Prometheus

Citation

Please use the following citation if you use this framework:

@inproceedings{salmani2023reconciling,
  title={Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems},
  author={Salmani, Mehran and Ghafouri, Saeid and Sanaee, Alireza and Razavi, Kamran and M{\"u}hlh{\"a}user, Max and Doyle, Joseph and Jamshidi, Pooyan and Sharifi, Mohsen},
  booktitle={Proceedings of the 3rd Workshop on Machine Learning and Systems},
  pages={78--86},
  year={2023}
}

About

Source code of "Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems"
