feat(containers): experimentation with hugging face models #84
base: main
Conversation
"file": "llama-2-7b.Q2_K.gguf", | ||
"source" : "https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q2_K.gguf", | ||
"size_gb": "2.83", | ||
"ctn_endpoint": "paste container endpoint here" |
issue(improvement): we can template this file with Terraform as part of the deployment. You can see an example of templating container URLs in this example where we template a shell script.
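A minimal sketch of what that templating could look like (the resource, template file, and variable names here are hypothetical, not from this PR):

```hcl
# Hypothetical sketch: render hf-models.json at deploy time so the container
# endpoint is filled in by Terraform instead of pasted by hand.
resource "local_file" "hf_models" {
  filename = "${path.module}/hf-models.json"
  content = templatefile("${path.module}/hf-models.json.tftpl", {
    ctn_endpoint = scaleway_container.model.domain_name
  })
}
```

The `.tftpl` template would contain the same JSON as above, with `"ctn_endpoint": "${ctn_endpoint}"` in place of the placeholder string.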
```python
for _ in range(num_samples):
    try:
        print(
            "Calling model {model} on endpoint {endpoint} with message {message}".format(
```
issue(syntax): use f-strings by default.
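For illustration, the same log line with an f-string (the values here are hypothetical):

```python
# Hypothetical values for illustration
model = "llama-2-7b"
endpoint = "https://example.com/inference"
message = "hello"

# f-string: interpolates the variables directly, no .format() call needed
print(f"Calling model {model} on endpoint {endpoint} with message {message}")
# → Calling model llama-2-7b on endpoint https://example.com/inference with message hello
```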
```diff
@@ -0,0 +1,33 @@
# Hugging Face Models
```
issue(docs): this README has a lot of good technical detail, but no high-level explanation of what the example does. We need to explain what the example does, what SCW resources it uses, and link to Hugging Face and the models used (and any interesting Python libraries we use too).
```diff
@@ -0,0 +1,33 @@
# Hugging Face Models

### Deploy models in Serverless Containers
```
issue(structure): our examples should all use the standard README format included in the top-level of the repo.
- Export these variables:

```bash
export SCW_ACCESS_KEY="access-key" SCW_SECRET_KEY="secret-key" SCW_PROJECT_ID="project-id" REGION="fr-par"
```
suggestion(tfvars): make these Terraform variables and give `region` a default of `fr-par`.
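A sketch of the suggested variable declarations (names chosen to match the environment variables above; adjust to the project's conventions):

```hcl
# Hypothetical sketch: declare credentials as Terraform variables,
# with fr-par as the default region.
variable "access_key" {
  type      = string
  sensitive = true
}

variable "secret_key" {
  type      = string
  sensitive = true
}

variable "project_id" {
  type = string
}

variable "region" {
  type    = string
  default = "fr-par"
}
```

These can then be supplied via a `terraform.tfvars` file or `TF_VAR_*` environment variables instead of exported shell variables.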
```bash
cd terraform && bash terraform.sh -a
```
issue(docs): can you add a section on how to call one of the inference endpoints?
If you add the endpoints as a Terraform output, you can write a command that readers can copy-paste using `terraform output`. See the other examples for how to do this.
There should be a command to call the "hello" endpoint to check the containers are working, then ideally a command showing how to get an inference result.
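A sketch of such an output (the resource and output names here are hypothetical; `domain_name` is the endpoint attribute exposed by the Scaleway container resource):

```hcl
# Hypothetical sketch: expose each deployed container's endpoint as an output,
# keyed by model name, so readers can copy-paste it from `terraform output`.
output "endpoints" {
  value = { for name, ctn in scaleway_container.model : name => ctn.domain_name }
}
```

A README section could then show something like `curl "https://$(terraform output -raw endpoints)/hello"` for the health check (exact command depends on how the output is structured).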
```diff
@@ -0,0 +1,56 @@
#!/bin/bash
```
issue(scripting): these kinds of commands should be managed in a Makefile.
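A minimal sketch of the suggested Makefile (target names are hypothetical; only the `-a` flag appears in this PR):

```make
# Hypothetical sketch: wrap the deployment command in a Makefile target
# so it can be run as `make apply` from the repo root.
.PHONY: apply

apply:
	cd terraform && bash terraform.sh -a
```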
eval "$(jq -r '.[]|.[]|"hf_models[\(.file)]=\(.source)"' hf-models.json)" | ||
|
||
# Login to docker Scaleway's registry | ||
docker login "rg.$REGION.scw.cloud" -u nologin --password-stdin <<< "$SCW_SECRET_KEY" |
suggestion(simplify): We don't need to log into the repo every time, this can be a one-off step at the start (and listed in the README).
```bash
# Initialize, plan, and deploy each model in a Terraform workspace
apply() {
  terraform init
  for model_file_name in "${!hf_models[@]}";
```
question(terraform): instead of using bash for-loops here, can we use `for_each` in the Terraform files to iterate over the list of models in a variable?
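A sketch of the `for_each` approach (variable shape, resource names, and image path are hypothetical, not from this PR):

```hcl
# Hypothetical sketch: one container per model driven by for_each,
# instead of looping in bash over Terraform workspaces.
variable "hf_models" {
  type = map(string) # model file name => download URL
}

resource "scaleway_container" "model" {
  for_each       = var.hf_models
  name           = replace(each.key, ".", "-")
  namespace_id   = scaleway_container_namespace.main.id
  registry_image = "rg.fr-par.scw.cloud/hf-models/${each.key}:latest"
}
```

This keeps the whole deployment in a single `terraform apply` and makes the set of models visible in the plan.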
```dockerfile
RUN pip install -r requirements.txt

RUN pip install llama-cpp-python==0.2.62 \
```
nit: you can also include it in the requirements.txt directly:

```
flask==3.0.5
...
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
llama-cpp-python==0.2.62
```
```hcl
region     = var.region
access_key = var.access_key
secret_key = var.secret_key
project_id = var.project_id
```
nit: IMO this is unnecessary, the default behavior of the provider is to use your config file or the environment to get its configuration, so I would leave it blank
```python
app = FastAPI()

print("loading model starts", flush=True)
```
nit: an alternative to adding `flush=True` to every print statement is to set `PYTHONUNBUFFERED=1` as an env var (or even set it in the Dockerfile) to avoid buffering stdout.
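For example, a one-line Dockerfile addition (a config fragment, placed wherever fits the image's layout):

```dockerfile
# Disable Python's stdout buffering for the whole container,
# so logs appear immediately without flush=True on every print.
ENV PYTHONUNBUFFERED=1
```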
Summary
In this example, we experiment with deploying some lightweight Hugging Face models (Phi, Llama, and Mistral) from a public HF model repository. The models are deployed using Terraform and benchmarked using a Python script.
Checklist