hf downloader doc (#416)
* hf downloader doc

Signed-off-by: Andrews Arokiam <[email protected]>

* review comment changes

Signed-off-by: Andrews Arokiam <[email protected]>

* Update hf.md

---------

Signed-off-by: Andrews Arokiam <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
andyi2it and yuzisun authored Nov 24, 2024
1 parent d1ee184 commit 79d731c
Showing 2 changed files with 122 additions and 0 deletions.
121 changes: 121 additions & 0 deletions docs/modelserving/storage/huggingface/hf.md
@@ -0,0 +1,121 @@
# Deploy InferenceService with a Model from the Hugging Face (HF) Hub

To deploy a model from the Hugging Face Hub, set the `storageUri` field in the `InferenceService` YAML to a URI of the following format.

```
hf://${REPO}/${MODEL}:${HASH}(optional)
```

e.g. ```hf://facebook/opt-125m```
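
The URI format above can be illustrated with a small shell sketch that splits an `hf://` URI into its repository, model, and optional hash parts. This only mirrors the documented format and is not KServe's actual parsing logic; the helper name `parse_hf_uri` and the hash value `abc1234` are hypothetical.

```bash
# Illustrative only: split hf://REPO/MODEL[:HASH] into its parts.
# parse_hf_uri is a hypothetical helper, not part of KServe.
parse_hf_uri() (
  path="${1#hf://}"              # drop the scheme
  hash=""
  case "$path" in
    *:*)
      hash="${path##*:}"         # text after the last colon
      path="${path%:*}"          # text before the last colon
      ;;
  esac
  repo="${path%%/*}"             # text before the first slash
  model="${path#*/}"             # text after the first slash
  printf 'repo=%s model=%s hash=%s\n' "$repo" "$model" "${hash:-none}"
)

parse_hf_uri "hf://facebook/opt-125m"
parse_hf_uri "hf://facebook/opt-125m:abc1234"   # abc1234 is a placeholder hash
```

The first call prints `repo=facebook model=opt-125m hash=none`, and the second prints the same repository and model with `hash=abc1234`.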

## Public Hugging Face Models

If no credential is provided, an anonymous client is used to download the model from the HF repo.

## Private Hugging Face Models

KServe supports authenticating with `HF_TOKEN` when downloading a model. Create a Kubernetes secret to store the HF token.

=== "yaml"
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: storage-config
type: Opaque
data:
  HF_TOKEN: aGZfVk5Vd1JVAUdCa0l4WmZMTHVrc2VHeW9VVm9udU5pBHUVT==
```
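
Values under a Secret's `data` field must be base64-encoded. A minimal sketch of producing that value from a raw token follows; `hf_example` is a placeholder rather than a real token, and the variable names are chosen only for illustration.

```bash
# Placeholder token for illustration only; substitute your own HF token.
HF_TOKEN_VALUE='hf_example'

# printf avoids the trailing newline that echo would add to the encoded value.
ENCODED=$(printf '%s' "$HF_TOKEN_VALUE" | base64)
echo "$ENCODED"   # prints aGZfZXhhbXBsZQ==
```

Alternatively, `kubectl create secret generic storage-config --from-literal=HF_TOKEN=<your-token>` creates the Secret and handles the encoding for you.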
## Deploy InferenceService with Models from HF Hub
### Option 1: Use Service Account with Secret Ref
Create a Kubernetes `ServiceAccount` that references the HF token secret by name, and set `serviceAccountName` in the `InferenceService` spec.

=== "yaml"
```yaml
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hfserviceacc
secrets:
  - name: storage-config
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3
spec:
  predictor:
    serviceAccountName: hfserviceacc # Option 1 for authenticating with HF_TOKEN
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=llama3
        - --model_dir=/mnt/models
      storageUri: hf://meta-llama/meta-llama-3-8b-instruct
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
EOF
```

### Option 2: Use Environment Variable with Secret Ref
Create a Kubernetes secret containing the HF token, and reference it from an environment variable in the `InferenceService` spec.

=== "yaml"
```yaml
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=llama3
        - --model_dir=/mnt/models
      storageUri: hf://meta-llama/meta-llama-3-8b-instruct
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
      env:
        - name: HF_TOKEN # Option 2 for authenticating with HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: storage-config # must match the name of the Secret holding the token
              key: HF_TOKEN
              optional: false
EOF
```

## Check the InferenceService Status

```bash
kubectl get inferenceservices huggingface-llama3
```

!!! success "Expected Output"

```{ .bash .no-copy }
NAME                 URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                          AGE
huggingface-llama3   http://huggingface-llama3.default.example.com   True           100                              huggingface-llama3-predictor-default-47q2g   7d23h
```
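
The model download may take a while, so the service is not necessarily ready right after it is created. As a sketch, `kubectl wait` can block until the `Ready` condition is reported. The helper name `wait_for_isvc` and the 600s default timeout are assumptions chosen for illustration, not values taken from KServe.

```bash
# Hypothetical helper: block until the InferenceService reports Ready,
# or fail once the timeout expires. The 600s default is an assumed value.
wait_for_isvc() {
  name="$1"
  timeout="${2:-600s}"
  kubectl wait --for=condition=Ready "inferenceservice/${name}" --timeout="${timeout}"
}

# Example invocation (requires kubectl and cluster access):
#   wait_for_isvc huggingface-llama3
```

The call returns a non-zero exit status if the service does not become ready in time, which makes it convenient to use in scripts.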
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -74,6 +74,7 @@ nav:
- URI: modelserving/storage/uri/uri.md
- CA Certificate: modelserving/certificate/kserve.md
- GCS: modelserving/storage/gcs/gcs.md
- Hugging Face: modelserving/storage/huggingface/hf.md
- Model Explainability:
- Concept: modelserving/explainer/explainer.md
- TrustyAI Explainer: modelserving/explainer/trustyai/README.md