SageMaker Model Deployment with Hugging Face – Error Investigation

This document details the process of deploying a Hugging Face pre-trained model on AWS SageMaker Studio using JupyterLab. It includes key setup steps, configuration requirements, and an analysis of an error encountered during deployment.

Key Components

Transformers Library: Hugging Face's library containing pre-trained models for a variety of machine learning tasks.
HuggingFaceModel Class (SageMaker SDK): A utility class in the SageMaker SDK that simplifies the process of loading and deploying Hugging Face models on SageMaker.

Requirements

AWS IAM Role: Requires permissions for SageMaker, S3, and Hugging Face model access.
S3 Bucket: Storage for model files.
AWS SageMaker: Configured environment for deploying the model.

Steps to Reproduce

1. Select the Pre-trained Model

The selected model and task for this deployment are:

Model ID: llava-hf/llama3-llava-next-8b-hf
Task: Image-to-Text Translation

2. Download Model Files

The following model and configuration files were downloaded manually:

model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
chat_template.json
generation_config.json
preprocessor_config.json
tokenizer.json
config.json
model.safetensors.index.json
special_tokens_map.json
tokenizer_config.json
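
Because the files were downloaded by hand, a quick completeness check before packaging can catch a missing shard. The following is a minimal sketch, not part of the original workflow. It assumes the files sit in one local directory and that `model.safetensors.index.json` carries the usual `weight_map` entry naming the shard for each tensor. The function name `missing_model_files` is hypothetical.

```python
import json
from pathlib import Path

# Non-weight files listed in this README
EXPECTED_FILES = [
    "chat_template.json", "generation_config.json", "preprocessor_config.json",
    "tokenizer.json", "config.json", "model.safetensors.index.json",
    "special_tokens_map.json", "tokenizer_config.json",
]

def missing_model_files(model_dir):
    """Return sorted names of expected files that are absent from model_dir."""
    root = Path(model_dir)
    required = set(EXPECTED_FILES)
    index_path = root / "model.safetensors.index.json"
    if index_path.is_file():
        # weight_map maps each tensor name to the shard file that stores it
        weight_map = json.loads(index_path.read_text())["weight_map"]
        required.update(weight_map.values())
    return sorted(name for name in required if not (root / name).is_file())
```

An empty list from `missing_model_files("path/to/downloaded/files")` would indicate that every listed file and every shard referenced by the index is present.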

3. Upload to S3

The model files were compressed into model.tar.gz and uploaded manually to the following S3 bucket:

S3 Bucket: sagemaker-us-east-2-476671003699
Folder: llama3/
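
The compression step is not shown in this README, so the following is only a sketch of one way to do it. It assumes the Hugging Face inference container expects the model files at the root of `model.tar.gz` rather than inside a subfolder. The function names `build_model_archive` and `upload_archive` are hypothetical; `upload_file` is the standard boto3 S3 client call.

```python
import tarfile
from pathlib import Path

def build_model_archive(model_dir, archive_path="model.tar.gz"):
    """Pack every file in model_dir at the root of a gzip-compressed tar archive."""
    archive = Path(archive_path).resolve()
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(Path(model_dir).iterdir()):
            # Skip subdirectories and the archive itself if it lives in model_dir
            if path.is_file() and path.resolve() != archive:
                # arcname drops the directory prefix so files sit at the archive root
                tar.add(path, arcname=path.name)
    return archive_path

def upload_archive(archive_path, bucket, key):
    """Upload the archive to S3; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the packaging step works without boto3
    boto3.client("s3").upload_file(str(archive_path), bucket, key)
```

With the bucket and folder above, the upload call would look like `upload_archive("model.tar.gz", "sagemaker-us-east-2-476671003699", "llama3/model.tar.gz")`.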

4. SageMaker Studio Setup

SageMaker Studio was set up with the following instance type:

Instance Type: ml.m5.2xlarge

Environment Setup

Global Environment: Encountered an error while installing the sagemaker package using !pip install sagemaker -U. Refer to the image pip_install_sgmkr for details.
Virtual Environment:
- Created a virtual environment using !python3 -m venv.
- Installed required packages: !pip install sagemaker -U, !pip install transformers.
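
In a notebook, `!pip` runs in a subshell, so a virtual environment created with `!python3 -m venv` is not automatically the one the kernel imports from. The sketch below is a hedged way to check which interpreter the kernel uses and which package versions it actually sees. The function name `installed_versions` is hypothetical.

```python
import sys
from importlib import metadata

def installed_versions(packages=("sagemaker", "transformers")):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

print("Interpreter:", sys.executable)
print("In a virtual environment:", sys.prefix != sys.base_prefix)
print(installed_versions())
```

Note that the transformers version reported here belongs to the notebook environment only. The version running inside the inference container is selected separately by the `transformers_version` argument of `HuggingFaceModel`.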

5. Model Deployment Code

import boto3
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

try:
    # Inside SageMaker Studio the execution role is resolved automatically
    role = sagemaker.get_execution_role()
except ValueError:
    # Outside SageMaker, look the role up by name through IAM
    iam = boto3.client('iam')
    role = iam.get_role(RoleName="sagemaker_execution_role")['Role']['Arn']

# Define S3 path for the model
model_s3_path = "s3://sagemaker-us-east-2-476671003699/llama3/model.tar.gz"

huggingface_model = HuggingFaceModel(
    model_data=model_s3_path,
    role=role,
    transformers_version="4.12",
    pytorch_version="1.9",
    py_version="py38",
)

# Deploy the model
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.2xlarge"
)

# Sample data for inference
data = {
   "inputs": "who are you"
}

# Invoke the endpoint for prediction
predictor.predict(data)

Invoking the endpoint fails with the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation:
Received client error (400) from primary with message:
{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llava_next\u0027"
}
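
The message field is easier to read once decoded: `\u0027` is the JSON escape for an apostrophe, so the server's message is simply the quoted string `'llava_next'`. That string matches the `model_type` used by LLaVA-NeXT configurations, and a bare quoted key is how Python renders a `KeyError`. One plausible reading, which is an assumption and not confirmed by endpoint logs, is that the Transformers 4.12 container requested in the deployment code does not recognize the `llava_next` architecture. The sketch below decodes the error body and shows a hypothetical helper, `read_model_type`, for comparing the result with the local `config.json`.

```python
import json

# Error body copied from the ModelError above
ERROR_BODY = r'''
{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llava_next\u0027"
}
'''

def decode_error_message(body):
    """Parse the JSON error body and return the unescaped message text."""
    return json.loads(body)["message"]

def read_model_type(config_path):
    """Return the model_type declared in a Hugging Face config.json (hypothetical helper)."""
    with open(config_path, encoding="utf-8") as handle:
        return json.load(handle).get("model_type")

message = decode_error_message(ERROR_BODY)
print(message)  # prints 'llava_next'
```

If `read_model_type("config.json")` returns the same name without the quotes, that would support the reading that the container fails while looking up the model architecture. Checking the endpoint's CloudWatch logs would be the way to confirm it.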
