This codelab simulates scenarios where a startup CEO is trying to build a cloud-native intelligent app based on an open-source large language model. In particular, they want to quickly test and compare different cloud providers to find the best price performance.
In this codelab, you will follow a step-by-step guide to experiment with state-of-the-art hardware like Nvidia A100 GPU chips, large language model like Meta Llama 3.1, and software like vLLM. You'll leverage cloud-native technologies like Terraform, Docker, and Linux Bash on major cloud providers such as Azure and AWS.
- Bash (Unix shell) is required to execute commands in this codelab.
- Azure Cloud Shell is recommended. (Note: It is also highly recommended to mount a storage account in case of accidental browser closure. Follow instructions here) Alternatively, macOS and Ubuntu are supported.
In your lab environment, clone the repository and enter the directory:
git clone https://github.com/Azure-Samples/compete-labs
cd compete-labs
Install dependencies, authenticate, and initialize environments by running the commands below:
source scripts/init.sh
export CLOUD=azure
export REGION=eastus2
export CLOUD=aws
export REGION=us-west-2
Provision infrastructure resources like GPU Virtual Machine:
source scripts/resources.sh provision $CLOUD $REGION
Deploy the LLM-backed inferencing server using Docker:
source scripts/server.sh deploy $CLOUD
Download the Llama 3 8B model from Hugging Face, load it into the GPUs, and start the HTTP server:
source scripts/server.sh start $CLOUD
Send some prompt requests to the HTTP server to test chat completion endpoint:
source scripts/server.sh test $CLOUD
Cleanup infrastructure resources like GPU Virtual Machine:
source scripts/resources.sh cleanup $CLOUD $REGION
Collect and upload test results to Azure Data Explorer. Please always publish results even if you run into issue before reaching this step, this helps us to know which step failed with what error
source scripts/publish.sh $CLOUD
Check out aggregated and visualized test results on the dashboard
If you run into issue, please read this troubleshooting doc