Issue #18: Added Llama_2_Marketplace_Listing_Sample notebook, an enhanced version of the Marketplace notebook

amesar committed on Dec 11, 2023. Commit 7acf66a, 1 parent: 4cdf6a0.
Showing 3 changed files with 263 additions and 7 deletions.
databricks/notebooks/llama2/Llama_2_Marketplace_Listing_Sample.py (new file, 249 additions)

@@ -0,0 +1,249 @@
# Databricks notebook source
# MAGIC %md
# MAGIC # Overview of llama_2 models in Databricks Marketplace Listing
# MAGIC
# MAGIC Enhanced version (parameterized with widgets) of the Databricks Marketplace
# MAGIC [llama_2_marketplace_listing_example](https://marketplace.databricks.com/details/46527194-66f5-4ea1-9619-d7ec6be6a2fa/Databricks_Llama-2-Models) notebook.
# MAGIC
# MAGIC The llama_2 models offered in Databricks Marketplace are text generation models released by Meta AI. They are [MLflow](https://mlflow.org/docs/latest/index.html) models that package
# MAGIC [Hugging Face’s implementation of the llama_2 models](https://huggingface.co/meta-llama)
# MAGIC using the [transformers](https://mlflow.org/docs/latest/models.html#transformers-transformers-experimental)
# MAGIC flavor in MLflow.
# MAGIC
# MAGIC **Input:** a string containing the instruction text
# MAGIC
# MAGIC **Output:** a string containing the generated response text
# MAGIC
# MAGIC For example notebooks showing how to use llama_2 models in various use cases on Databricks, see [the Databricks ML example repository](https://github.com/databricks/databricks-ml-examples/tree/master/llm-models/llamav2).
# COMMAND ----------

# MAGIC %md
# MAGIC # Listed Marketplace Models
# MAGIC - llama_2_7b_chat_hf:
# MAGIC   - It packages [Hugging Face’s implementation of the llama_2_7b_chat_hf model](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
# MAGIC   - It has 7 billion parameters.
# MAGIC   - While it offers the fastest processing speed, it may have lower quality compared to other models in the model family.
# MAGIC - llama_2_13b_chat_hf:
# MAGIC   - It packages [Hugging Face’s implementation of the llama_2_13b_chat_hf model](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf).
# MAGIC   - It has 13 billion parameters.
# MAGIC   - It offers a middle ground on speed and quality compared to other models in the model family.
# MAGIC - llama_2_70b_chat_hf:
# MAGIC   - It packages [Hugging Face’s implementation of the llama_2_70b_chat_hf model](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf).
# MAGIC   - It has 70 billion parameters.
# MAGIC   - This model excels in quality among the models in the model family, but it comes at the cost of being the slowest.

# COMMAND ----------
# MAGIC %md
# MAGIC # Suggested environment
# MAGIC Creating and querying a serving endpoint doesn't require a specific runtime version or GPU instance type, but
# MAGIC for batch inference we suggest the following:
# MAGIC
# MAGIC - Databricks Runtime for Machine Learning version 14.2 or greater
# MAGIC - Recommended instance types:
# MAGIC
# MAGIC | Model Name | Suggested instance type (AWS) | Suggested instance type (AZURE) | Suggested instance type (GCP) |
# MAGIC | --------------- | ----------------------------- | ------------------------------- | ----------------------------- |
# MAGIC | `llama_2_7b_chat_hf` | `g5.8xlarge` | `Standard_NV36ads_A10_v5` | `g2-standard-4` |
# MAGIC | `llama_2_13b_chat_hf` | `g5.12xlarge` | `Standard_NC24ads_A100_v4` | `g2-standard-48` or `a2-ultragpu-1g` |
# MAGIC | `llama_2_70b_chat_hf` | `p4d.24xlarge` or `g5.48xlarge` | `Standard_NC48ads_A100_v4` | `a2-highgpu-8g` or `g2-standard-96` |

# COMMAND ----------
# MAGIC %md
# MAGIC # Install Dependencies
# MAGIC To create and query the model serving endpoint, Databricks recommends installing the latest Databricks SDK for Python.

# COMMAND ----------

# Upgrade to the latest Databricks SDK
%pip install --upgrade databricks-sdk
dbutils.library.restartPython()

# COMMAND ----------

import mlflow

mlflow.__version__

# COMMAND ----------
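# MAGIC %md
# MAGIC As a quick check (a sketch using only the standard library), confirm which SDK version is actually installed after the Python restart:

# COMMAND ----------

from importlib.metadata import version

# Report the installed databricks-sdk version
print("databricks-sdk: ", version("databricks-sdk"))

# COMMAND ----------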
dbutils.widgets.removeAll()

# COMMAND ----------

# Default catalog name when installing the model from Databricks Marketplace.
# Replace with the name of the catalog containing this model.
catalog_name = "databricks_llama_2_models"
dbutils.widgets.text("1. Catalog", catalog_name)
catalog = dbutils.widgets.get("1. Catalog")

dbutils.widgets.text("2. Schema", "models")
schema = dbutils.widgets.get("2. Schema")

model_names = ['llama_2_7b_chat_hf', 'llama_2_13b_chat_hf', 'llama_2_70b_chat_hf']
dbutils.widgets.dropdown("3. Model", model_names[0], model_names)
model_name = dbutils.widgets.get("3. Model")

dbutils.widgets.text("4. Version", "1")
version = dbutils.widgets.get("4. Version")

dbutils.widgets.text("5. Workload", "GPU_MEDIUM")
workload_type = dbutils.widgets.get("5. Workload")

print("catalog:       ", catalog)
print("schema:        ", schema)
print("model:         ", model_name)
print("version:       ", version)
print("workload_type: ", workload_type)
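# COMMAND ----------

# MAGIC %md
# MAGIC Because the notebook is parameterized with widgets, it can also be driven programmatically, for example from another notebook. A minimal sketch (the relative notebook path is hypothetical; the argument keys must match the widget names above):
# MAGIC
# MAGIC ```python
# MAGIC dbutils.notebook.run(
# MAGIC     "Llama_2_Marketplace_Listing_Sample",
# MAGIC     timeout_seconds=3600,
# MAGIC     arguments={"1. Catalog": "databricks_llama_2_models", "3. Model": "llama_2_13b_chat_hf"},
# MAGIC )
# MAGIC ```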
# COMMAND ----------

model_uc_path = f"{catalog}.{schema}.{model_name}"
endpoint_name = f"andre_{model_name}_marketplace"

print("model_uc_path: ", model_uc_path)
print("endpoint_name: ", endpoint_name)
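# COMMAND ----------

# MAGIC %md
# MAGIC Optional sanity check, as a sketch using the MLflow client: confirm that the chosen model version actually exists in Unity Catalog before deploying it (assumes the Marketplace listing was installed into the catalog selected above).

# COMMAND ----------

from mlflow import MlflowClient

# Point the client at the Unity Catalog model registry
client = MlflowClient(registry_uri="databricks-uc")
model_version = client.get_model_version(model_uc_path, version)
print("status: ", model_version.status)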
# COMMAND ----------

# MAGIC %md
# MAGIC # Usage

# COMMAND ----------

# MAGIC %md
# MAGIC Databricks recommends that you primarily work with this model via Model Serving
# MAGIC ([AWS](https://docs.databricks.com/machine-learning/model-serving/create-manage-serving-endpoints.html) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/create-manage-serving-endpoints)).

# COMMAND ----------

# MAGIC %md
# MAGIC ## Deploying the model to Model Serving
# MAGIC
# MAGIC You can deploy this model directly to a Databricks Model Serving endpoint
# MAGIC ([AWS](https://docs.databricks.com/machine-learning/model-serving/create-manage-serving-endpoints.html) | [Azure](https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/create-manage-serving-endpoints)).
# MAGIC
# MAGIC Note: Model Serving is not supported on GCP. On GCP, Databricks recommends running `Batch inference using Spark`,
# MAGIC as shown below.
# MAGIC
# MAGIC We recommend the following workload types for each model size:
# MAGIC
# MAGIC | Model Name | Suggested workload type (AWS) | Suggested workload type (AZURE) |
# MAGIC | --------------- | ----------------------------- | ------------------------------- |
# MAGIC | `llama_2_7b_chat_hf` | GPU_MEDIUM | GPU_LARGE |
# MAGIC | `llama_2_13b_chat_hf` | MULTIGPU_MEDIUM | GPU_LARGE |
# MAGIC | `llama_2_70b_chat_hf` | Please reach out to your Databricks representative or [submit the preview enrollment form](https://docs.google.com/forms/d/1-GWIlfjlIaclqDz6BPODI2j1Xg4f4WbFvBXyebBpN-Y/edit?ts=65124ee0) to access the necessary GPUs and get information on the required `workload_type` to pass when creating a serving endpoint. |
# MAGIC
# MAGIC You can create the endpoint by clicking the “Serve this model” button in the model UI, or with the Databricks SDK as follows:

# COMMAND ----------
import datetime

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput

w = WorkspaceClient()

config = EndpointCoreConfigInput.from_dict({
    "served_models": [
        {
            "name": endpoint_name,
            "model_name": model_uc_path,
            "model_version": version,
            "workload_type": workload_type,
            "workload_size": "Small",
            "scale_to_zero_enabled": False,  # keep the endpoint warm; set True to allow scale-to-zero
        }
    ]
})
model_details = w.serving_endpoints.create(name=endpoint_name, config=config)
model_details.result(timeout=datetime.timedelta(minutes=60))
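# COMMAND ----------

# MAGIC %md
# MAGIC The `result` call above blocks until the endpoint is ready. You can also inspect the endpoint's state at any time; a minimal sketch with the SDK:

# COMMAND ----------

# Look up the endpoint and print its readiness state
endpoint = w.serving_endpoints.get(name=endpoint_name)
print("state: ", endpoint.state)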
# COMMAND ----------

# MAGIC %md
# MAGIC ## Text generation in SQL using ai_query
# MAGIC
# MAGIC To generate text using the endpoint, use `ai_query`
# MAGIC to query the Model Serving endpoint. The first parameter is the
# MAGIC name of the endpoint you previously created for Model Serving. The second
# MAGIC parameter is a `named_struct` with the name `prompt` and, as its value, the
# MAGIC column containing the instruction text (or a literal string, as in the example below). Extra parameters can be added
# MAGIC to the `named_struct` as well; for supported parameters, see [MLflow AI Gateway completion routes](https://mlflow.org/docs/latest/gateway/index.html#completions).
# MAGIC The third and fourth parameters set the return type, so that
# MAGIC `ai_query` can properly parse and structure the output text.
# MAGIC
# MAGIC NOTE: `ai_query` is currently in Public Preview. Please sign up at the [AI Functions Public Preview enrollment form](https://docs.google.com/forms/d/e/1FAIpQLScVyh5eRioqGwuUVxj9JOiKBAo0-FWi7L3f4QWsKeyldqEw8w/viewform) to try out the new feature.
# MAGIC
# MAGIC ```sql
# MAGIC SELECT
# MAGIC   ai_query(
# MAGIC     <endpoint name>,
# MAGIC     named_struct("prompt", "What is ML?", "max_tokens", 256),
# MAGIC     'returnType',
# MAGIC     'STRUCT<candidates:ARRAY<STRUCT<text:STRING, metadata:STRUCT<finish_reason:STRING>>>, metadata:STRUCT<input_tokens:float, output_tokens:float, total_tokens:float> >'
# MAGIC   )
# MAGIC ```
# MAGIC
# MAGIC You can use `ai_query` in this manner to generate text in
# MAGIC SQL queries or notebooks connected to Databricks SQL Pro or Serverless
# MAGIC SQL endpoints.
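# COMMAND ----------

# MAGIC %md
# MAGIC For instance, to run the endpoint over a whole table of instructions rather than a literal string (a sketch; the `questions` table and its `prompt_text` column are hypothetical):
# MAGIC
# MAGIC ```sql
# MAGIC SELECT
# MAGIC   prompt_text,
# MAGIC   ai_query(
# MAGIC     <endpoint name>,
# MAGIC     named_struct("prompt", prompt_text, "max_tokens", 256),
# MAGIC     'returnType',
# MAGIC     'STRUCT<candidates:ARRAY<STRUCT<text:STRING, metadata:STRUCT<finish_reason:STRING>>>, metadata:STRUCT<input_tokens:float, output_tokens:float, total_tokens:float> >'
# MAGIC   ) AS generated
# MAGIC FROM questions
# MAGIC ```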
# COMMAND ----------

# MAGIC %md
# MAGIC ## Generate text by querying the serving endpoint
# MAGIC With the Databricks SDK, you can query the serving endpoint as follows:

# COMMAND ----------

from databricks.sdk import WorkspaceClient

# Replace with your own input
dataframe_records = [
    {"prompt": "What is ML?", "max_tokens": 512}
]

w = WorkspaceClient()
w.serving_endpoints.query(
    name=endpoint_name,
    dataframe_records=dataframe_records,
)
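# COMMAND ----------

# MAGIC %md
# MAGIC A minimal sketch of reading the generated text back out of the response. This assumes the endpoint's `predictions` follow the `candidates` structure shown in the `ai_query` section above:

# COMMAND ----------

response = w.serving_endpoints.query(
    name=endpoint_name,
    dataframe_records=dataframe_records,
)
# Each prediction carries a list of candidates; print the text of the first one
for prediction in response.predictions:
    print(prediction["candidates"][0]["text"])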
# COMMAND ----------

# MAGIC %md
# MAGIC ## Batch inference using Spark
# MAGIC
# MAGIC You can also load the model directly as a Spark UDF and run batch
# MAGIC inference on Databricks compute using Spark. We recommend using a
# MAGIC GPU cluster with Databricks Runtime for Machine Learning version 14.1
# MAGIC or greater.

# COMMAND ----------

import mlflow

mlflow.set_registry_uri("databricks-uc")

logged_model = f"models:/{model_uc_path}/{version}"
logged_model

# COMMAND ----------
generate = mlflow.pyfunc.spark_udf(spark, logged_model, "string")

# COMMAND ----------

import pandas as pd

df = spark.createDataFrame(pd.DataFrame({"text": pd.Series("What is ML?")}))

# You can use the UDF directly on a text column
generated_df = df.select(generate(df.text).alias("generated_text"))

# COMMAND ----------

display(generated_df)
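# COMMAND ----------

# MAGIC %md
# MAGIC As a sketch of persisting the batch results (the `llama2_batch_responses` table name is hypothetical), write the generated text to a table for downstream use:

# COMMAND ----------

# Save the generated responses so they can be queried later
generated_df.write.mode("overwrite").saveAsTable("llama2_batch_responses")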
@@ -1,20 +1,27 @@
# Databricks notebook source
# MAGIC %md ### Take Llama 2 model for a spin
# MAGIC
# MAGIC * Example for Llama 2 models available at [Databricks Marketplace Llama2](https://marketplace.databricks.com/details/46527194-66f5-4ea1-9619-d7ec6be6a2fa/Databricks_Llama-2-Models).
# MAGIC * Demonstrates how to do both real-time and batch model inference.
# MAGIC * GPU cluster configuration:
# MAGIC   * For batch, use the `g4dn.xlarge` instance type (AWS).
# MAGIC   * For model serving, use:
# MAGIC     * `Workload type` - GPU_MEDIUM
# MAGIC     * `Workload size` - Small
# MAGIC
# MAGIC ##### Notebooks
# MAGIC * [Batch_Score_Llama_2]($Batch_Score_Llama_2) - Batch scoring with a Spark UDF.
# MAGIC   * Input questions can be read from a table or file, and output can be stored in a table.
# MAGIC * [Model_Serve_Llama_2]($Model_Serve_Llama_2) - Real-time scoring with a model serving endpoint.
# MAGIC * [Common]($Common)
# MAGIC
# MAGIC ##### Enhanced Databricks Marketplace Sample
# MAGIC * Simpler example demonstrating both batch and real-time inference.
# MAGIC * Enhanced with parameterized widgets to replace hard-coded values.
# MAGIC * [Llama 2 Marketplace Listing Sample]($Llama_2_Marketplace_Listing_Sample) - enhanced version of the notebook below.
# MAGIC * [llama_2_marketplace_listing_example](https://marketplace.databricks.com/details/46527194-66f5-4ea1-9619-d7ec6be6a2fa/Databricks_Llama-2-Models) - original Marketplace example notebook.
# MAGIC
# MAGIC ##### GitHub
# MAGIC * https://github.com/amesar/mlflow-examples/tree/master/databricks/notebooks/llama2
# MAGIC
# MAGIC ##### Last updated: _2023-12-10_