- Example of using the Llama 2 models available in the Databricks Marketplace Llama 2 listing.
- Demonstrates how to do both real-time and batch model inference.
- GPU cluster configuration:
  - For batch, use the `g4dn.xlarge` instance type (AWS).
  - For model serving, use the following (a sketch of the corresponding endpoint configuration follows this list):
    - Workload type - GPU_MEDIUM
    - Workload size - Small
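
The serving settings above roughly correspond to an endpoint definition like the following minimal sketch, which assumes the Databricks serving-endpoints REST API; the workspace URL, token, endpoint name, and model name are placeholders, not values from these notebooks.

```python
import requests

# Placeholders -- replace with your workspace URL, token, and registered model.
DATABRICKS_HOST = "https://<workspace-url>"
DATABRICKS_TOKEN = "<personal-access-token>"

endpoint_config = {
    "name": "llama2-7b-chat",  # hypothetical endpoint name
    "config": {
        "served_entities": [
            {
                "entity_name": "<catalog>.<schema>.<llama_2_model>",  # placeholder model path
                "entity_version": "1",
                "workload_type": "GPU_MEDIUM",  # workload type from the list above
                "workload_size": "Small",       # workload size from the list above
                "scale_to_zero_enabled": False,
            }
        ]
    },
}

# Create the serving endpoint.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=endpoint_config,
)
resp.raise_for_status()
print(resp.json())
```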
- Batch_Score_Llama_2 - Batch scoring with a Spark UDF.
  - Input questions can be read from a table or a file, and the output can be stored in a table (a minimal sketch of this pattern appears after this list).
- Model_Serve_Llama_2 - Real-time scoring with a model serving endpoint (a sketch of querying the endpoint appears after this list).
- Common
  - Simpler example demonstrating both batch and real-time inference.
  - Enhanced with parameterized widgets to replace hard-coded values (a widget sketch appears after this list).
- Llama 2 Marketplace Listing Sample - Enhanced version of the notebook below.
- llama_2_marketplace_listing_example - Original Marketplace example notebook.
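
A minimal sketch of the batch-scoring pattern used in Batch_Score_Llama_2, assuming MLflow's pyfunc Spark UDF; the catalog, model, table, and column names are placeholders rather than the notebook's actual values.

```python
import mlflow
from pyspark.sql import functions as F

# `spark` is the SparkSession provided by the Databricks notebook environment.
mlflow.set_registry_uri("databricks-uc")  # assumes the model is registered in Unity Catalog

# Placeholder model URI.
MODEL_URI = "models:/<catalog>.<schema>.<llama_2_model>/1"

# Wrap the registered model as a Spark UDF so it can be applied to a DataFrame column.
generate = mlflow.pyfunc.spark_udf(spark, MODEL_URI, result_type="string")

# Read input questions from a table (reading from a file works the same way).
questions_df = spark.table("main.default.questions")  # placeholder table and "question" column

# Score in batch and persist the answers to a table.
answers_df = questions_df.withColumn("answer", generate(F.col("question")))
answers_df.write.mode("overwrite").saveAsTable("main.default.answers")
```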
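A minimal sketch of real-time scoring against a serving endpoint, as in Model_Serve_Llama_2, assuming the standard invocations URL; the host, token, endpoint name, and payload shape are placeholders and depend on the deployed model's signature.

```python
import requests

DATABRICKS_HOST = "https://<workspace-url>"   # placeholder
DATABRICKS_TOKEN = "<personal-access-token>"  # placeholder
ENDPOINT_NAME = "llama2-7b-chat"              # hypothetical endpoint name

# Send a single prompt to the endpoint's invocations URL.
payload = {"inputs": ["What is Apache Spark?"]}

resp = requests.post(
    f"{DATABRICKS_HOST}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())
```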
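The parameterized widgets mentioned under Common could look like the following sketch; the widget names and default values here are illustrative, not the notebook's actual parameters.

```python
# `dbutils` is provided by the Databricks notebook environment.
# Widgets expose notebook parameters in the UI instead of hard-coding them.
dbutils.widgets.text("input_table", "main.default.questions", "Input table")
dbutils.widgets.text("output_table", "main.default.answers", "Output table")
dbutils.widgets.text("endpoint_name", "llama2-7b-chat", "Serving endpoint")

# Read the parameter values set in the notebook UI (or passed by a job).
input_table = dbutils.widgets.get("input_table")
output_table = dbutils.widgets.get("output_table")
endpoint_name = dbutils.widgets.get("endpoint_name")
```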