diff --git a/antora-playbook.yml b/antora-playbook.yml index a4edd64..9c95af5 100644 --- a/antora-playbook.yml +++ b/antora-playbook.yml @@ -1,5 +1,5 @@ site: - title: Serving LLM Models on OpenShift AI + title: Serving an LLM using OpenShift AI start_page: model-serving::index.adoc content: diff --git a/antora.yml b/antora.yml index 083fff5..b455aa8 100644 --- a/antora.yml +++ b/antora.yml @@ -1,9 +1,10 @@ name: model-serving -title: Serving LLM Models on OpenShift AI -version: 1.01 +title: Serving an LLM using OpenShift AI +version: 1.10 nav: - modules/ROOT/nav.adoc - modules/chapter1/nav.adoc - modules/chapter2/nav.adoc - modules/chapter3/nav.adoc +- modules/chapter4/nav.adoc - modules/appendix/nav.adoc \ No newline at end of file diff --git a/modules/ROOT/images/intro_v5.mp4 b/modules/ROOT/images/intro_v5.mp4 new file mode 100644 index 0000000..8ebff2a Binary files /dev/null and b/modules/ROOT/images/intro_v5.mp4 differ diff --git a/modules/ROOT/pages/index.adoc b/modules/ROOT/pages/index.adoc index ed734c0..f2bc366 100644 --- a/modules/ROOT/pages/index.adoc +++ b/modules/ROOT/pages/index.adoc @@ -1,8 +1,8 @@ -= Serving LLM Models on OpenShift AI += Serving an LLM using OpenShift AI :navtitle: Home -video::intro_v4.mp4[width=640] +video::intro_v5.mp4[width=640] Welcome to this quick course on _Serving an LLM using OpenShift AI_. @@ -47,7 +47,7 @@ When ordering this catalog item in RHDP: . Select Learning about the Product for the Purpose field - . Enter Learning RHOAI in the Salesforce ID field + . Leave the Salesforce ID field blank . Scroll to the bottom, and check the box to confirm acceptance of terms and conditions diff --git a/modules/appendix/pages/section1.adoc b/modules/appendix/pages/section1.adoc index ff862c7..e22e911 100644 --- a/modules/appendix/pages/section1.adoc +++ b/modules/appendix/pages/section1.adoc @@ -10,7 +10,7 @@ WARNING: Pending review. // This quiz uses things a learner might know from his previous experience with RHEL or OpenStack as *distractors*, but does NOT rely on any previous knowledge. Learners new to OpenStack and OpenShift should be able to answer all questions from only the contents on the previouis lecture. -1. Which of the following *Operator components* are either are required to enable Red Hat OpenShift AI on OpenShift with Single Model Serving Platform capabilities ? +1. Which of the following *Operator components* are required to enable Red Hat OpenShift AI on OpenShift with Single Model Serving Platform capabilities ? 
* [ ] Red Hat OpenShift serverless operator * [ ] Red Hat OpenShift service mesh operator diff --git a/modules/chapter1/nav.adoc b/modules/chapter1/nav.adoc index 62b8058..d6da93d 100644 --- a/modules/chapter1/nav.adoc +++ b/modules/chapter1/nav.adoc @@ -1,2 +1,3 @@ * xref:index.adoc[] -** xref:section1.adoc[] \ No newline at end of file +** xref:section1.adoc[] +** xref:section2.adoc[] \ No newline at end of file diff --git a/modules/chapter1/pages/index.adoc b/modules/chapter1/pages/index.adoc index a2d7cc0..84b251c 100644 --- a/modules/chapter1/pages/index.adoc +++ b/modules/chapter1/pages/index.adoc @@ -1,4 +1,4 @@ -= Technical side of LLMs += Technical Component Intoduction [NOTE] diff --git a/modules/chapter1/pages/section1.adoc b/modules/chapter1/pages/section1.adoc index 1e9ce77..a4d1653 100644 --- a/modules/chapter1/pages/section1.adoc +++ b/modules/chapter1/pages/section1.adoc @@ -1,4 +1,4 @@ -= Technology Components += Red Hat OpenShift AI == Kubernetes & OpenShift @@ -6,52 +6,10 @@ OpenShift builds upon Kubernetes by providing an enhanced platform with addition In addition, Openshift provides a Graphic User Interface for Kubernetes. Openshift AI runs on Openshift; therefore, the engine under the hood of both products is Kubernetes. -Most workloads are deployed in kubernetes via YAML files. A Kubernetes Deployment YAML file is a configuration file written in YAML (YAML Ain't Markup Language) that defines the desired state of a Kubernetes Deployment. These YAML file are used to create, update, or delete Deployments in Kubernetes / OpenShift clusters. +Most workloads are deployed in kubernetes via YAML files. A Kubernetes YAML manifest file is a configuration file written in YAML (YAML Ain't Markup Language) that defines the desired state of a Kubernetes deployment. These YAML files are used to create, update, or delete deployments in Kubernetes / OpenShift clusters. Don’t worry about needing to know how to write these files. That's what OpenShift & OpenShift AI will take care of for us. In this course, we will just need to select the options we want in the UI. OpenShift and OpenShift AI will take care of creating the YAML deployment files. We will have to perform a few YAML file copy-and-paste operations; instructions are provided in the course. -Just know, YAML files create resources directly in the Kubernetes platform. We primarily use the OpenShift AI UI to perform these tasks to deliver our LLM. - -== Large Language Models - -Large Language Models (LLMs) can generate new stories, summarize texts, and even perform advanced tasks like reasoning and problem solving, which is not only impressive but also remarkable due to their accessibility and easy integration into applications. - -As you probably already know, training large language models is expensive, time consuming, and most importantly requires a vast amount of data fed into the model. - -The common outcome from this training is a Foundation model: an LLM designed to generate and understand human-like text across a wide range of use cases. - -The key to this powerful language processing architecture, *is the Transformer!* A helpful definition of a *Transformer* is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities. The Transformer was created by Google and started as a language translation algorithm. It analyzes relationships between words in text, which crucial for LLMs to understand and generate language. 
- -This is how LLMs are able to predict the next words, by using the transformer neural network & attention mechanism to focus in on keywords to determine context. Then use that context and _knowledge_ from all the data ingested to predict the next word after a sequence of words. - -=== Modifications to LLMs - -As mentioned above, LLMs are normally large, require Graphics Cards, and costly compute resources to load the model into memory. - -However, there are techniques for compressing large LLM models, making them smaller and faster to run on devices with limited resources. - - * Quantization reduces the precision of numerical representations in large language models to make them more memory-efficient during deployment. - - * Reducing the precision of LLM parameters to save computational resources without sacrificing performance. Trimming surplus connections or parameters to make LLMs smaller and faster yet performant. - -In this course, we will be using a quantized version of the Mistral Large Language Model. Instead of requiring 24Gb of memory and Graphics processing unit to simulate the neural network, we are going to run our model with 4 CPUs and 8GB of ram, burstable to 8 CPU with 10Gb ram max. - -[NOTE] -https://www.redhat.com/en/topics/ai/what-is-instructlab[*InstructLabs*]- runs locally on laptops uses this same type of quantized LLMs, Both the Granite & Mixtral Large Language Models are reduced in precision to operate on a laptop. - -== The Ollama Model Framework - -There are hundreds of popular LLMs, nonetheless, their operation remains the same: users provide instructions or tasks in natural language, and the LLM generates a response based on what the model "thinks" could be the continuation of the prompt. - -Ollama is not an LLM Model - Ollama is a relatively new but powerful open-source framework designed for serving machine learning models. It's designed to be efficient, scalable, and easy to use; making it an attractive option for developers and organizations looking to deploy their AI models into production. - -=== How does Ollama work? - - -At its core, Ollama simplifies the process of downloading, installing, and interacting with a wide range of LLMs, empowering users to explore their capabilities without the need for extensive technical expertise or reliance on cloud-based platforms. - -In this course, we will focus on single LLM, Mistral, run on the Ollama Framework. However, with the understanding of the Ollama Framework, we will be able to work with a variety of large language models utilizing the exact same configuration. - -You will be able to switch models in minutes, all running on the same platform. This will enable you test, compare, and evalute multiple models with the skills gained in the course. \ No newline at end of file +Just know, YAML files create resources directly in the Kubernetes platform. We primarily use the OpenShift AI UI to perform these tasks to deliver our LLM. \ No newline at end of file diff --git a/modules/chapter1/pages/section2.adoc b/modules/chapter1/pages/section2.adoc index 2795151..51cfb3d 100644 --- a/modules/chapter1/pages/section2.adoc +++ b/modules/chapter1/pages/section2.adoc @@ -1,2 +1,41 @@ -= Follow up Story += Large Language Models +== Large Language Models + +Large Language Models (LLMs) can generate new stories, summarize texts, and even perform advanced tasks like reasoning and problem solving, which is not only impressive but also remarkable due to their accessibility and easy integration into applications. 
+
+As you probably already know, training large language models is expensive, time consuming, and most importantly requires a vast amount of data fed into the model.
+
+The common outcome from this training is a Foundation model: an LLM designed to generate and understand natural language text across a wide range of use cases.
+
+The key to this powerful language processing architecture *is the Transformer!* A helpful definition of a *Transformer* is a set of neural networks that consists of an encoder and a decoder with self-attention capabilities. The Transformer was created by Google and started as a language translation algorithm. It analyzes relationships between words in text, which is crucial for LLMs to understand and generate language.
+
+This is how LLMs are able to predict the next words: they use the transformer neural network & attention mechanism to focus on keywords and determine context, then use that context and _knowledge_ from all the data ingested to predict the next word after a sequence of words.
+
+=== Modifications to LLMs
+
+As mentioned above, LLMs are normally large and require GPUs or accelerator chips and costly compute resources to load the model into memory.
+
+However, there are techniques for compressing large LLMs, making them smaller and faster to run on devices with limited resources.
+
+ * Quantization reduces the precision of numerical representations in large language models to make them more memory-efficient during deployment.
+
+ * Pruning trims surplus connections or parameters, saving computational resources without sacrificing much performance and making LLMs smaller and faster yet still performant.
+
+In this course, we will be using a quantized version of the Mistral Large Language Model. Instead of requiring 24GB of memory and a graphics processing unit to simulate the neural network, we are going to run our model with 4 CPUs and 8GB of RAM, burstable to 8 CPUs with a 10GB RAM maximum (see the resource sketch at the end of this page).
+
+
+== The Ollama Model Framework
+
+There are hundreds of popular LLMs; nonetheless, their operation remains the same: users provide instructions or tasks in natural language, and the LLM generates a response based on what the model "thinks" could be the continuation of the prompt.
+
+Ollama is not an LLM. Ollama is a relatively new but powerful open-source framework designed for serving machine learning models. It's designed to be efficient, scalable, and easy to use, making it an attractive option for developers and organizations looking to deploy their AI models into production.
+
+=== How does Ollama work?
+
+
+At its core, Ollama simplifies the process of downloading, installing, and interacting with a wide range of LLMs, empowering users to explore their capabilities without the need for extensive technical expertise or reliance on cloud-based platforms.
+
+In this course, we will focus on a single LLM, Mistral, run on the Ollama Framework. However, with an understanding of the Ollama Framework, we will be able to work with a variety of large language models utilizing the exact same configuration.
+
+You will be able to switch models in minutes, all running on the same platform. This will enable you to test, compare, and evaluate multiple models with the skills gained in the course.
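+
+Under the hood, the CPU and memory figures above are ordinary Kubernetes resource requests and limits on the serving container. The snippet below is only an illustrative sketch of that mapping; the field placement and exact values in the real manifest are generated for us by the OpenShift AI UI later in the course.
+
+```yaml
+# Illustrative only: resource settings matching the sizing described above.
+# The actual manifest is generated by OpenShift AI when the model is deployed.
+resources:
+  requests:
+    cpu: "4"        # guaranteed CPU
+    memory: 8Gi     # guaranteed memory
+  limits:
+    cpu: "8"        # burst ceiling
+    memory: 10Gi    # burst ceiling
+```
+
+The request is what the container is guaranteed; the limit is the ceiling it can burst to, which is why the model is described as burstable to 8 CPUs and 10GB of RAM.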
\ No newline at end of file diff --git a/modules/chapter2/images/llm_dsc_v3.mp4 b/modules/chapter2/images/llm_dsc_v3.mp4 new file mode 100644 index 0000000..7f8baaa Binary files /dev/null and b/modules/chapter2/images/llm_dsc_v3.mp4 differ diff --git a/modules/chapter2/images/llm_tls_v3.mp4 b/modules/chapter2/images/llm_tls_v3.mp4 new file mode 100644 index 0000000..f60f351 Binary files /dev/null and b/modules/chapter2/images/llm_tls_v3.mp4 differ diff --git a/modules/chapter2/nav.adoc b/modules/chapter2/nav.adoc index d6da93d..8dcc0c4 100644 --- a/modules/chapter2/nav.adoc +++ b/modules/chapter2/nav.adoc @@ -1,3 +1,4 @@ * xref:index.adoc[] ** xref:section1.adoc[] -** xref:section2.adoc[] \ No newline at end of file +** xref:section2.adoc[] +** xref:section3.adoc[] \ No newline at end of file diff --git a/modules/chapter2/pages/index.adoc b/modules/chapter2/pages/index.adoc index 9ca4245..f357d2d 100644 --- a/modules/chapter2/pages/index.adoc +++ b/modules/chapter2/pages/index.adoc @@ -3,7 +3,7 @@ == Supported configurations OpenShift AI is supported in two configurations: - * A managed cloud service add-on for *Red Hat OpenShift Dedicated* (with a Customer Cloud Subscription for AWS or GCP) or for Red Hat OpenShift Service on Amazon Web Services (ROSA). + * A managed cloud service add-on for *Red Hat OpenShift Service on Amazon Web Services* (ROSA, with a Customer Cloud Subscription for AWS) or *Red Hat OpenShift Dedicated* (GCP). For information about OpenShift AI on a Red Hat managed environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_cloud_service/1[Product Documentation for Red Hat OpenShift AI Cloud Service 1]. * Self-managed software that you can install on-premise or on the public cloud in a self-managed environment, such as *OpenShift Container Platform*. diff --git a/modules/chapter2/pages/section1.adoc b/modules/chapter2/pages/section1.adoc index 96cf8de..157481e 100644 --- a/modules/chapter2/pages/section1.adoc +++ b/modules/chapter2/pages/section1.adoc @@ -12,6 +12,8 @@ This exercise uses the Red Hat Demo Platform; specifically the OpenShift Contain . Login to the Red Hat OpenShift using a user which has the _cluster-admin_ role assigned. +. It’s sufficient to install all prerequisite operators with default settings, no additional configuration is necessary. + . Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click on the button or tile for each. In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. For this lab you can skip the installation of the optional operators. [*] You do not have to wait for the previous Operator to complete before installing the next. For this lab you can skip the installation of the optional operators as there is no GPU. @@ -39,7 +41,7 @@ This exercise uses the Red Hat Demo Platform; specifically the OpenShift Contain image::openshiftai_operator.png[width=640] -. Click on the `Red{nbsp}Hat OpenShift AI` operator. In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. +. Click on the `Red{nbsp}Hat OpenShift AI` operator. In the pop up window that opens, ensure you select the latest version in the *fast* channel. Any version greater than 2.91 and click on **Install** to open the operator's installation view. + . 
In the `Install Operator` page, leave all of the options as default and click on the *Install* button to start the installation. diff --git a/modules/chapter2/pages/section2.adoc b/modules/chapter2/pages/section2.adoc index ca94712..aeb3054 100644 --- a/modules/chapter2/pages/section2.adoc +++ b/modules/chapter2/pages/section2.adoc @@ -1,6 +1,6 @@ = Modifying the OpenShift AI TLS Certificate -video::openshiftai_tls.mp4[width=640] +video::llm_tls_v3.mp4[width=640] [NOTE] @@ -59,53 +59,4 @@ image::createsecret.png[width=640] * Copy the name of the secret from line 4, just the name (optional, but helpful) * Click *create* to apply this YAML into the istio-system proejct (namespace). -*We have copied the Secret used by OCP & made it available be used by OAI.* - - - - -== Create OpenShift AI Data Science Cluster - -With our secrets in place, the next step is to create OpenShift AI *Data Science Cluster*. - -Return to the OpenShift Navigation Menu, Select Installed Operators, and Click on the OpenShift AI Operator name to open the operator. - - . *Select the Option to create a Data Science Cluster.* - - . *Select the radial button to switch to the YAML view.* - - . Find the section below in the YAML file, in the Kserve Section find the Serving/Certificate area; add the line: *secretName:* followed by the name of the secret name that we deployed in the istio-system project. In addition, change the type from SelfSigned to *Provided*. See below for the example. - -```yaml -kserve: -devFlags: {} -managementState: Managed -serving: - ingressGateway: - certificate: - secretName: ingress-certs-XX-XX-2024 - type: Provided - managementState: Managed - name: knative-serving -``` -image::dcsyamlfile.png[width=640] - -Once you have made those changes to the YAML file, *Click Create* to Deploy the Data Science Cluster. - -image::createDSC.png[width=640] - -Single Model Serve Platform will now be deployed to expose ingress connections with the same certificate as OpenShift Routes. Endpoints will be accessible using TLS without having to ignore error messages or create special configurations. - -== OpenShift AI install summary - -Congratulations, you have successfully completed the installation of OpenShift AI on an OpenShift Container Cluster. OpenShift AI is now running on a new Dashboard! - - - * We installed the required OpenShift AI Operators - ** Serverless, ServiceMesh, & Pipelines Operators - ** OpenShift AI Operator - ** Web Terminal Operator - -Additionally, we took this installation a step further by sharing TLS certificates from the OpenShift Cluster with OpenShift AI. - -We will pick up working with the OpenShift AI UI in the next Chapter. \ No newline at end of file +*We have copied the Secret used by OCP & made it available be used by OAI.* \ No newline at end of file diff --git a/modules/chapter2/pages/section3.adoc b/modules/chapter2/pages/section3.adoc new file mode 100644 index 0000000..caf6559 --- /dev/null +++ b/modules/chapter2/pages/section3.adoc @@ -0,0 +1,51 @@ += Configure the OpenShift AI Data Science Cluster + +video::llm_dsc_v3.mp4[width=640] + +== Create OpenShift AI Data Science Cluster + +With our secrets in place, the next step is to create OpenShift AI *Data Science Cluster*. + +_A DataScienceCluster is the plan in the form of an YAML outline for Data Science Cluster API deployment._ + +Return to the OpenShift Navigation Menu, Select Installed Operators, and Click on the OpenShift AI Operator name to open the operator. + + . 
*Select the option to create a Data Science Cluster.*
+
+ . *Select the radio button to switch to the YAML view.*
+
+ . Find the section below in the YAML file: in the kserve section, find the serving/certificate area and add the line *secretName:* followed by the name of the secret that we deployed in the istio-system project. In addition, change the type from SelfSigned to *Provided*. See below for the example.
+
+```yaml
+kserve:
+  devFlags: {}
+  managementState: Managed
+  serving:
+    ingressGateway:
+      certificate:
+        secretName: ingress-certs-XX-XX-2024
+        type: Provided
+    managementState: Managed
+    name: knative-serving
+```
+image::dcsyamlfile.png[width=640]
+
+Once you have made those changes to the YAML file, click *Create* to deploy the Data Science Cluster.
+
+image::createDSC.png[width=640]
+
+The Single Model Serving Platform will now be deployed to expose ingress connections with the same certificate as OpenShift routes. Endpoints will be accessible using TLS without having to ignore error messages or create special configurations.
+
+== OpenShift AI install summary
+
+Congratulations, you have successfully completed the installation of OpenShift AI on an OpenShift Container Platform cluster. OpenShift AI is now running with its own dashboard!
+
+
+ * We installed the required OpenShift AI Operators
+ ** Serverless, ServiceMesh, & Pipelines Operators
+ ** OpenShift AI Operator
+ ** Web Terminal Operator
+
+Additionally, we took this installation a step further by sharing TLS certificates from the OpenShift cluster with OpenShift AI.
+
+We will pick up working with the OpenShift AI UI in the next chapter.
\ No newline at end of file
diff --git a/modules/chapter3/images/llm_dataconn_v3.mp4 b/modules/chapter3/images/llm_dataconn_v3.mp4
new file mode 100644
index 0000000..bb34cc9
Binary files /dev/null and b/modules/chapter3/images/llm_dataconn_v3.mp4 differ
diff --git a/modules/ROOT/images/intro_v4.mp4 b/modules/chapter3/images/llm_dsp_v3.mp4
similarity index 74%
rename from modules/ROOT/images/intro_v4.mp4
rename to modules/chapter3/images/llm_dsp_v3.mp4
index d8ff088..fc6971f 100644
Binary files a/modules/ROOT/images/intro_v4.mp4 and b/modules/chapter3/images/llm_dsp_v3.mp4 differ
diff --git a/modules/chapter3/images/openshiftai_setup_part1.mp4 b/modules/chapter3/images/llm_minio_v3.mp4
similarity index 82%
rename from modules/chapter3/images/openshiftai_setup_part1.mp4
rename to modules/chapter3/images/llm_minio_v3.mp4
index f014ad0..e7bbcb4 100644
Binary files a/modules/chapter3/images/openshiftai_setup_part1.mp4 and b/modules/chapter3/images/llm_minio_v3.mp4 differ
diff --git a/modules/chapter3/nav.adoc b/modules/chapter3/nav.adoc
index d6da93d..8dcc0c4 100644
--- a/modules/chapter3/nav.adoc
+++ b/modules/chapter3/nav.adoc
@@ -1,3 +1,4 @@
* xref:index.adoc[]
** xref:section1.adoc[]
-** xref:section2.adoc[]
\ No newline at end of file
+** xref:section2.adoc[]
+** xref:section3.adoc[]
\ No newline at end of file
diff --git a/modules/chapter3/pages/index.adoc b/modules/chapter3/pages/index.adoc
index b242e60..17312f2 100644
--- a/modules/chapter3/pages/index.adoc
+++ b/modules/chapter3/pages/index.adoc
@@ -2,9 +2,6 @@ This chapter begins with running and configured OpenShift AI environment. If you don't already have your environment running, head over to Chapter 2.
-There's a lot to cover in section 1, we add the Ollama custom Runtime, create a data science project, setup storage, create a workbench, and finally serve the Ollama Framework, utilizing the Single Model Serving Platform to deliver our model to our Notebook Application. - - -In section 2, we will explore using the Jupyter Notebook from our workbench to infer data from the Mistral 7B LLM. While less technical than previous section of this hands-on course, there are some steps to download the Mistral Model, update our notebook with inference endpoint, and evaluate our Models performance. +There's a lot to cover in this section, we add the Ollama custom Runtime, create a data science project, setup MinIO storage, create a workbench, and finally serve the Ollama Framework, utilizing the Single Model Serving Platform to deliver our model to our Notebook Application. Let's get started! \ No newline at end of file diff --git a/modules/chapter3/pages/section1.adoc b/modules/chapter3/pages/section1.adoc index a7107d4..d8ef162 100644 --- a/modules/chapter3/pages/section1.adoc +++ b/modules/chapter3/pages/section1.adoc @@ -1,6 +1,6 @@ -= OpenShift AI Customization += Creating OpenShift AI Resources - 1 -video::openshiftai_setup_part1.mp4[width=640] +video::llm_dsp_v3.mp4[width=640] == Model Serving Runtimes @@ -88,296 +88,6 @@ Navigate to & select the Data Science Projects section. image::dsp_create.png[width=640] -== Deploy MinIO as S3 Compatible Storage -=== MinIO overview - -*MinIO* is a high-performance, S3-compatible object store. It can be deployed on a wide variety of platforms, and it comes in multiple flavors. - -This segment describes a very quick way of deploying the community version of MinIO in order to quickly setup a fully standalone Object Store, in an OpenShift Cluster. This can then be used for various prototyping tasks that require Object Storage. - -[WARNING] -This version of MinIO should not be used in production-grade environments. Aditionally, MinIO is not included in RHOAI, and Red Hat does not provide support for MinIO. - -=== MinIO Deployment -To Deploy MinIO, we will utilize the OpenShift Dashboard. - - . Click on the Project Selection list dropdown and select the Ollama-Model project or the data science project you created in the previous step. - - . Then Select the + (plus) icon from the top right of the dashboard. - -image::minio2.png[width=640] - - . In the new window, we will paste the following YAML file. In the YAML below its recommended to change the default user name & password. - - -```yaml ---- -kind: PersistentVolumeClaim -apiVersion: v1 -metadata: - name: minio-pvc -spec: - accessModes: - - ReadWriteOnce - resources: - requests: - storage: 40Gi - volumeMode: Filesystem ---- -kind: Secret -apiVersion: v1 -metadata: - name: minio-secret -stringData: - # change the username and password to your own values. 
- # ensure that the user is at least 3 characters long and the password at least 8 - minio_root_user: minio - minio_root_password: minio123 ---- -kind: Deployment -apiVersion: apps/v1 -metadata: - name: minio -spec: - replicas: 1 - selector: - matchLabels: - app: minio - template: - metadata: - creationTimestamp: null - labels: - app: minio - spec: - volumes: - - name: data - persistentVolumeClaim: - claimName: minio-pvc - containers: - - resources: - limits: - cpu: 250m - memory: 1Gi - requests: - cpu: 20m - memory: 100Mi - readinessProbe: - tcpSocket: - port: 9000 - initialDelaySeconds: 5 - timeoutSeconds: 1 - periodSeconds: 5 - successThreshold: 1 - failureThreshold: 3 - terminationMessagePath: /dev/termination-log - name: minio - livenessProbe: - tcpSocket: - port: 9000 - initialDelaySeconds: 30 - timeoutSeconds: 1 - periodSeconds: 5 - successThreshold: 1 - failureThreshold: 3 - env: - - name: MINIO_ROOT_USER - valueFrom: - secretKeyRef: - name: minio-secret - key: minio_root_user - - name: MINIO_ROOT_PASSWORD - valueFrom: - secretKeyRef: - name: minio-secret - key: minio_root_password - ports: - - containerPort: 9000 - protocol: TCP - - containerPort: 9090 - protocol: TCP - imagePullPolicy: IfNotPresent - volumeMounts: - - name: data - mountPath: /data - subPath: minio - terminationMessagePolicy: File - image: >- - quay.io/minio/minio:RELEASE.2023-06-19T19-52-50Z - args: - - server - - /data - - --console-address - - :9090 - restartPolicy: Always - terminationGracePeriodSeconds: 30 - dnsPolicy: ClusterFirst - securityContext: {} - schedulerName: default-scheduler - strategy: - type: Recreate - revisionHistoryLimit: 10 - progressDeadlineSeconds: 600 ---- -kind: Service -apiVersion: v1 -metadata: - name: minio-service -spec: - ipFamilies: - - IPv4 - ports: - - name: api - protocol: TCP - port: 9000 - targetPort: 9000 - - name: ui - protocol: TCP - port: 9090 - targetPort: 9090 - internalTrafficPolicy: Cluster - type: ClusterIP - ipFamilyPolicy: SingleStack - sessionAffinity: None - selector: - app: minio ---- -kind: Route -apiVersion: route.openshift.io/v1 -metadata: - name: minio-api -spec: - to: - kind: Service - name: minio-service - weight: 100 - port: - targetPort: api - wildcardPolicy: None - tls: - termination: edge - insecureEdgeTerminationPolicy: Redirect ---- -kind: Route -apiVersion: route.openshift.io/v1 -metadata: - name: minio-ui -spec: - to: - kind: Service - name: minio-service - weight: 100 - port: - targetPort: ui - wildcardPolicy: None - tls: - termination: edge - insecureEdgeTerminationPolicy: Redirect -``` - -*This should finish in a few seconds. Now it's time to deploy our storage buckets.* - -video::openshiftai_setup_part2.mp4[width=640] - -=== MinIO Storage Bucket Creation - -From the OCP Dashboard: - - . Select Networking / Routes from the navigation menu. - - . This will display two routes, one for the UI & another for the API. - - . For the first step, select the UI route and paste it in a browser Window. - - . This window opens the MinIO Dashboard. Log in with username/password combination you set, or the default listed in yaml file above. - -Once logged into the MinIO Console: - - . Click Create Bucket to get started. - - . Create two Buckets: - - .. *models* - - .. *storage* - -[NOTE] - When serving an LLM or other model, Openshift AI looks within a Folder. Therefore, we need at least one subdirectory under the Models Folder. - - . Via the Navigation menu, *select object browser*, then click on the Model Bucket. - . 
From the models bucket page, click add path, and type *ollama* as the name of the sub-folder or path. - -[IMPORTANT] -In most cases, to serve a model, the trained model would be uploaded into this sub-directory. *However, Ollama is a special case, as it can download and manage Several LLM models as part of the runtime.* - - . We still need a file available in this folder for the model deployment workflow to succeed. - - . So we will copy an *emptyfile.txt* file to the ollama subdirectory. You can download the file from https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main/serving-runtimes/ollama_runtime[*this location*]. Alternatively, you can create your own file called emptyfile.txt and upload it. - - . Once you have this file ready, upload it into the Ollama path in the model bucket by clicking the upload button and selecting the file from your local desktop. - -=== Create Data Connection - -Navigate to the Data Science Project section of the OpenShift AI Console /Dashboard. Select the Ollama-model project. - -. Select the Data Connection menu, followed by create data connection -. Provide the following values: -.. Name: *models* -.. Access Key: use the minio_root-user from YAML file -.. Secret Key: use the minio_root_password from the YAML File -.. Endpoint: use the Minio API URL from the Routes page in Openshift Dashboard -.. Region: This is required for AWS storage & cannot be blank (no-region-minio) -.. Bucket: use the Minio Storage bucket name: *models* - -image::dataconnection_models.png[width=800] - -Repeat the same process for the Storage bucket, using *storage* for the name & bucket. - -== Creating a WorkBench - -video::openshiftai_setup_part3.mp4[width=640] - -Navigate to the Data Science Project section of the OpenShift AI Console /Dashboard. Select the Ollama-model project. - -image::create_workbench.png[width=800] - - . Select the WorkBench button, then click create workbench - - .. Name: `ollama-model` - - .. Notebook Image: `Minimal Python` - - .. Leave the remianing options default. - - .. Optionally, scroll to the bottom, check the `Use data connection box`. - - .. Select *storage* from the dropdown to attach the storage bucket to the workbench. - - . Select the Create Workbench option. - -[NOTE] -Depending on the notebook image selected, it can take between 2-20 minutes for the container image to be fully deployed. The Open Link will be available when our container is fully deployed. - - -== Creating The Model Server - -From the ollama-model WorkBench Dashboard in the ollama-model project, navigate to the **Models** section, and select Deploy Model from the **Single Model Serving Platform Button**. - -image::deploy_model_2.png[width=800] - -*Create the model server with the following values:* - - - .. Model name: `Ollama-Mistral` - .. Serving Runtime: `Ollama` - .. Model framework: `Any` - .. Model Server Size: `Medium` - .. Model location data connection: `models` - .. Model location path: `/ollama` - - -After clicking the **Deploy** button at the bottom of the form, the model is added to our **Models & Model Server list**. When the model is available, the inference endpoint will populate & the status will indicate a green checkmark. - -We are now ready to interact with our newly deployed LLM Model. Join me in Section 2 to explore Mistral running on OpenShift AI using Jupyter Notebooks. 
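+
+For orientation, the custom runtime added in the Model Serving Runtimes step at the top of this page is a KServe ServingRuntime resource. The sketch below is only an illustrative assumption of what such a resource looks like: the name, label, and image are placeholders, and the authoritative definition to use is the Ollama runtime published in the llm-on-openshift repository referenced in this course.
+
+```yaml
+# Hedged sketch of a custom ServingRuntime; the name, label, and image are placeholders,
+# not the definition shipped with the course materials.
+apiVersion: serving.kserve.io/v1alpha1
+kind: ServingRuntime
+metadata:
+  name: ollama-runtime                  # placeholder name
+  labels:
+    opendatahub.io/dashboard: "true"    # assumed label that surfaces the runtime in the dashboard
+spec:
+  supportedModelFormats:
+    - name: any
+      autoSelect: true
+  containers:
+    - name: kserve-container
+      image: <ollama-runtime-image>     # placeholder; use the image from the referenced repository
+      ports:
+        - containerPort: 11434          # Ollama's default API port
+          protocol: TCP
+```
+
+Once a runtime like this is imported, it appears as an option in the Serving Runtime dropdown when a model is deployed later in this chapter.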
diff --git a/modules/chapter3/pages/section2.adoc b/modules/chapter3/pages/section2.adoc index 524c530..7507aab 100644 --- a/modules/chapter3/pages/section2.adoc +++ b/modules/chapter3/pages/section2.adoc @@ -1,141 +1,230 @@ -= Jupyter Notebooks & Mistral LLM Model Setup += MinIO S3 Compatible Storage Setup -video::openshiftai_notebook.mp4[width=640] +video::llm_minio_v3.mp4[width=640] -== Open the Jupyter Notebook +== Deploy MinIO as S3 compatible storage -From the OpenShift AI ollama-model workbench dashboard: +=== MinIO overview -* Select the Open link to the right of the status section. When the new window opens, use the OpenShift admin user & password to login to the Notebook. +*MinIO* is a high-performance, S3-compatible object store. It can be deployed on a wide variety of platforms, and it comes in multiple flavors. -* Click *Allow selected permissions* button to complete login to the notebook. - -[NOTE] -If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take a few minutes & up to 20+ minutes depending on the notebook image we opted to choose. - - -== Inside the Jupyter Notebook - -Clone the notebook file to interact with the Ollama Framework from this location: https://github.com/rh-aiservices-bu/llm-on-openshift.git - -Navigate to the llm-on-openshift/examples/notebooks/langchain folder: - -Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_ - -Explore the notebook, and then continue. - -=== Update the Inference Endpoint - -Head back to the RHOAI workbench dashboard & copy the interence endpoint from our ollama-mistral model. -// Should it be inference instead of interence? - -Return the Jupyter Notebook Environment: - - . Paste the inference endpoint into the Cell labeled interfence_server_url = *"replace with your own inference address"* - -image::serverurl.png[width=800] - - . We can now start executing the code in the cells, starting with the set the inference server URL cell. - - . Next we run the second cell: !pip install -q langchain==0.1.14 ; there is a notice to update pip, just ignore and continue. - - . The third cell imports the langchain components that provide the libraries and programming files to interact with our LLM model. - - . In the fourth cell, place our first call to the Ollama-Mistral Framework Served by OpenShift AI. +This segment describes a very quick way of deploying the community version of MinIO in order to quickly setup a fully standalone Object Store, in an OpenShift Cluster. This can then be used for various prototyping tasks that require Object Storage. [WARNING] -Before we continue, we need to perform the following additional step. As mentioned, The Ollama Model Runtime we launched in OpenShift AI is a Framework that can host multiple LLM Models. It is currently running but is waiting for the command to instruct it to download Model to Serve. The following command needs to run from the OpenShift Dashboard. We are going to use the web_terminal operator to perform this next step. - -== Activating the Mistral Model in Ollama - -We will need to obtain the endpoint from the OpenShift AI model serving console. I usually just paste the text below into a cell in the Jupyter Notebook and paste the url in the code block from there. - -image::mistral_config.png[width=640] - -[source, yaml] ----- -curl https://your-endpoint/api/pull \ - -k \ - -H "Content-Type: application/json" \ - -d '{"name": "mistral"}' ----- - - . Next copy the entire code snippet, and open the OpenShift Dashboard. - . 
At the top right of the dashboard, locate the ">_" and select it. - . This will open the terminal window at the bottom of the dashboard. - . Click on the Start button in the terminal window, wait for the bash..$ prompt to appear - . Past the modified code block into the window and press enter. - -The message: *status: pulling manifest* should appear. This begins the model downloading process. - -image::curl_command.png[width=800] - -Once the download completes, the *status: success:* message appears. We can now return to the Jupyter Notebook Tab in the browser and proceed. - -=== Create the Prompt - -This cell sets the *system message* portion of the query to our model. Normally, we don't get the see this part of the query. This message details how the model should act, respond, and consider our questions. It adds checks to valdiate the information is best as possible, and to explain answers in detail. - -== Memory for the conversation +This version of MinIO should not be used in production-grade environments. Aditionally, MinIO is not included in RHOAI, and Red Hat does not provide support for MinIO. + +=== MinIO Deployment +To Deploy MinIO, we will utilize the OpenShift Dashboard. + + . Click on the Project Selection list dropdown and select the Ollama-Model project or the data science project you created in the previous step. + + . Then Select the + (plus) icon from the top right of the dashboard. + +image::minio2.png[width=640] + + . In the new window, we will paste the following YAML file. In the YAML below its recommended to change the default user name & password. + + +```yaml +--- +kind: PersistentVolumeClaim +apiVersion: v1 +metadata: + name: minio-pvc +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 40Gi + volumeMode: Filesystem +--- +kind: Secret +apiVersion: v1 +metadata: + name: minio-secret +stringData: + # change the username and password to your own values. 
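+  # these two keys are read by the Deployment below through secretKeyRef,
+  # and the same values are reused later in the course as the data connection's
+  # Access Key and Secret Key.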
+ # ensure that the user is at least 3 characters long and the password at least 8 + minio_root_user: minio + minio_root_password: minio123 +--- +kind: Deployment +apiVersion: apps/v1 +metadata: + name: minio +spec: + replicas: 1 + selector: + matchLabels: + app: minio + template: + metadata: + creationTimestamp: null + labels: + app: minio + spec: + volumes: + - name: data + persistentVolumeClaim: + claimName: minio-pvc + containers: + - resources: + limits: + cpu: 250m + memory: 1Gi + requests: + cpu: 20m + memory: 100Mi + readinessProbe: + tcpSocket: + port: 9000 + initialDelaySeconds: 5 + timeoutSeconds: 1 + periodSeconds: 5 + successThreshold: 1 + failureThreshold: 3 + terminationMessagePath: /dev/termination-log + name: minio + livenessProbe: + tcpSocket: + port: 9000 + initialDelaySeconds: 30 + timeoutSeconds: 1 + periodSeconds: 5 + successThreshold: 1 + failureThreshold: 3 + env: + - name: MINIO_ROOT_USER + valueFrom: + secretKeyRef: + name: minio-secret + key: minio_root_user + - name: MINIO_ROOT_PASSWORD + valueFrom: + secretKeyRef: + name: minio-secret + key: minio_root_password + ports: + - containerPort: 9000 + protocol: TCP + - containerPort: 9090 + protocol: TCP + imagePullPolicy: IfNotPresent + volumeMounts: + - name: data + mountPath: /data + subPath: minio + terminationMessagePolicy: File + image: >- + quay.io/minio/minio:RELEASE.2023-06-19T19-52-50Z + args: + - server + - /data + - --console-address + - :9090 + restartPolicy: Always + terminationGracePeriodSeconds: 30 + dnsPolicy: ClusterFirst + securityContext: {} + schedulerName: default-scheduler + strategy: + type: Recreate + revisionHistoryLimit: 10 + progressDeadlineSeconds: 600 +--- +kind: Service +apiVersion: v1 +metadata: + name: minio-service +spec: + ipFamilies: + - IPv4 + ports: + - name: api + protocol: TCP + port: 9000 + targetPort: 9000 + - name: ui + protocol: TCP + port: 9090 + targetPort: 9090 + internalTrafficPolicy: Cluster + type: ClusterIP + ipFamilyPolicy: SingleStack + sessionAffinity: None + selector: + app: minio +--- +kind: Route +apiVersion: route.openshift.io/v1 +metadata: + name: minio-api +spec: + to: + kind: Service + name: minio-service + weight: 100 + port: + targetPort: api + wildcardPolicy: None + tls: + termination: edge + insecureEdgeTerminationPolicy: Redirect +--- +kind: Route +apiVersion: route.openshift.io/v1 +metadata: + name: minio-ui +spec: + to: + kind: Service + name: minio-service + weight: 100 + port: + targetPort: ui + wildcardPolicy: None + tls: + termination: edge + insecureEdgeTerminationPolicy: Redirect +``` + +*This should finish in a few seconds. Now it's time to deploy our storage buckets.* + +=== MinIO Storage Bucket Creation + +From the OCP Dashboard: + + . Select Networking / Routes from the navigation menu. + + . This will display two routes, one for the UI & another for the API. + + . For the first step, select the UI route and paste it in a browser Window. + + . This window opens the MinIO Dashboard. Log in with username/password combination you set, or the default listed in yaml file above. + +Once logged into the MinIO Console: + + . Click Create Bucket to get started. + + . Create two Buckets: + + .. *models* + + .. *storage* -This cell keeps track of the conversation, this way history of the chat are also sent along with new chat information, keeping the context for future questions. - -The next cell tracks the conversation and prints it to the Notebook output window so we can experience the full conversation list. 
- -=== First input to our LLM - -The Notebooks first input to our model askes it to describe Paris in 100 words or less. - -In green text is the window, there is the setup message that is sent along with the single sentence question to desctibe to the model how to consider and respond to the question. - -It takes approximately 12 seconds for the model to respond with the first word of the reply, and the final word is printed to the screen approximately 30 seconds after the request was started. - -image::paris.png[width=800] - -The responce answered the question in a well-considered and informated paragraph that is less than 100 words in length. - -=== Second Input - -Notice that the Second input - "Is there a River" - does not specify where the location is that might have a River. Because the conversation history is passed with the second input, there is not need to specify any additional informaiton. - -image::london.png[width=800] - -The total time to first word took approximately 14 seconds this time, just a bit longer due the orginal information being sent. The time for the entire reponse to be printed to the screen just took over 4 seoncds. - -Overall our Model is performing well without a GPU and in a container limited to 4 cpus & 10Gb of memory. - -== Second Example Prompt - -Similar to the previous example, except we use the City of London, and run a cell to remove the verbose text reguarding what is sent or recieved apart from the answer from the model. - -There is no change to memory setting, but go ahead and evalute where the second input; "Is there a river?" is answer correctly. - -== Experimentation with Model - -Add a few new cells to the Notebook. - -image::experiment.png[width=800] - -Experiment with clearing the memory statement, then asking the river question again. Or perhaps copy one of the input statements and add your own question for the model. - -Try not clearing the memory and asking a few questions. - -**You have successfully deployed a Large Language Model, now test the information that it has available and find out what is doesn't know.** - - -== Delete the Environment - -Once you have finished experimenting with questions, make sure you head back to the Red Hat Demo Platform and delete the Openshift Container Platform Cluster. - -You don't have to remove any of the resources; deleting the environment will remove any resources created during this lesson. - -=== Leave Feedback - -If you enjoyed this walkthrough, please send the team a note. -If you have suggestions to make it better or clarify a point, please send the team a note. +[NOTE] + When serving an LLM or other model, Openshift AI looks within a Folder. Therefore, we need at least one subdirectory under the Models Folder. -Until next time, Keep being Awesome! + . Via the Navigation menu, *select object browser*, then click on the Model Bucket. + . From the models bucket page, click add path, and type *ollama* as the name of the sub-folder or path. +[IMPORTANT] +In most cases, to serve a model, the trained model would be uploaded into this sub-directory. *However, Ollama is a special case, as it can download and manage Several LLM models as part of the runtime.* + . We still need a file available in this folder for the model deployment workflow to succeed. + . So we will copy an *emptyfile.txt* file to the ollama subdirectory. You can download the file from https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main/serving-runtimes/ollama_runtime[*this location*]. 
Alternatively, you can create your own file called emptyfile.txt and upload it.
+ . Once you have this file ready, upload it into the ollama path in the models bucket by clicking the upload button and selecting the file from your local desktop.
+
\ No newline at end of file
diff --git a/modules/chapter3/pages/section3.adoc b/modules/chapter3/pages/section3.adoc
new file mode 100644
index 0000000..e98d2ce
--- /dev/null
+++ b/modules/chapter3/pages/section3.adoc
@@ -0,0 +1,69 @@
+= OpenShift AI Resources - 2
+
+video::llm_dataconn_v3.mp4[width=640]
+
+== Create Data Connection
+
+Navigate to the Data Science Projects section of the OpenShift AI console/dashboard. Select the Ollama-model project.
+
+. Select the Data Connections menu, followed by Create data connection
+. Provide the following values:
+.. Name: *models*
+.. Access Key: use the minio_root_user value from the YAML file
+.. Secret Key: use the minio_root_password value from the YAML file
+.. Endpoint: use the MinIO API URL from the Routes page in the OpenShift dashboard
+.. Region: this is required for AWS storage & cannot be blank (use no-region-minio)
+.. Bucket: use the MinIO storage bucket name: *models*
+
+image::dataconnection_models.png[width=800]
+
+Repeat the same process for the storage bucket, using *storage* for both the name & the bucket.
+
+== Creating a Workbench
+
+video::openshiftai_setup_part3.mp4[width=640]
+
+Navigate to the Data Science Projects section of the OpenShift AI console/dashboard. Select the Ollama-model project.
+
+image::create_workbench.png[width=800]
+
+ . Select the Workbenches menu, then click Create workbench
+
+ .. Name: `ollama-model`
+
+ .. Notebook Image: `Minimal Python`
+
+ .. Leave the remaining options at their defaults.
+
+ .. Optionally, scroll to the bottom and check the `Use data connection` box.
+
+ .. Select *storage* from the dropdown to attach the storage bucket to the workbench.
+
+ . Select the Create Workbench option.
+
+[NOTE]
+Depending on the notebook image selected, it can take between 2-20 minutes for the container image to be fully deployed. The Open link will be available when our container is fully deployed.
+
+
+== Creating The Model Server
+
+From the ollama-model workbench dashboard in the ollama-model project, navigate to the **Models** section, and select Deploy Model from the **Single Model Serving Platform button**.
+
+image::deploy_model_2.png[width=800]
+
+*Create the model server with the following values:*
+
+
+ .. Model name: `Ollama-Mistral`
+ .. Serving Runtime: `Ollama`
+ .. Model framework: `Any`
+ .. Model Server Size: `Medium`
+ .. Model location data connection: `models`
+ .. Model location path: `/ollama`
+
+
+After clicking the **Deploy** button at the bottom of the form, the model is added to our **Models & Model Server list**. When the model is available, the inference endpoint will populate & the status will indicate a green checkmark.
+
+We are now ready to interact with our newly deployed LLM. Join me in the next chapter to explore Mistral running on OpenShift AI using Jupyter Notebooks.
+
+
diff --git a/modules/chapter3/pages/section4.adoc b/modules/chapter3/pages/section4.adoc
new file mode 100644
index 0000000..524c530
--- /dev/null
+++ b/modules/chapter3/pages/section4.adoc
@@ -0,0 +1,141 @@
+= Jupyter Notebooks & Mistral LLM Model Setup
+
+video::openshiftai_notebook.mp4[width=640]
+
+== Open the Jupyter Notebook
+
+From the OpenShift AI ollama-model workbench dashboard:
+
+* Select the Open link to the right of the status section.
When the new window opens, use the OpenShift admin user & password to login to the Notebook. + +* Click *Allow selected permissions* button to complete login to the notebook. + +[NOTE] +If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take a few minutes & up to 20+ minutes depending on the notebook image we opted to choose. + + +== Inside the Jupyter Notebook + +Clone the notebook file to interact with the Ollama Framework from this location: https://github.com/rh-aiservices-bu/llm-on-openshift.git + +Navigate to the llm-on-openshift/examples/notebooks/langchain folder: + +Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_ + +Explore the notebook, and then continue. + +=== Update the Inference Endpoint + +Head back to the RHOAI workbench dashboard & copy the interence endpoint from our ollama-mistral model. +// Should it be inference instead of interence? + +Return the Jupyter Notebook Environment: + + . Paste the inference endpoint into the Cell labeled interfence_server_url = *"replace with your own inference address"* + +image::serverurl.png[width=800] + + . We can now start executing the code in the cells, starting with the set the inference server URL cell. + + . Next we run the second cell: !pip install -q langchain==0.1.14 ; there is a notice to update pip, just ignore and continue. + + . The third cell imports the langchain components that provide the libraries and programming files to interact with our LLM model. + + . In the fourth cell, place our first call to the Ollama-Mistral Framework Served by OpenShift AI. + +[WARNING] +Before we continue, we need to perform the following additional step. As mentioned, The Ollama Model Runtime we launched in OpenShift AI is a Framework that can host multiple LLM Models. It is currently running but is waiting for the command to instruct it to download Model to Serve. The following command needs to run from the OpenShift Dashboard. We are going to use the web_terminal operator to perform this next step. + +== Activating the Mistral Model in Ollama + +We will need to obtain the endpoint from the OpenShift AI model serving console. I usually just paste the text below into a cell in the Jupyter Notebook and paste the url in the code block from there. + +image::mistral_config.png[width=640] + +[source, yaml] +---- +curl https://your-endpoint/api/pull \ + -k \ + -H "Content-Type: application/json" \ + -d '{"name": "mistral"}' +---- + + . Next copy the entire code snippet, and open the OpenShift Dashboard. + . At the top right of the dashboard, locate the ">_" and select it. + . This will open the terminal window at the bottom of the dashboard. + . Click on the Start button in the terminal window, wait for the bash..$ prompt to appear + . Past the modified code block into the window and press enter. + +The message: *status: pulling manifest* should appear. This begins the model downloading process. + +image::curl_command.png[width=800] + +Once the download completes, the *status: success:* message appears. We can now return to the Jupyter Notebook Tab in the browser and proceed. + +=== Create the Prompt + +This cell sets the *system message* portion of the query to our model. Normally, we don't get the see this part of the query. This message details how the model should act, respond, and consider our questions. It adds checks to valdiate the information is best as possible, and to explain answers in detail. 
+
+== Memory for the conversation
+
+This cell keeps track of the conversation; this way the history of the chat is also sent along with new chat information, keeping the context for future questions.
+
+The next cell tracks the conversation and prints it to the Notebook output window so we can see the full conversation history.
+
+=== First input to our LLM
+
+The Notebook's first input to our model asks it to describe Paris in 100 words or less.
+
+In green text in the window is the setup message that is sent along with the single-sentence question, describing to the model how to consider and respond to the question.
+
+It takes approximately 12 seconds for the model to respond with the first word of the reply, and the final word is printed to the screen approximately 30 seconds after the request was started.
+
+image::paris.png[width=800]
+
+The response answered the question in a well-considered and informative paragraph that is less than 100 words in length.
+
+=== Second Input
+
+Notice that the second input - "Is there a River" - does not specify the location that might have a river. Because the conversation history is passed with the second input, there is no need to specify any additional information.
+
+image::london.png[width=800]
+
+The time to first word took approximately 14 seconds this time, just a bit longer due to the original conversation being sent along. The entire response was printed to the screen in just over 4 seconds.
+
+Overall our model is performing well without a GPU and in a container limited to 4 CPUs & 10GB of memory.
+
+== Second Example Prompt
+
+Similar to the previous example, except we use the city of London, and run a cell to remove the verbose text regarding what is sent or received, apart from the answer from the model.
+
+There is no change to the memory setting, but go ahead and evaluate whether the second input, "Is there a river?", is answered correctly.
+
+== Experimentation with Model
+
+Add a few new cells to the Notebook.
+
+image::experiment.png[width=800]
+
+Experiment with clearing the memory statement, then asking the river question again. Or perhaps copy one of the input statements and add your own question for the model.
+
+Try not clearing the memory and asking a few questions.
+
+**You have successfully deployed a Large Language Model; now test the information that it has available and find out what it doesn't know.**
+
+
+== Delete the Environment
+
+Once you have finished experimenting with questions, make sure you head back to the Red Hat Demo Platform and delete the OpenShift Container Platform cluster.
+
+You don't have to remove any of the resources; deleting the environment will remove any resources created during this lesson.
+
+=== Leave Feedback
+
+If you enjoyed this walkthrough, please send the team a note.
+If you have suggestions to make it better or clarify a point, please send the team a note.
+
+Until next time, Keep being Awesome!
+ + + + diff --git a/modules/chapter2/images/openshiftai_tls.mp4 b/modules/chapter4/images/llm_jupyter_v3.mp4 similarity index 82% rename from modules/chapter2/images/openshiftai_tls.mp4 rename to modules/chapter4/images/llm_jupyter_v3.mp4 index 91ce9b8..e5071c9 100644 Binary files a/modules/chapter2/images/openshiftai_tls.mp4 and b/modules/chapter4/images/llm_jupyter_v3.mp4 differ diff --git a/modules/chapter3/images/openshiftai_setup_part2.mp4 b/modules/chapter4/images/llm_model_v3.mp4 similarity index 80% rename from modules/chapter3/images/openshiftai_setup_part2.mp4 rename to modules/chapter4/images/llm_model_v3.mp4 index ed5ad05..61a4beb 100644 Binary files a/modules/chapter3/images/openshiftai_setup_part2.mp4 and b/modules/chapter4/images/llm_model_v3.mp4 differ diff --git a/modules/chapter3/images/openshiftai_notebook.mp4 b/modules/chapter4/images/openshiftai_notebook.mp4 similarity index 100% rename from modules/chapter3/images/openshiftai_notebook.mp4 rename to modules/chapter4/images/openshiftai_notebook.mp4 diff --git a/modules/chapter4/index.adoc b/modules/chapter4/index.adoc deleted file mode 100644 index c78097e..0000000 --- a/modules/chapter4/index.adoc +++ /dev/null @@ -1,22 +0,0 @@ -= Chapter 1 - -== Introduction - -Modern LLMs can understand and utilize language in a way that has been historically unfathomable to expect from a personal computer. These machine learning models can generate text, summarize content, translate, rewrite, classify, categorize, analyze, and more. All of these abilities provide humans with a powerful toolset to au - -In this course, you will learn how to leverage Red Hat OpenShift AI to serve a Large Language Model. - -How do we deliver a model to an inference engine, or server, so that, when the server receives a request from any of the applications in the organization's portfolio, the inference engine can reply with a prediction that increases the speed, efficiency and effective of business problem solving. - -Machine learning models must be deployed in a production environment to process real-time data and handle the problem they were designed to solve. - -In this lab, we going to deploy the Ollama Model framework which operates a bit differently than a standard machine learning model. Using the Ollama runtime, we can load multiple different models once the runtime is deployed. These models have been quantitized so they do not require a GPU. This makes this runtime engine flexible to accomodate the evaluation of mutliple model types. - -WHy is this important, because business aren't implementing a model just for the cool factor. They are looking to solve a business problem. However, they often won't know what model will work best to solve that problem, many are still in the experimental phase. This makes the Ollama framework perfect to evaluate multiple models without needing to reinvent the wheel. - -While this course touches each of the following bullets in the 5 Steps to building an LLM Application graphic, we will primarily focus on the second step, selecting an LLM. Exploring the Ollama Model Runtime. - -Ollama is a relatively new but powerful framework designed for serving machine learning models. 
It's designed to be efficient, scalable, and easy to use, making it an attractive option for developers and organizations looking to deploy their AI models into production
-
-
-image::redhatllm.gif[]
\ No newline at end of file
diff --git a/modules/chapter4/nav.adoc b/modules/chapter4/nav.adoc
index 62b8058..d6da93d 100644
--- a/modules/chapter4/nav.adoc
+++ b/modules/chapter4/nav.adoc
@@ -1,2 +1,3 @@
 * xref:index.adoc[]
-** xref:section1.adoc[]
\ No newline at end of file
+** xref:section1.adoc[]
+** xref:section2.adoc[]
\ No newline at end of file
diff --git a/modules/chapter4/pages/index.adoc b/modules/chapter4/pages/index.adoc
index 785a3fe..c44bffa 100644
--- a/modules/chapter4/pages/index.adoc
+++ b/modules/chapter4/pages/index.adoc
@@ -1,3 +1,7 @@
-= Chapter 3
+= Jupyter Notebooks & Large Language Model Inference
-This is the home page of _Chapter 3_ in the *hello* quick course....
\ No newline at end of file
+This chapter begins with a running and configured OpenShift AI environment. If you don't already have your environment running, head over to Chapter 2.
+
+In this section, we will explore using the Jupyter Notebook from our workbench to run inference against the Mistral Large Language Model. While less technical than previous sections of this hands-on course, there are still a few steps: download the Mistral model, update our notebook with an inference endpoint, and evaluate our model's performance.
+
+Let's get started!
\ No newline at end of file
diff --git a/modules/chapter4/pages/section1.adoc b/modules/chapter4/pages/section1.adoc
new file mode 100644
index 0000000..68d87b2
--- /dev/null
+++ b/modules/chapter4/pages/section1.adoc
@@ -0,0 +1,73 @@
+= Jupyter Notebooks
+
+video::llm_jupyter_v3.mp4[width=640]
+
+== Open the Jupyter Notebook
+
+From the OpenShift AI ollama-model workbench dashboard:
+
+* Select the Open link to the right of the status section. When the new window opens, use the OpenShift admin user & password to log in to the Notebook.
+
+* Click the *Allow selected permissions* button to complete the login to the notebook.
+
+[NOTE]
+If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take a few minutes, or 20+ minutes depending on the notebook image selected.
+
+
+== Inside the Jupyter Notebook
+
+Clone the Git repository that contains the notebook used to interact with the Ollama framework from this location: https://github.com/rh-aiservices-bu/llm-on-openshift.git
+
+Navigate to the llm-on-openshift/examples/notebooks/langchain folder.
+
+Then open the file _Langchain-Ollama-Prompt-memory.ipynb_.
+
+Explore the notebook, and then continue.
+
+=== Update the Inference Endpoint
+
+Head back to the RHOAI workbench dashboard & copy the inference endpoint from our ollama-mistral model.
+
+Return to the Jupyter Notebook environment:
+
+ . Paste the inference endpoint into the cell labeled inference_server_url = *"replace with your own inference address"*
+
+image::serverurl.png[width=800]
+
+ . We can now start executing the code in the cells, starting with the cell that sets the inference server URL.
+
+ . Next, run the second cell: !pip install -q langchain==0.1.14. A notice to update pip may appear; ignore it and continue.
+
+ . The third cell imports the langchain components that provide the libraries needed to interact with our LLM.
+
+ . In the fourth cell, we place our first call to the Ollama-Mistral framework served by OpenShift AI. A rough sketch of what these cells amount to follows this list.
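+
+To make these cells concrete, here is a minimal sketch of roughly what they amount to, assuming langchain 0.1.x and its community Ollama wrapper. The endpoint URL, the example question, and the streaming callback are illustrative and may differ from the cells in the cloned notebook.
+
+[source,python]
+----
+# Rough sketch only - the cloned notebook remains the authoritative version.
+from langchain_community.llms import Ollama
+from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
+
+# Cell 1: paste the inference endpoint copied from the RHOAI dashboard.
+inference_server_url = "https://your-ollama-mistral-endpoint"  # placeholder
+
+# Cell 2 (!pip install -q langchain==0.1.14) is run in the notebook itself.
+
+# Cell 3: wrap the served Ollama runtime in a LangChain LLM object.
+llm = Ollama(
+    base_url=inference_server_url,
+    model="mistral",
+    callbacks=[StreamingStdOutCallbackHandler()],  # stream tokens as they arrive
+)
+
+# Cell 4: first call to the served model. This succeeds only after the
+# Mistral model has been pulled into the Ollama runtime (next step).
+print(llm.invoke("In one sentence, what is OpenShift AI?"))
+----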
+
+[WARNING]
+Before we continue, we need to perform one additional step. As mentioned, the Ollama model runtime we launched in OpenShift AI is a framework that can host multiple LLM models. It is currently running, but it is waiting for a command instructing it to download the model to serve. The following command needs to be run from the OpenShift dashboard; we are going to use the Web Terminal operator to perform this step.
+
+== Activating the Mistral Model in Ollama
+
+We will need the inference endpoint from the OpenShift AI model serving console. A convenient approach is to paste the text below into a cell in the Jupyter Notebook and edit the URL in the code block from there.
+
+image::mistral_config.png[width=640]
+
+[source,console]
+----
+curl https://your-endpoint/api/pull \
+    -k \
+    -H "Content-Type: application/json" \
+    -d '{"name": "mistral"}'
+----
+
+ . Next, copy the entire code snippet (with your endpoint substituted) and open the OpenShift dashboard.
+ . At the top right of the dashboard, locate the ">_" icon and select it.
+ . This opens a terminal window at the bottom of the dashboard.
+ . Click the Start button in the terminal window and wait for the bash prompt to appear.
+ . Paste the modified code block into the window and press Enter.
+
+The message *status: pulling manifest* should appear. This begins the model download process.
+
+image::curl_command.png[width=800]
+
+Once the download completes, the *status: success* message appears. We can now return to the Jupyter Notebook tab in the browser and proceed.
\ No newline at end of file
diff --git a/modules/chapter4/pages/section2.adoc b/modules/chapter4/pages/section2.adoc
index 1c5adef..6c0f5c5 100644
--- a/modules/chapter4/pages/section2.adoc
+++ b/modules/chapter4/pages/section2.adoc
@@ -1,29 +1,67 @@
-= refer only
+= Mistral LLM Model Inference
-
-*Red{nbsp}Hat OpenShift AI* is available as an operator via the OpenShift Operator Hub. You will install the *Red{nbsp}Hat OpenShift AI operator* and dependencies using the OpenShift web console in this section.
+video::llm_model_v3.mp4[width=640]
-
-== Lab: Installation of Red{nbsp}Hat OpenShift AI
+== Create the Prompt
-
-IMPORTANT: The installation requires a user with the _cluster-admin_ role
+This cell sets the *system message* portion of the query to our model. Normally, we don't get to see this part of the query. The message details how the model should act, respond, and consider our questions. It instructs the model to validate its information as well as possible and to explain its answers in detail.
-
-. Login to the Red Hat OpenShift using a user which has the _cluster-admin_ role assigned.
+== Memory for the conversation
-
-. Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click on the button or tile for each. In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. For this lab you can skip the installation of the optional operators
+This cell keeps track of the conversation so that the chat history is sent along with each new input, preserving the context for future questions.
-
- * Web Terminal
+The next cell tracks the conversation and prints it to the Notebook output window so we can see the full conversation list.
-
- * Red Hat OpenShift Serverless
+=== First input to our LLM
-
- * Red Hat OpenShift Service Mesh
+The Notebook's first input asks the model to describe Paris in 100 words or less. A rough sketch of this memory-backed call follows.
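+
+The cells described above amount to roughly the following sketch, assuming langchain 0.1.x and the `llm` object created in the previous section; the exact template text and variable names are illustrative and may differ from the cloned notebook.
+
+[source,python]
+----
+# Rough sketch of the system prompt, conversation memory, and first inputs.
+from langchain.chains import LLMChain
+from langchain.memory import ConversationBufferMemory
+from langchain.prompts import PromptTemplate
+
+# System message plus placeholders for the stored history and the new input.
+template = """<s>[INST] You are a helpful, respectful assistant.
+Validate your answers as well as you can and explain them in detail. [/INST]
+Current conversation:
+{history}
+Human: {input}
+AI: """
+
+prompt = PromptTemplate(input_variables=["history", "input"], template=template)
+
+# The memory object stores each exchange and replays it on later inputs.
+memory = ConversationBufferMemory()
+
+conversation = LLMChain(llm=llm, prompt=prompt, memory=memory, verbose=True)
+
+# First input: the {history} placeholder is still empty at this point.
+conversation.predict(input="Describe Paris in 100 words or less.")
+
+# Second input: no city is named, but the stored history supplies the context.
+conversation.predict(input="Is there a river?")
+----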
- * Red Hat OpenShift Pipelines
+In green text in the output window is the setup message that is sent along with the single-sentence question; it describes to the model how to consider and respond to the question.
-
- * GPU Support
+It takes approximately 12 seconds for the model to respond with the first word of the reply, and the final word is printed to the screen approximately 30 seconds after the request was started.
-
- ** Node Feature Discovery Operator (optional)
+image::paris.png[width=800]
-
- ** NVIDIA GPU Operator (optional)
+The response answered the question in a well-considered and informative paragraph of less than 100 words.
-[TIP]
- - Installing these Operators prior to the installation of OpenShift AI in my experience has made a difference in OpenShift AI acknowledging the availability of these components and adjusting the initial configuration to shift management of these components to OpenShift AI.
\ No newline at end of file
+=== Second Input
+
+Notice that the second input - "Is there a River" - does not specify the location that might have a river. Because the conversation history is passed along with the second input, there is no need to specify any additional information.
+
+image::london.png[width=800]
+
+The time to first word was approximately 14 seconds this time, just a bit longer because the conversation history is sent along with the new input. The entire response took just over 4 seconds to print to the screen.
+
+Overall, our model is performing well without a GPU, in a container limited to 4 CPUs and 10 GB of memory.
+
+== Second Example Prompt
+
+This example is similar to the previous one, except that we use the city of London and run a cell that removes the verbose text about what is sent and received, leaving only the answer from the model.
+
+There is no change to the memory setting, but go ahead and evaluate whether the second input, "Is there a river?", is answered correctly.
+
+== Experimentation with Model
+
+Add a few new cells to the Notebook.
+
+image::experiment.png[width=800]
+
+Experiment with clearing the memory, then asking the river question again. Or copy one of the input statements and add your own question for the model.
+
+Also try asking a few questions without clearing the memory.
+
+**You have successfully deployed a Large Language Model. Now test the information it has available and find out what it doesn't know.**
+
+
+== Delete the Environment
+
+Once you have finished experimenting with questions, make sure you head back to the Red Hat Demo Platform and delete the OpenShift Container Platform cluster.
+
+You don't have to remove any of the resources individually; deleting the environment removes everything created during this lesson.
+
+=== Leave Feedback
+
+If you enjoyed this walkthrough, please send the team a note.
+If you have suggestions to make it better or clarify a point, please send the team a note.
+
+Until next time, Keep being Awesome!
\ No newline at end of file
diff --git a/modules/chapter4/pages/section4.adoc b/modules/chapter4/pages/section4.adoc
deleted file mode 100644
index 24c9ab2..0000000
--- a/modules/chapter4/pages/section4.adoc
+++ /dev/null
@@ -1,240 +0,0 @@
-= Prepare MinIO & Data Connections
-
-https://min.io[MinIO] is a high-performance, S3 compatible object store. It is built for large scale AI/ML, data lake and database workloads. It is software-defined and runs on any cloud or on-premises infrastructure.
- -We will need an S3 solution to share the model from training to deploy, in this exercise we will prepare MinIO to be such S3 solution. - -. In OpenShift, create a new namespace with the name **object-datastore**. -+ -[source,console] ----- -$ oc new-project object-datastore ----- - -. Run the following yaml to install MinIO: -+ -[source,console] ----- -$ oc apply -f https://raw.githubusercontent.com/RedHatQuickCourses/rhods-qc-apps/main/4.rhods-deploy/chapter2/minio.yml -n object-datastore ----- - -. Get the route to the MinIO dashboard. -+ -[source,console] ----- -$ oc get routes -n object-datastore | grep minio-ui | awk '{print $2}' ----- -+ -[INFO] -==== -Use this route to navigate to the S3 dashboard using a browser. With the browser, you will be able to create buckets, upload files, and navigate the S3 contents. -==== - -. Get the route to the MinIO API. -+ -[source,console] ----- -$ oc get routes -n object-datastore | grep minio-api | awk '{print $2}' ----- -+ -[INFO] -==== -Use this route as the S3 API endpoint. Basically, this is the URL that we will use when creating a data connection to the S3 in RHOAI. -==== - -[IMPORTANT] -==== -Make sure to create a new path in your bucket, and upload to such path, not to root. Later, when requesting to deploy a model to the **Model Server**, you will be required to provide a path inside your bucket. -==== - -== Create A Data Connection - -. In the RHOAI dashboard, create a project named **iris-project**. - -. In the **Data Connections** section, create a Data Connection to your S3. -+ -image::add-minio-iris-data-connection.png[Add iris data connection from minio] -+ -[IMPORTANT] -==== -- The credentials (Access Key/Secret Key) are `minio`/`minio123`. -- Make sure to use the API route, not the UI route (`oc get routes -n object-datastore | grep minio-api | awk '{print $2}'`). -- The region is not important when using MinIO, this is a property that has effects when using AWS S3. -However, you must enter a non-empty value to prevent problems with model serving. -- Mind typos for the bucket name. -- You don't have to select a workbench to attach this data connection to. -==== - -== Create a Model Server - -. In the **Models and model servers** section, add a server. -+ -image::add-server-button.png[add server] - -. Fill the form with the following values: -+ --- -* Server name: `iris-model-server`. -* Serving runtime: `OpenVINO Model Server`. -* Select the checkboxes to expose the models through an external route, and to enable token authentication. -Enter `iris-serviceaccount` as the service account name. --- -+ -image::add-server-form-example.png[Add Server Form] -+ -[IMPORTANT] -==== -The model server you are creating works as a template for deploying models. As you can see, we have not specified the model that we will deploy, or the data connection from where that model will be retrieved, in this form we are specifying the resources, constraints, and engine that will define the engine where the model will be deployed later. -It is important to pay special attention to the following characteristics: - -- **Serving Runtime**: By default we have _OpenVINO_ and _OpenVINO with GPU_. The important aspects when defining these runtimes are: The framework that is capable of reading models in a given format, and weather such platform supports using GPUs. 
The use of GPUs allow for complex and lengthy computations to be delivered faster, as there are huge models that require a good amount of power to calculate, based on the given parameters a prediction. - -- **Number of replicas to deploy**: Planning for expected performance and number of expected requests is essential for this part of the form. Here we select if we will load balance a given request between multiple container replicas. - -- **Model Server Size**: In this part of the form we define the resources assigned to each model server container. You can create and select a pre-defined size from the dropdown, or you can select _custom_, in which case, new fields will be displayed to request the processing and memory power to be assigned to your containers. -+ -image::model-server-size.png[model server size] - -- **Model Route**: There are models that can be consumed only from other containers inside the same OpenShift cluster, here we have the ability to not make this server available to entities outside our cluster, or to instruct the model server configuration to assign an external route. When we don't expose the model externally through a route, click on the Internal Service link in the Inference endpoint section: -+ -image::figure14_0.png[Inference endpoint] -+ -A popup will display the address for the gRPC and the REST URLs: -+ -image::figure15_0.png[Endpoint URLs] - -- **Token authorization**: In this part of the form we have a helper checkmark to add authorization to a service account that will be created with access to our model server. Only API requests that present a token that has access to the given service account will be able to run the inference service. -==== - -. After clicking the **Add** button at the bottom of the form, you will be able to see a new **Model Server** configuration in your project, you can click the **Tokens** column, which will make visible the tokens that you can share with the applications that will consume the inference API. -+ -image::model-server-with-token.png[Model Server with token] - -== Deploy The Model - -. At the right side of the **Model Server**, we can find the **Deploy Model** button, let's click the **Deploy Model** button, to start filling the **Deploy Model** form: -+ -image::deploy-model-button.png[Deploy Model button] - -. Fill the **Deploy Model** form. -+ --- -* Model name: `Ollama-Mistral` -* Serving Runtime: `Ollama` -* Model framework: `Any` -* Model Server Size: `Medium` -* Model location data connection: `models` -* Model location path: `/ollama` --- -+ -image::deploy-model-form.png[Deploy Model form] - -. After clicking the **Add** button at the bottom of the form, you will be able to see a new entry at the **Deployed models** column for your **Model Server**, clicking in the column will eventually show a check mark under the **Status** column: -+ -image::deploy-model-success.png[Deploy model success] - -. Observe and monitor the assets created in your OpenShift **iris-project** namespace. -+ -[source,console] ----- -$ oc get routes -n iris-project -$ oc get secrets -n iris-project | grep iris-model -$ oc get events -n iris-project ----- -+ -image::iris-project-events.png[Iris project events] -+ -[TIP] -==== -Deploying a **Model Server** triggers a **ReplicaSet** with **ModelMesh**, which attach your model to the inference runtime, and exposes it through a route. Also, notice the creation of a secret with your token. 
-==== - -== Test The Model - -Now that the model is ready to use, we can make an inference using the REST API - -. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands. -+ -[source,console] ----- -$ export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-model | awk '{print $2}') ----- - -. Assign an authentication token to an environment variable in your local machine. -+ -[source,console] ----- -$ export TOKEN=$(oc whoami -t) ----- - -. Request an inference with the REST API. -+ -[source,console] ----- -$ curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-model/infer \ - -X POST \ - --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}],"outputs" : [{"name" : "output0"}]}' ----- - -The result of using the inference service looks like the following output: -```json -{"model_name":"iris-model__isvc-590b5324f9","model_version":"1","outputs":[{"name":"label","datatype":"INT64","shape":[1],"data":[1]},{"name":"scores","datatype":"FP32","shape":[1,3],"data":[4.851966,3.1275764,3.4580243]}]} -``` - -=== Model Serving Request Body - -As you tested with the preceding `curl` command, to make HTTP requests to a deployed model you must use a specific request body format. -The basic format of the input data is as follows: - -[subs=+quotes] ----- -{ - "inputs": [{ - "name" : "input", <1> - "shape" : [2,3], <2> - "datatype" : "INT64", <3> - "data" : [[34, 54, 65], [4, 12, 21]] <4> - }] -} ----- -<1> The name of the input tensor. -The data scientist that creates the model must provide you with this value. -<2> The shape of the input tensor. -<3> The https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#tensor-data-types[data type] of the input tensor. -<4> The tensor contents provided as a JSON array. - -The API supports additional parameters. -For a complete list, refer to the https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#inference-request-json-object[Kserve Predict Protocol docs]. - -To make a request in Python, you can use the `requests` library, as the following example shows: - -[source,python] ----- -import requests - -input_data = [-0.15384616, -0.9909186] - -# You must adjust this path or read it from an environment variable -INFERENCE_ENDPOINT = "https://my-model.apps.my-cluster.example.com/v2/models/my-model/infer" - -# Build the request body -payload = { - "inputs": [ - { - "name": "dense_input", - "shape": [1, 2], - "datatype": "FP32", - "data": input_data - } - ] -} - -# Send the POST request -response = requests.post(INFERENCE_ENDPOINT, json=payload) - -# Parse the JSON response -result = response.json() - -# Print predicted values -print(result['outputs'][0]['data']) ----- diff --git a/modules/chapter4/section1.adoc b/modules/chapter4/section1.adoc deleted file mode 100644 index d4c15bc..0000000 --- a/modules/chapter4/section1.adoc +++ /dev/null @@ -1,50 +0,0 @@ -= Model Serving - -Why Ollama - It's unique value is that it makes installing and running LLMs very simple, even for non-technical users. Reduces the resources requirement for many models by >50%, and also the dependency of a GPU with excellent performance in my opinion. - - -== Learn by Doing - -In this quickcourse, there is one goal. Deploy an LLM in OpenShift AI, then utilize Jupyter Notebooks to query said LLM. - -Along the way, we'll discuss personas or roles that would perform these at our Customers. 
- -For example, you don't get to start with an OpenShift AI Platform, instead you will start with an OpenShift Container Cluster. In the Section Two of this course, together we will tackle the challenges of upgrading our OCP Cluster to host our OpenShift AI Platform. - -Why should this matter to you, it will provided an solid overview of the components needed, will allow you to explain the difficulty level of installing OpenShift AI, and will give you the experience to better understand what value each component adds to the Mix. - -There are several times that tasks must be performed in OCP related to the operations of the OPenShift AI Platform, so it's best to be familar with the both Dashboards. - -While the actuall installation can performed in a few minutes, there is an advanced setup that we need to perform to solve an issue with Cluster Security Certificates. Most organizations will run their own TLS certifcates, rather than use the self-generated certificates in the cluster. - - -.... - The reason we need to perform this porition is that the Original OpenShift Container CLuster created Self Signed Certificates upon deployment. When we install OpenShift AI, it will also create a set of certificates for deploying resources. Well, when we expose resources externally, they use the OCP cluster certificates, which causes a mismatch when we try to connect remotely. So instead of having two different sets of certificates, we are going to use the OCP cluster certificates for the OpenShift AI cluster, simipling connecting to the running model. -.... - - - -1. Once we complete the OpenShift AI setup, which should take about 15-20 minutes, the next step is to Launch OpenShift AI. - -We will then add the Ollama Model Serving Runtime .Yaml file as an additional Single Model Serving Option. - -1. Moving onto the next step, we will create our first Data Science Project in the OpenShift AI platform. This will provide an isolated workspace for our resources. - -For Step 4 we need external storage, or remotely accessible storage via an API in order to retrive the LLM model file. We will use MinIO for this purpose. We will deploy another .YAML file in the Data Science Project we just created. - -Next we will create storage buckets, update the file needed by the Ollama model to our new bucket in a sub-folder - -Once that is complete we head back to our Project, and create a new workbench, will deploy or UI interface, which will be a jupyter notebook. - -Once that is complete, we can finally launch our Ollma Single Model Server. - -Then we will need to configuire the model to be hosted by our Ollama framework which will be Mistral 7B. - -Once that is complete, we can add git repository to the Jupyter notebook and interact with our model using the LangChain library. - -The last step is for you to interact with your new LLM model, hosted in OpenShift AI. You can query the model with your questions, to determine how good it is. - -Then if you up to the Challenge - Delete the model, and redeploy the Ollama fRamework and deploy a different model, perhaps Llama2, or Lava and compare the performance of the different models. You'll be on your own for this part, but I know you got this! - - -