Tech Review Updates part 1
Implemented several fixes based on technical review suggestions.
kknoxrht committed Jun 20, 2024
1 parent 9d6a891 commit d616605
Showing 35 changed files with 680 additions and 865 deletions.
2 changes: 1 addition & 1 deletion antora-playbook.yml
@@ -1,5 +1,5 @@
site:
title: Serving LLM Models on OpenShift AI
title: Serving an LLM using OpenShift AI
start_page: model-serving::index.adoc

content:
5 changes: 3 additions & 2 deletions antora.yml
@@ -1,9 +1,10 @@
name: model-serving
title: Serving LLM Models on OpenShift AI
version: 1.01
title: Serving an LLM using OpenShift AI
version: 1.10
nav:
- modules/ROOT/nav.adoc
- modules/chapter1/nav.adoc
- modules/chapter2/nav.adoc
- modules/chapter3/nav.adoc
- modules/chapter4/nav.adoc
- modules/appendix/nav.adoc
Binary file added modules/ROOT/images/intro_v5.mp4
6 changes: 3 additions & 3 deletions modules/ROOT/pages/index.adoc
@@ -1,8 +1,8 @@
= Serving LLM Models on OpenShift AI
= Serving an LLM using OpenShift AI
:navtitle: Home


video::intro_v4.mp4[width=640]
video::intro_v5.mp4[width=640]

Welcome to this quick course on _Serving an LLM using OpenShift AI_.

@@ -47,7 +47,7 @@ When ordering this catalog item in RHDP:

. Select Learning about the Product for the Purpose field

. Enter Learning RHOAI in the Salesforce ID field
. Leave the Salesforce ID field blank

. Scroll to the bottom, and check the box to confirm acceptance of terms and conditions

2 changes: 1 addition & 1 deletion modules/appendix/pages/section1.adoc
@@ -10,7 +10,7 @@ WARNING: Pending review.

// This quiz uses things a learner might know from their previous experience with RHEL or OpenStack as *distractors*, but does NOT rely on any previous knowledge. Learners new to OpenStack and OpenShift should be able to answer all questions from only the contents of the previous lecture.

1. Which of the following *Operator components* are either are required to enable Red Hat OpenShift AI on OpenShift with Single Model Serving Platform capabilities ?
1. Which of the following *Operator components* are required to enable Red Hat OpenShift AI on OpenShift with Single Model Serving Platform capabilities?

* [ ] Red Hat OpenShift serverless operator
* [ ] Red Hat OpenShift service mesh operator
3 changes: 2 additions & 1 deletion modules/chapter1/nav.adoc
@@ -1,2 +1,3 @@
* xref:index.adoc[]
** xref:section1.adoc[]
** xref:section1.adoc[]
** xref:section2.adoc[]
2 changes: 1 addition & 1 deletion modules/chapter1/pages/index.adoc
@@ -1,4 +1,4 @@
= Technical side of LLMs
= Technical Component Introduction


[NOTE]
48 changes: 3 additions & 45 deletions modules/chapter1/pages/section1.adoc
@@ -1,57 +1,15 @@
= Technology Components
= Red Hat OpenShift AI

== Kubernetes & OpenShift

OpenShift builds upon Kubernetes by providing an enhanced platform with additional capabilities. It simplifies the deployment and management of Kubernetes clusters while adding enterprise features, developer tools, and security enhancements.

In addition, OpenShift provides a graphical user interface for Kubernetes. OpenShift AI runs on OpenShift; therefore, the engine under the hood of both products is Kubernetes.

Most workloads are deployed in kubernetes via YAML files. A Kubernetes Deployment YAML file is a configuration file written in YAML (YAML Ain't Markup Language) that defines the desired state of a Kubernetes Deployment. These YAML file are used to create, update, or delete Deployments in Kubernetes / OpenShift clusters.
Most workloads are deployed in Kubernetes via YAML files. A Kubernetes YAML manifest file is a configuration file written in YAML (YAML Ain't Markup Language) that defines the desired state of a Kubernetes deployment. These YAML files are used to create, update, or delete deployments in Kubernetes / OpenShift clusters.

Don't worry about needing to know how to write these files; OpenShift and OpenShift AI take care of that for us. In this course, we just need to select the options we want in the UI, and OpenShift AI creates the YAML deployment files.

We will have to perform a few YAML file copy-and-paste operations; instructions are provided in the course.

Just know that YAML files create resources directly in the Kubernetes platform. We primarily use the OpenShift AI UI to perform these tasks to deliver our LLM.
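
For illustration, the sketch below shows what a minimal Kubernetes Deployment manifest looks like. Every name and image in it is hypothetical; in this course, OpenShift AI generates the real equivalent for you.

[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                       # hypothetical name, for illustration only
spec:
  replicas: 1                             # desired state: one running copy
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: quay.io/example/app:latest # placeholder image
        ports:
        - containerPort: 8080
----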

== Large Language Models

Large Language Models (LLMs) can generate new stories, summarize texts, and even perform advanced tasks like reasoning and problem solving, which is remarkable given their accessibility and easy integration into applications.

As you probably already know, training large language models is expensive and time-consuming, and most importantly requires a vast amount of data to be fed into the model.

The common outcome from this training is a Foundation model: an LLM designed to generate and understand human-like text across a wide range of use cases.

The key to this powerful language processing architecture *is the Transformer!* A helpful definition of a *Transformer* is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities. The Transformer was created by Google and started as a language translation algorithm. It analyzes relationships between words in text, which is crucial for LLMs to understand and generate language.

This is how LLMs are able to predict the next word: the transformer neural network and its attention mechanism focus on keywords to determine context, then use that context and the _knowledge_ from all the data ingested to predict the next word after a sequence of words.

=== Modifications to LLMs

As mentioned above, LLMs are normally large and require graphics cards and costly compute resources to load the model into memory.

However, there are techniques for compressing large language models, making them smaller and faster to run on devices with limited resources.

* Quantization reduces the precision of numerical representations in large language models to make them more memory-efficient during deployment.

* Pruning trims surplus connections or parameters, saving computational resources without sacrificing performance and making LLMs smaller and faster yet performant.

In this course, we will be using a quantized version of the Mistral Large Language Model. Instead of requiring 24 GB of memory and a graphics processing unit to simulate the neural network, we are going to run our model with 4 CPUs and 8 GB of RAM, burstable to 8 CPUs with a 10 GB RAM maximum.

[NOTE]
https://www.redhat.com/en/topics/ai/what-is-instructlab[*InstructLab*], which runs locally on laptops, uses this same type of quantized LLM; both the Granite & Mixtral Large Language Models are reduced in precision to operate on a laptop.

== The Ollama Model Framework

There are hundreds of popular LLMs; nonetheless, their operation remains the same: users provide instructions or tasks in natural language, and the LLM generates a response based on what the model "thinks" could be the continuation of the prompt.

Ollama is not an LLM model. Ollama is a relatively new but powerful open-source framework designed for serving machine learning models. It's designed to be efficient, scalable, and easy to use, making it an attractive option for developers and organizations looking to deploy their AI models into production.

=== How does Ollama work?


At its core, Ollama simplifies the process of downloading, installing, and interacting with a wide range of LLMs, empowering users to explore their capabilities without the need for extensive technical expertise or reliance on cloud-based platforms.

In this course, we will focus on a single LLM, Mistral, running on the Ollama Framework. However, with an understanding of the Ollama Framework, we will be able to work with a variety of large language models using the exact same configuration.

You will be able to switch models in minutes, all running on the same platform. This will enable you to test, compare, and evaluate multiple models with the skills gained in the course.
Just know that YAML files create resources directly in the Kubernetes platform. We primarily use the OpenShift AI UI to perform these tasks to deliver our LLM.
41 changes: 40 additions & 1 deletion modules/chapter1/pages/section2.adoc
@@ -1,2 +1,41 @@
= Follow up Story
= Large Language Models

== Large Language Models

Large Language Models (LLMs) can generate new stories, summarize texts, and even perform advanced tasks like reasoning and problem solving, which is remarkable given their accessibility and easy integration into applications.

As you probably already know, training large language models is expensive and time-consuming, and most importantly requires a vast amount of data to be fed into the model.

The common outcome from this training is a Foundation model: an LLM designed to generate and understand natural language text across a wide range of use cases.

The key to this powerful language processing architecture *is the Transformer!* A helpful definition of a *Transformer* is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities. The Transformer was created by Google and started as a language translation algorithm. It analyzes relationships between words in text, which is crucial for LLMs to understand and generate language.

This is how LLMs are able to predict the next word: the transformer neural network and its attention mechanism focus on keywords to determine context, then use that context and the _knowledge_ from all the data ingested to predict the next word after a sequence of words.
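
For reference, the self-attention mechanism at the heart of the Transformer is commonly summarized by the scaled dot-product attention formula, where Q, K, and V are the query, key, and value matrices derived from the input tokens and d~k~ is the key dimension:

[stem]
++++
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
++++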

=== Modifications to LLMs

As mentioned above, LLMs are normally large and require GPUs or accelerator chips and costly compute resources to load the model into memory.

However, there are techniques for compressing large language models, making them smaller and faster to run on devices with limited resources.

* Quantization reduces the precision of numerical representations in large language models to make them more memory-efficient during deployment.

* Pruning trims surplus connections or parameters, saving computational resources without sacrificing performance and making LLMs smaller and faster yet performant.

In this course, we will be using a quantized version of the Mistral Large Language Model. Instead of requiring 24 GB of memory and a graphics processing unit to simulate the neural network, we are going to run our model with 4 CPUs and 8 GB of RAM, burstable to 8 CPUs with a 10 GB RAM maximum.
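
To see why quantization matters, consider the arithmetic: at 16-bit precision a model needs roughly two bytes per parameter, so a 7-billion-parameter model consumes about 14 GB for weights alone, while a 4-bit quantized version needs roughly a quarter of that. The CPU and memory figures above would translate into a container resources stanza along these lines (a hypothetical sketch, not the course's exact manifest):

[source,yaml]
----
resources:
  requests:
    cpu: "4"      # guaranteed CPUs for the model server
    memory: 8Gi   # fits the quantized model's weights
  limits:
    cpu: "8"      # burstable ceiling
    memory: 10Gi  # hard memory cap
----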


== The Ollama Model Framework

There are hundreds of popular LLMs; nonetheless, their operation remains the same: users provide instructions or tasks in natural language, and the LLM generates a response based on what the model "thinks" could be the continuation of the prompt.

Ollama is not an LLM model. Ollama is a relatively new but powerful open-source framework designed for serving machine learning models. It's designed to be efficient, scalable, and easy to use, making it an attractive option for developers and organizations looking to deploy their AI models into production.

=== How does Ollama work?


At its core, Ollama simplifies the process of downloading, installing, and interacting with a wide range of LLMs, empowering users to explore their capabilities without the need for extensive technical expertise or reliance on cloud-based platforms.

In this course, we will focus on a single LLM, Mistral, running on the Ollama Framework. However, with an understanding of the Ollama Framework, we will be able to work with a variety of large language models using the exact same configuration.

You will be able to switch models in minutes, all running on the same platform. This will enable you to test, compare, and evaluate multiple models with the skills gained in the course.
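
As a preview of that workflow, the sketch below queries a running Ollama server from Python. It assumes a local Ollama endpoint on its default port and the `requests` library; in the course, the URL would instead be the route of your model server on OpenShift AI.

[source,python]
----
import requests

# Ask the Mistral model, served by Ollama, to complete a prompt.
# Swapping "mistral" for another pulled model is the only change needed.
response = requests.post(
    "http://localhost:11434/api/generate",  # assumed local Ollama endpoint
    json={
        "model": "mistral",
        "prompt": "Explain model serving in one sentence.",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
print(response.json()["response"])
----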
Binary file added modules/chapter2/images/llm_dsc_v3.mp4
Binary file added modules/chapter2/images/llm_tls_v3.mp4
3 changes: 2 additions & 1 deletion modules/chapter2/nav.adoc
@@ -1,3 +1,4 @@
* xref:index.adoc[]
** xref:section1.adoc[]
** xref:section2.adoc[]
** xref:section2.adoc[]
** xref:section3.adoc[]
2 changes: 1 addition & 1 deletion modules/chapter2/pages/index.adoc
@@ -3,7 +3,7 @@
== Supported configurations
OpenShift AI is supported in two configurations:

* A managed cloud service add-on for *Red Hat OpenShift Dedicated* (with a Customer Cloud Subscription for AWS or GCP) or for Red Hat OpenShift Service on Amazon Web Services (ROSA).
* A managed cloud service add-on for *Red Hat OpenShift Service on Amazon Web Services* (ROSA, with a Customer Cloud Subscription for AWS) or *Red Hat OpenShift Dedicated* (GCP).
For information about OpenShift AI on a Red Hat managed environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_cloud_service/1[Product Documentation for Red Hat OpenShift AI Cloud Service 1].

* Self-managed software that you can install on-premise or on the public cloud in a self-managed environment, such as *OpenShift Container Platform*.
4 changes: 3 additions & 1 deletion modules/chapter2/pages/section1.adoc
@@ -12,6 +12,8 @@ This exercise uses the Red Hat Demo Platform; specifically the OpenShift Contain

. Log in to Red Hat OpenShift using a user that has the _cluster-admin_ role assigned.

. It's sufficient to install all prerequisite operators with default settings; no additional configuration is necessary.

. Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click on the button or tile for each. In the pop-up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. For this lab you can skip the installation of the optional operators.

[*] You do not have to wait for the previous Operator to complete before installing the next. For this lab you can skip the installation of the optional operators as there is no GPU.
@@ -39,7 +41,7 @@

image::openshiftai_operator.png[width=640]

. Click on the `Red{nbsp}Hat OpenShift AI` operator. In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view.
. Click on the `Red{nbsp}Hat OpenShift AI` operator. In the pop-up window that opens, ensure you select the latest version in the *fast* channel (any version greater than 2.9.1) and click on **Install** to open the operator's installation view.
+

. In the `Install Operator` page, leave all of the options as default and click on the *Install* button to start the installation.
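
For readers who prefer the CLI, installing an operator amounts to creating a Subscription resource; the sketch below is an illustrative equivalent for the OpenShift AI operator. The package, channel, and namespace names shown are the commonly documented ones, so verify them against your cluster's OperatorHub before applying.

[source,yaml]
----
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator        # assumed operator namespace
spec:
  channel: fast                         # matches the channel chosen above
  name: rhods-operator                  # assumed package name
  source: redhat-operators
  sourceNamespace: openshift-marketplace
----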
