diff --git a/antora.yml b/antora.yml index b455aa8..519abae 100644 --- a/antora.yml +++ b/antora.yml @@ -6,5 +6,4 @@ nav: - modules/chapter1/nav.adoc - modules/chapter2/nav.adoc - modules/chapter3/nav.adoc -- modules/chapter4/nav.adoc -- modules/appendix/nav.adoc \ No newline at end of file +- modules/chapter4/nav.adoc \ No newline at end of file diff --git a/modules/chapter2/pages/index.adoc b/modules/chapter2/pages/index.adoc index f357d2d..5841184 100644 --- a/modules/chapter2/pages/index.adoc +++ b/modules/chapter2/pages/index.adoc @@ -1,10 +1,10 @@ -= OpenShift AI Initilization += OpenShift AI Initialization == Supported configurations OpenShift AI is supported in two configurations: * A managed cloud service add-on for *Red Hat OpenShift Service on Amazon Web Services* (ROSA, with a Customer Cloud Subscription for AWS) or *Red Hat OpenShift Dedicated* (GCP). -For information about OpenShift AI on a Red Hat managed environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_cloud_service/1[Product Documentation for Red Hat OpenShift AI Cloud Service 1]. +For information about OpenShift AI on a Red Hat managed environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_cloud_service/1[Product Documentation for Red Hat OpenShift AI Cloud Service]. * Self-managed software that you can install on-premise or on the public cloud in a self-managed environment, such as *OpenShift Container Platform*. For information about OpenShift AI as self-managed software on your OpenShift cluster in a connected or a disconnected environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8[Product Documentation for Red Hat OpenShift AI Self-Managed 2.8]. diff --git a/modules/chapter3/attachments/emptyfile.txt b/modules/chapter3/attachments/emptyfile.txt new file mode 100644 index 0000000..e69de29 diff --git a/modules/chapter3/examples/emptyfile.txt b/modules/chapter3/examples/emptyfile.txt new file mode 100644 index 0000000..e69de29 diff --git a/modules/chapter3/pages/section2.adoc b/modules/chapter3/pages/section2.adoc index 7507aab..b1489ce 100644 --- a/modules/chapter3/pages/section2.adoc +++ b/modules/chapter3/pages/section2.adoc @@ -47,7 +47,7 @@ stringData: # change the username and password to your own values. # ensure that the user is at least 3 characters long and the password at least 8 minio_root_user: minio - minio_root_password: minio123 + minio_root_password: minio321! --- kind: Deployment apiVersion: apps/v1 @@ -203,6 +203,10 @@ From the OCP Dashboard: . This window opens the MinIO Dashboard. Log in with username/password combination you set, or the default listed in yaml file above. + .. username = minio + + .. password = minio321! + Once logged into the MinIO Console: . Click Create Bucket to get started. @@ -214,7 +218,7 @@ Once logged into the MinIO Console: .. *storage* [NOTE] - When serving an LLM or other model, Openshift AI looks within a Folder. Therefore, we need at least one subdirectory under the Models Folder. + When serving an LLM or other model, Openshift AI looks within a folder. Therefore, we need at least one subdirectory under the models folder. . Via the Navigation menu, *select object browser*, then click on the Model Bucket. . From the models bucket page, click add path, and type *ollama* as the name of the sub-folder or path. @@ -224,7 +228,10 @@ In most cases, to serve a model, the trained model would be uploaded into this s . 
We still need a file available in this folder for the model deployment workflow to succeed. - . So we will copy an *emptyfile.txt* file to the ollama subdirectory. You can download the file from https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main/serving-runtimes/ollama_runtime[*this location*]. Alternatively, you can create your own file called emptyfile.txt and upload it. + . So we will copy an *emptyfile.txt* file to the ollama subdirectory. + + +You can download the file from xref:attachment$emptyfile.txt[this location]. Alternatively, you can create your own file called emptyfile.txt and upload it. . Once you have this file ready, upload it into the Ollama path in the model bucket by clicking the upload button and selecting the file from your local desktop. \ No newline at end of file diff --git a/modules/chapter3/pages/section3.adoc b/modules/chapter3/pages/section3.adoc index e98d2ce..4c2bfb8 100644 --- a/modules/chapter3/pages/section3.adoc +++ b/modules/chapter3/pages/section3.adoc @@ -1,4 +1,4 @@ -= OpeonShift AI Resources - 2 += OpenShift AI Resources - 2 video::llm_dataconn_v3.mp4[width=640] @@ -49,12 +49,10 @@ Depending on the notebook image selected, it can take between 2-20 minutes for t From the ollama-model WorkBench Dashboard in the ollama-model project, navigate to the **Models** section, and select Deploy Model from the **Single Model Serving Platform Button**. -image::deploy_model_2.png[width=800] - *Create the model server with the following values:* - .. Model name: `Ollama-Mistral` + .. Model name: `ollama-mistral` .. Serving Runtime: `Ollama` .. Model framework: `Any` .. Model Server Size: `Medium` diff --git a/modules/chapter4/images/add_a_cell.png b/modules/chapter4/images/add_a_cell.png new file mode 100644 index 0000000..7316060 Binary files /dev/null and b/modules/chapter4/images/add_a_cell.png differ diff --git a/modules/chapter4/images/clone_a_repo.png b/modules/chapter4/images/clone_a_repo.png new file mode 100644 index 0000000..2dcb493 Binary files /dev/null and b/modules/chapter4/images/clone_a_repo.png differ diff --git a/modules/chapter3/images/curl_command.png b/modules/chapter4/images/curl_command.png similarity index 100% rename from modules/chapter3/images/curl_command.png rename to modules/chapter4/images/curl_command.png diff --git a/modules/chapter3/images/experiment.png b/modules/chapter4/images/experiment.png similarity index 100% rename from modules/chapter3/images/experiment.png rename to modules/chapter4/images/experiment.png diff --git a/modules/chapter4/images/inference_endpoint.png b/modules/chapter4/images/inference_endpoint.png new file mode 100644 index 0000000..602ba50 Binary files /dev/null and b/modules/chapter4/images/inference_endpoint.png differ diff --git a/modules/chapter4/images/llama3_url.png b/modules/chapter4/images/llama3_url.png new file mode 100644 index 0000000..57b9472 Binary files /dev/null and b/modules/chapter4/images/llama3_url.png differ diff --git a/modules/chapter4/images/llama_llm.png b/modules/chapter4/images/llama_llm.png new file mode 100644 index 0000000..65c5634 Binary files /dev/null and b/modules/chapter4/images/llama_llm.png differ diff --git a/modules/chapter3/images/london.png b/modules/chapter4/images/london.png similarity index 100% rename from modules/chapter3/images/london.png rename to modules/chapter4/images/london.png diff --git a/modules/chapter3/images/mistral_config.png b/modules/chapter4/images/mistral_config.png similarity index 100% rename from 
modules/chapter3/images/mistral_config.png rename to modules/chapter4/images/mistral_config.png diff --git a/modules/chapter3/images/paris.png b/modules/chapter4/images/paris.png similarity index 100% rename from modules/chapter3/images/paris.png rename to modules/chapter4/images/paris.png diff --git a/modules/chapter4/images/replaced_endpoints.png b/modules/chapter4/images/replaced_endpoints.png new file mode 100644 index 0000000..2f09d0b Binary files /dev/null and b/modules/chapter4/images/replaced_endpoints.png differ diff --git a/modules/chapter4/images/replaced_endpoints2.png b/modules/chapter4/images/replaced_endpoints2.png new file mode 100644 index 0000000..ddea4c5 Binary files /dev/null and b/modules/chapter4/images/replaced_endpoints2.png differ diff --git a/modules/chapter3/images/serverurl.png b/modules/chapter4/images/serverurl.png similarity index 100% rename from modules/chapter3/images/serverurl.png rename to modules/chapter4/images/serverurl.png diff --git a/modules/chapter4/nav.adoc b/modules/chapter4/nav.adoc index d6da93d..8dcc0c4 100644 --- a/modules/chapter4/nav.adoc +++ b/modules/chapter4/nav.adoc @@ -1,3 +1,4 @@ * xref:index.adoc[] ** xref:section1.adoc[] -** xref:section2.adoc[] \ No newline at end of file +** xref:section2.adoc[] +** xref:section3.adoc[] \ No newline at end of file diff --git a/modules/chapter4/pages/index.adoc b/modules/chapter4/pages/index.adoc index c44bffa..0dd1eb5 100644 --- a/modules/chapter4/pages/index.adoc +++ b/modules/chapter4/pages/index.adoc @@ -1,4 +1,4 @@ -= Jupyter Notebooks & Large Language Model Inference += Jupyter Notebooks & LLMs This chapter begins with running and configured OpenShift AI environment. If you don't already have your environment running, head over to Chapter 2. diff --git a/modules/chapter4/pages/section1.adoc b/modules/chapter4/pages/section1.adoc index 68d87b2..39b9f2c 100644 --- a/modules/chapter4/pages/section1.adoc +++ b/modules/chapter4/pages/section1.adoc @@ -2,11 +2,14 @@ video::llm_jupyter_v3.mp4[width=640] -== Open the Jupyter Notebook +== Open JupyterLab -From the OpenShift AI ollama-model workbench dashboard: +JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. For a demonstration of JupyterLab and its features, https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html#what-will-happen-to-the-classic-notebook[you can view this video.] -* Select the Open link to the right of the status section. When the new window opens, use the OpenShift admin user & password to login to the Notebook. + +Return to the ollama-model workbench dashboard in the OpenShift AI console. + +* Select the *Open* link to the right of the status section. When the new window opens, use the OpenShift admin user & password to login to JupyterLab. * Click *Allow selected permissions* button to complete login to the notebook. @@ -14,60 +17,98 @@ From the OpenShift AI ollama-model workbench dashboard: If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take a few minutes & up to 20+ minutes depending on the notebook image we opted to choose. 
-== Inside the Jupyter Notebook
-
-Clone the notebook file to interact with the Ollama Framework from this location: https://github.com/rh-aiservices-bu/llm-on-openshift.git
+== Inside JupyterLab
-Navigate to the llm-on-openshift/examples/notebooks/langchain folder:
+This takes us to the JupyterLab screen, where we can choose from several tools to begin our data science experimentation.
-Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_
+Our first action is to clone a git repository that contains a collection of LLM projects, including the notebook we are going to use to interact with the LLM.
-Explore the notebook, and then continue.
+Clone the GitHub repository to interact with the Ollama framework from this location:
+https://github.com/rh-aiservices-bu/llm-on-openshift.git
-=== Update the Inference Endpoint
+ . Copy the URL link above.
-Head back to the RHOAI workbench dashboard & copy the interence endpoint from our ollama-mistral model.
-// Should it be inference instead of interence?
+ . Click on the Clone a Repo icon above the file explorer section.
-Return the Jupyter Notebook Environment:
+image::clone_a_repo.png[width=640]
- . Paste the inference endpoint into the Cell labeled interfence_server_url = *"replace with your own inference address"*
+ . Paste the link into the *clone a repo* pop-up, make sure *include submodules* is checked, then click Clone.
+
+ . Navigate to the llm-on-openshift/examples/notebooks/langchain folder.
-image::serverurl.png[width=800]
+ . Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_
- . We can now start executing the code in the cells, starting with the set the inference server URL cell.
+ . Explore the notebook, and then continue.
- . Next we run the second cell: !pip install -q langchain==0.1.14 ; there is a notice to update pip, just ignore and continue.
+=== Configure the Ollama Framework with a Large Language Model
- . The third cell imports the langchain components that provide the libraries and programming files to interact with our LLM model.
+ . From the notebook page, add a new cell above the inference URL cell.
- . In the fourth cell, place our first call to the Ollama-Mistral Framework Served by OpenShift AI.
+image::add_a_cell.png[width=640]
-[WARNING]
-Before we continue, we need to perform the following additional step. As mentioned, The Ollama Model Runtime we launched in OpenShift AI is a Framework that can host multiple LLM Models. It is currently running but is waiting for the command to instruct it to download Model to Serve. The following command needs to run from the OpenShift Dashboard. We are going to use the web_terminal operator to perform this next step.
-== Activating the Mistral Model in Ollama
+The Ollama Model Runtime we deployed using the Single Model Serving Platform in OpenShift AI is a framework that can host various large language models. It is currently running, but is waiting for a command instructing it which model to download and serve.
-We will need to obtain the endpoint from the OpenShift AI model serving console. I usually just paste the text below into a cell in the Jupyter Notebook and paste the url in the code block from there.
+. To load the Mistral model, we are going to use the following Python code to instruct the runtime to download and serve a quantized 4-bit version of the Mistral large language model.
-image::mistral_config.png[width=640]
+. Copy the code below and paste it into the new cell added to the notebook in the previous step.
+[source, python]
 ----
-curl https://your-endpoint/api/pull \
- -k \
- -H "Content-Type: application/json" \
- -d '{"name": "mistral"}'
+import requests
+
+headers = {
+    # Already added when you pass json=
+    # 'Content-Type': 'application/json',
+}
+
+# Tell the Ollama runtime which model to pull and serve
+json_data = {
+    'name': 'mistral',
+}
+
+response = requests.post('https://your-endpoint/api/pull', headers=headers, json=json_data, verify=False)
 ----
- . Next copy the entire code snippet, and open the OpenShift Dashboard.
- . At the top right of the dashboard, locate the ">_" and select it.
- . This will open the terminal window at the bottom of the dashboard.
- . Click on the Start button in the terminal window, wait for the bash..$ prompt to appear
- . Past the modified code block into the window and press enter.
+We'll need to modify the URL in the bottom line beginning with *response =* in the next step.
+
+=== Update the Inference Endpoints
+
+Head back to the RHOAI ollama-model workbench dashboard. From the models tab, copy the inference endpoint for the ollama-mistral model.
+
+image::inference_endpoint.png[width=640]
+
+Return to the Jupyter notebook.
+
+We will be updating two cells with the inference endpoint.
+
+ . Replace the https://your-endpoint section of the Python code we copied into the new cell. Ensure you leave the /api/pull portion appended to the URL.
+
+ . Replace the red text inside the quotation marks for the inference_server_url with the same inference endpoint URL.
+
+image::replaced_endpoints2.png[width=640]
+
+=== Execute the cell code to assemble the LangChain components
+
+ . We can now start executing the code in the cells, beginning with the new cell added at the top. Click on the cell to activate the blue indicator to the left of the cell.
+
+ .. You will receive a message about an unverified HTTPS request. This is because we didn't use authentication for this application. You can ignore this for this lab experience, but in production we would enable authentication using certificates as suggested. https://developers.redhat.com/articles/2021/06/18/authorino-making-open-source-cloud-native-api-security-simple-and-flexible[To use authentication we need to install the Authorino Operator.]
+
+ .. The Mistral model files are now being downloaded to the Ollama framework.
+
+ . Continue executing through the cells.
+
+ . Next we run the cell *!pip install -q langchain==0.1.14*; there is a notice to update pip; ignore it and continue.
+
+ . The next cell imports the langchain components that provide the libraries and programming files needed to interact with our LLM.
+
+ . The *"Create the LLM instance"* cell sets the variables that determine how we are going to interact with our model and how it should respond, and stores that configuration in the *llm* variable.
+
+ . Next run the *"Create the prompt"* cell. Here we are setting the *template* variable with the details of how the model should operate, including constraints and boundaries when generating the response. We usually do not see the system message when interacting with an LLM, but it is a standard field that is included along with the user prompt.
+
+ . Continue executing the cells; *"memory for the conversation"* keeps the previous context / conversation history, so the full history of the chat conversation is sent as part of the prompt.
+ . The *create the chain* cell combines each of the previous variables: llm, prompt, and memory, and adds a verbose boolean to create the conversation variable, which will be sent to the model's inference endpoint running in OpenShift AI. The verbose option set to true displays the entire conversation sent to the model in the notebook before the model's (AI's) response.
-image::curl_command.png[width=800]
-Once the download completes, the *status: success:* message appears. We can now return to the Jupyter Notebook Tab in the browser and proceed.
\ No newline at end of file
+In the next section, we'll send our first input to the running Mistral large language model.
diff --git a/modules/chapter4/pages/section2.adoc b/modules/chapter4/pages/section2.adoc
index 6c0f5c5..9873b5b 100644
--- a/modules/chapter4/pages/section2.adoc
+++ b/modules/chapter4/pages/section2.adoc
@@ -2,66 +2,59 @@
 video::llm_model_v3.mp4[width=640]
-=== Create the Prompt
+== Let's Talk with the LLM
-This cell sets the *system message* portion of the query to our model. Normally, we don't get the see this part of the query. This message details how the model should act, respond, and consider our questions. It adds checks to valdiate the information is best as possible, and to explain answers in detail.
+=== First Input
-== Memory for the conversation
+The first input cell sent via the notebook to the Mistral model asks it to describe Paris in 100 words or less.
-This cell keeps track of the conversation, this way history of the chat are also sent along with new chat information, keeping the context for future questions.
+In green text in the window is the setup message that is sent along with the single-sentence question, describing to the model how to consider and respond to the question. This is known as the system message. Also shown is the current conversation, which contains the human question or prompt sent to the model, followed by the AI's answer.
-The next cell tracks the conversation and prints it to the Notebook output window so we can experience the full conversation list.
-
-=== First input to our LLM
-
-The Notebooks first input to our model askes it to describe Paris in 100 words or less.
-
-In green text is the window, there is the setup message that is sent along with the single sentence question to desctibe to the model how to consider and respond to the question.
-
-It takes approximately 12 seconds for the model to respond with the first word of the reply, and the final word is printed to the screen approximately 30 seconds after the request was started.
+It takes a few seconds for the OpenShift AI model to respond with the first words of the reply. The response answered the question in a well-considered, informative paragraph that is less than 100 words in length.
 image::paris.png[width=800]
-The responce answered the question in a well-considered and informated paragraph that is less than 100 words in length.
 === Second Input
-Notice that the Second input - "Is there a River" - does not specify where the location is that might have a River. Because the conversation history is passed with the second input, there is not need to specify any additional informaiton.
+Notice that the second input - "Is there a River" - does not specify the location. Because the conversation history is passed with the second input, there is no need to specify any additional information.
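+
+To make the role of the memory concrete, the following is a minimal sketch of how the pieces from the earlier cells (llm, prompt, memory, and chain) fit together. It is not the notebook's exact code: the endpoint URL is a placeholder, the template wording is illustrative, and it assumes langchain 0.1.x with the langchain_community Ollama class.
+
+[source, python]
+----
+from langchain.chains import ConversationChain
+from langchain.memory import ConversationBufferMemory
+from langchain.prompts import PromptTemplate
+from langchain_community.llms import Ollama
+
+# Placeholder endpoint; use the inference endpoint copied from OpenShift AI
+inference_server_url = "https://your-endpoint"
+
+# LLM instance pointed at the Ollama runtime serving the mistral model
+llm = Ollama(base_url=inference_server_url, model="mistral")
+
+# ConversationChain expects a prompt with "history" and "input" variables
+template = """You are a helpful, honest assistant. Answer concisely.
+
+Current conversation:
+{history}
+Human: {input}
+AI:"""
+prompt = PromptTemplate(input_variables=["history", "input"], template=template)
+
+# Buffer memory replays the full chat history with every new prompt,
+# which is why "Is there a river?" is understood to refer to Paris
+memory = ConversationBufferMemory()
+
+conversation = ConversationChain(llm=llm, prompt=prompt, memory=memory, verbose=True)
+
+conversation.predict(input="Describe Paris in 100 words or less.")
+conversation.predict(input="Is there a river?")
+----
+
+Running the two predict calls reproduces the Paris question and the follow-up about the river, with the buffered history included in the second prompt.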
-image::london.png[width=800]
+== Second Example
-The total time to first word took approximately 14 seconds this time, just a bit longer due the orginal information being sent. The time for the entire reponse to be printed to the screen just took over 4 seoncds.
+Before we continue with the London example, we execute a cell to change the conversation mode to non-verbose. This eliminates the prompt context displayed in the notebook, showing just the model's reply instead.
-Overall our Model is performing well without a GPU and in a container limited to 4 cpus & 10Gb of memory.
+We also execute a cell to clear the memory, that is, the conversation history regarding Paris.
-== Second Example Prompt
+We did not disable the memory or the verbosity of the conversation itself; we simply hid that output from being displayed in the notebook.
-Similar to the previous example, except we use the City of London, and run a cell to remove the verbose text reguarding what is sent or recieved apart from the answer from the model.
+Go ahead and run the second example cells and evaluate the responses from the model.
-There is no change to memory setting, but go ahead and evalute where the second input; "Is there a river?" is answer correctly.
+image::london.png[width=800]
 == Experimentation with Model
-Add a few new cells to the Notebook.
+There are many different types of large language models; while we can read about them, using them first hand is the best way to experience how they perform.
-image::experiment.png[width=800]
+So now it's time to experiment on your own, or continue to follow along with this guide.
-Experiment with clearing the memory statement, then asking the river question again. Or perhaps copy one of the input statements and add your own question for the model.
-Try not clearing the memory and asking a few questions.
+Add a few new cells to the bottom of the notebook.
+
+image::experiment.png[width=800]
-**You have successfully deployed a Large Language Model, now test the information that it has available and find out what is doesn't know.**
+Experiment by copying the clear memory cell text and pasting the contents into one of the new cells. Next, copy one of the input statements and add your own question for the model. Then run those cells to learn more about the model's capabilities (a short sketch of this pattern appears at the end of this section).
+I used the following examples:
-== Delete the Environment
+ . Are you an AI model?
+ . Tell me a joke, please.
-Once you have finished experimenting with questions, make sure you head back to the Red Hat Demo Platform and delete the Openshift Container Platform Cluster.
+Then I asked one of my standard questions across models to determine its knowledge of history:
-You don't have to remove any of the resources; deleting the environment will remove any resources created during this lesson.
+*Was George Washington married?*
-=== Leave Feedback
+I ask this question because several models say George Washington was married twice. I believed the first one, which had me thinking several of the later models were wrong. It's critical that we evaluate models to determine their viability for business use cases.
-If you enjoyed this walkthrough, please send the team a note.
-If you have suggestions to make it better or clarify a point, please send the team a note.
+Try clearing the memory and asking your own questions.
-Until next time, Keep being Awesome!
\ No newline at end of file
+Continue to experiment with the Mistral model, or move to the next section, where we evaluate a different large language model.
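+
+If you prefer a starting point for the new cells, the following is a minimal sketch of the experimentation pattern described above. It assumes the llm and prompt objects from the earlier cells are still defined; re-creating the chain with fresh memory and verbose=False is one way to get the same effect as the notebook's clear-memory and non-verbose cells.
+
+[source, python]
+----
+from langchain.chains import ConversationChain
+from langchain.memory import ConversationBufferMemory
+
+# Fresh memory and a quiet chain: no Paris/London history, no verbose prompt dump
+conversation = ConversationChain(llm=llm, prompt=prompt,
+                                 memory=ConversationBufferMemory(), verbose=False)
+
+print(conversation.predict(input="Are you an AI model?"))
+print(conversation.predict(input="Tell me a joke please."))
+print(conversation.predict(input="Was George Washington married?"))
+----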
\ No newline at end of file
diff --git a/modules/chapter4/pages/section3.adoc b/modules/chapter4/pages/section3.adoc
new file mode 100644
index 0000000..31cee44
--- /dev/null
+++ b/modules/chapter4/pages/section3.adoc
@@ -0,0 +1,69 @@
+= Llama3 LLM Model Inference
+
+// video::llm_model_v.mp4[width=640]
+
+Experimentation with various models allows the selection of the best model for the task at hand.
+
+== Delete existing deployed model
+
+Return to the OpenShift AI Dashboard and the ollama-model workbench.
+
+ . Head to the Models section of the workbench.
+
+ . To the right of the ollama-mistral model there are three stacked dots; select the dots, then select Delete from the menu.
+
+ . You need to type in the *ollama-mistral* model name to confirm the deletion.
+
+ . No need to wait; continue on to the next section.
+
+== Creating the Model Server
+
+From the ollama-model workbench dashboard in the ollama-model project, navigate to the **Models** section, and select Deploy Model from the **Single Model Serving Platform** button.
+
+*Create the model server with the following values:*
+
+ .. Model name: `ollama-llama3`
+ .. Serving Runtime: `Ollama`
+ .. Model framework: `Any`
+ .. Model Server Size: `Medium`
+ .. Model location data connection: `models`
+ .. Model location path: `/ollama`
+
+After clicking the **Deploy** button at the bottom of the form, the model is added to our **Models & Model Server list**. When the model is available, the inference endpoint will populate & the status will indicate a green checkmark.
+
+Copy the inference endpoint; we need to replace the original inference endpoints used in our notebook's top two cells.
+
+=== Update the Inference Endpoints
+
+ . From the notebook page, replace the previous inference endpoints with the ollama-llama3 endpoint.
+
+ . In the Python code cell, we also need to change the name of the large language model in the json_data section from "mistral" to *"llama3"*.
+
+ . To load the llama3 model, we are going to use the same Python code to instruct the runtime to download and serve a quantized 4-bit version of the llama3 large language model.
+
+image::llama3_url.png[width=800]
+
+[NOTE]
+The Ollama Model Runtime we deployed using the Single Model Serving Platform in OpenShift AI is a framework that can host various large language models. It is currently running, but is waiting for a command instructing it which model to download and serve. You can view the available models in https://ollama.com/library[the Ollama library here].
+
+=== Execute the cells again
+
+ . We can now start executing the code in the cells, beginning from the top at the Set inference server URL cell. Click to the left of the cell to activate the orange indicator next to the cell. Orange indicates the cell has been modified from the original; blue still highlights unmodified cells.
+
+ .. You will again receive the message about an unverified HTTPS request. This is because we didn't use authentication for this application.
+
+ .. The *llama3* model files are now being downloaded to the Ollama framework.
+
+ . Continue executing through the cells, but stop at the *create the LLM instance* cell.
+
+ . In the *create the LLM instance* cell, we need to change the _model=mistral_ text to *llama3*.
+
+image::llama_llm.png[width=800]
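+
+As a reference for the two edits described above, the following is a minimal sketch of what the modified cells might look like after switching to llama3. The endpoint URL is a placeholder; use the inference endpoint copied from the ollama-llama3 deployment, and keep the /api/pull suffix on the pull request.
+
+[source, python]
+----
+import requests
+
+from langchain_community.llms import Ollama
+
+# Placeholder endpoint; replace with the ollama-llama3 inference endpoint
+inference_server_url = "https://your-endpoint"
+
+# Cell 1: ask the Ollama runtime to pull llama3 instead of mistral
+json_data = {
+    'name': 'llama3',
+}
+response = requests.post(inference_server_url + '/api/pull', headers={}, json=json_data, verify=False)
+
+# Cell 2: point the LLM instance at the llama3 model
+llm = Ollama(base_url=inference_server_url, model="llama3")
+----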