wrapping up Chapter 3
kknoxrht committed Jun 2, 2024
1 parent c305167 commit f329ece
Showing 3 changed files with 70 additions and 28 deletions.
2 changes: 1 addition & 1 deletion modules/chapter2/pages/index.adoc
@@ -35,7 +35,7 @@ https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-no
The *OpenShift Serveless Operator* is a prerequisite for the *Single Model Serving Platform*.

https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-node-feature-discovery-operator.html[OpenShift Service Mesh Operator]::
The *OpenShift Service Mesh Operator* is a prerequisite for the *Single Model Serving Platform*.


[NOTE]
Expand Down
4 changes: 2 additions & 2 deletions modules/chapter2/pages/section2.adoc
@@ -49,8 +49,8 @@ tls.key: >-
type: kubernetes.io/tls
```

* Copy the Name portion of the text, shown in red (optional, but helpful)
* Click *create* to apply this YAML into the istio-system project (namespace).

*We have copied the Secret used by OCP and made it available to be used by OpenShift AI.*

Expand Down
92 changes: 67 additions & 25 deletions modules/chapter3/pages/section2.adoc
@@ -9,47 +9,41 @@ From the OpenShift AI ollama-model workbench dashboard,
click the *Allow selected permissions* button to complete login to the notebook.

[NOTE]
If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take a few minutes, and up to 20+ minutes, depending on the notebook image we opted to choose.


== Inside the Jupyter Notebook

Clone the notebook file to interact with the Ollama Framework from this location: https://github.com/rh-aiservices-bu/llm-on-openshift.git

Navigate to the llm-on-openshift/examples/notebooks/langchain folder:

Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_

== Customize the Notebook
Explore the notebook, then continue.

=== Update the Inference Endpoint

Head back to the RHOAI workbench dashboard and copy the inference endpoint from our ollama-mistral model.

Return to the Jupyter Notebook environment:

. Paste the inference endpoint into the cell labeled inference_server_url = *"replace with your own inference address"*

. We can now start executing the code in the cells, starting with the _set the inference server url_ cell.

. Next we run the second cell, `!pip install -q langchain==0.1.14`. A notice to update pip appears; ignore it and continue.

. The third cell imports the langchain components that provide the libraries and programming files to interact with our LLM model.

. The fourth cell places our first call to the Ollama-Mistral framework served by OpenShift AI.
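LangChain hides the wire format, but the request the notebook ultimately sends is plain JSON. A minimal sketch of that request body, assuming Ollama's generate API shape (the endpoint URL here is a placeholder, not a value from the course):

```python
import json

# Placeholder: replace with the inference endpoint copied from the RHOAI
# dashboard. "mistral" matches the model we pull into Ollama later on.
inference_server_url = "https://your-endpoint"

# Ollama's generate API accepts a JSON body naming the model and the prompt.
payload = {
    "model": "mistral",
    "prompt": "Describe Paris in 100 words or less.",
    "stream": False,
}

request_body = json.dumps(payload)
print(request_body)
```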

[WARNING]
Before we continue, we need to perform an additional step. As mentioned, the Ollama model runtime we launched in OpenShift AI is a framework that can host multiple LLM models. It is currently running, but is waiting for a command instructing it to download the model to serve. The following command needs to run from the OpenShift dashboard; we are going to use the web_terminal operator to perform this next step.

== Activating the Mistral Model in Ollama

We will need to obtain the endpoint from the OpenShift AI model serving console. I usually just paste the text below into a cell in the Jupyter Notebook and paste the URL into the code block from there.

```bash
curl https://your-endpoint/api/pull \
  -k \
  -H "Content-Type: application/json" \
  -d '{"name": "mistral"}'
```
. Click on the *Start* button in the terminal window and wait for the bash..$ prompt to appear.
. Paste the modified code block into the window and press enter.


The message: *status: pulling manifest* should appear. This begins the model downloading process.

Once the download completes, the *status: success:* message appears. We can now return to the Jupyter Notebook Tab in the browser and proceed.

=== Create the Prompt

This cell sets the *system message* portion of the query to our model. Normally we don't get to see this part of the query. The message details how the model should act, respond, and consider our questions. It adds checks to validate the information as best as possible, and to explain answers in detail.
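A minimal sketch of what that cell does, in plain Python without LangChain. The system message wording and the Mistral-style `[INST]` instruction tags here are illustrative assumptions, not the notebook's exact content:

```python
# Illustrative system message: the notebook's real one similarly asks the
# model to validate information and explain its answers in detail.
system_message = (
    "You are a helpful assistant. Validate the information as best as "
    "possible, and explain your answers in detail."
)

def build_prompt(question: str) -> str:
    # The model receives both parts; the user normally only sees the question.
    return f"[INST] {system_message}\n\n{question} [/INST]"

prompt = build_prompt("Describe Paris in 100 words or less.")
print(prompt)
```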

== Memory for the conversation

Keeps track of the conversation: the chat history is sent along with each new input, keeping the context available for follow-up questions.

The next cell tracks the conversation and prints it to the Notebook output window so we can experience the full conversation list.
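The idea can be sketched without LangChain. The class below is a toy stand-in for a conversation buffer memory, not the library's implementation:

```python
class ConversationMemory:
    """Toy stand-in for a conversation buffer memory."""

    def __init__(self):
        self.turns = []

    def add(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def render(self) -> str:
        # The rendered history is prepended to each new question, so the
        # model keeps the context of the conversation.
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory = ConversationMemory()
memory.add("Describe Paris in 100 words or less.",
           "Paris is the capital of France...")
history = memory.render()
print(history)
```

A follow-up like _Is there a river?_ is sent together with this history, which is why the model knows the question is still about Paris.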

=== First input to our LLM

The notebook's first input to our model asks it to describe Paris in 100 words or less.

The green text in the window is the setup message that is sent along with the single-sentence question, describing to the model how to consider and respond to the question.

It takes ~12 seconds for the model to respond with the first word of the reply, and the final word is printed to the screen ~30 seconds after the request was started.

The response answered the question in a well-considered and informative paragraph of less than 100 words.

=== Second Input

Notice that the second input, _Is there a river?_, does not specify the location that might have a river. Because the conversation history is passed along with the second input, there is no need to specify any additional information.

The time to first word was ~14 seconds this time, just a bit longer due to the original information being sent along. The entire response was printed to the screen in just over 4 seconds.

Overall our model is performing well without a GPU, in a container limited to 4 CPUs and 10GB of memory.
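Timings like these are easy to measure yourself. A small helper (the function name is an assumption, and a fake token list stands in for the model's streamed reply) could look like:

```python
import time

def timed_stream(token_iter):
    """Yield tokens while recording time-to-first-token and total time."""
    start = time.perf_counter()
    first_token_time = None
    for token in token_iter:
        if first_token_time is None:
            first_token_time = time.perf_counter() - start
        yield token
    total = time.perf_counter() - start
    print(f"first token after {first_token_time:.2f}s, done after {total:.2f}s")

# Fake reply standing in for the model's streamed tokens:
tokens = list(timed_stream(iter(["Paris", "is", "lovely."])))
print(tokens)
```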

== Second Example Prompt

Similar to the previous example, except we use the city of London, and run a cell to remove the verbose text regarding what is sent and received, leaving only the answer from the model.

There is no change to the memory setting, but go ahead and evaluate whether the second input, _is there a river?_, is answered correctly.

== Experimentation with Model

Add a few new cells to the notebook.

Experiment with clearing the memory statement, then asking the river question again. Or perhaps copy one of the input statements and add your own question for the model.

Try not clearing the memory and asking a few questions.

You have successfully deployed a Large Language Model; now test the information that it has available and find out what it doesn't know.


== Delete the Environment

Once you have finished experimenting with questions, make sure you head back to the Red Hat Demo Platform and delete the OpenShift Container Platform cluster.

You don't have to remove any of the resources; deleting the environment will remove any resources created during this lesson.

=== Leave Feedback

If you enjoyed this walkthrough, or have suggestions to make it better or clarify a point, please send the team a note.

Until the next time, Keep being Awesome!



