-
Notifications
You must be signed in to change notification settings - Fork 69
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* local models * Update langkit/examples/Local_Models.ipynb --------- Co-authored-by: felipe207 <[email protected]> Co-authored-by: richard-rogers <[email protected]> Co-authored-by: Jamie Broomall <[email protected]>
- Loading branch information
1 parent
21f726b
commit a818af7
Showing
6 changed files
with
285 additions
and
38 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,209 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
">### 🚩 *Create a free WhyLabs account to complete this example!*<br> \n", | ||
">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylabs-free-sign-up?utm_source=github&utm_medium=referral&utm_campaign=Local_Models)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=github&utm_medium=referral&utm_campaign=Local_Models) to leverage the power of whylogs and WhyLabs together!*" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Using Langkit with Local Models\n", | ||
"\n", | ||
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/LanguageToolkit/blob/main/langkit/examples/Local_Models.ipynb)\n", | ||
"\n", | ||
"Some of the Langkit modules download models from the internet. This is not always possible, for example, when running in an environment without internet access. In this example, we will show how you can use Langkit with models stored locally." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Let's start by installing LangKit:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install langkit[all]==0.0.31 -q" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We're also assuming the existence of local models in specific folders, such as when downloading the models with the script below.\n", | ||
"\n", | ||
"Make sure you have git-lfs installed. If not, you can install it by running:\n", | ||
"\n", | ||
"`sudo apt-get install git-lfs`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Cloning into 'local-toxicity-model'...\n", | ||
"remote: Enumerating objects: 40, done.\u001b[K\n", | ||
"remote: Total 40 (delta 0), reused 0 (delta 0), pack-reused 40\u001b[K\n", | ||
"Unpacking objects: 100% (40/40), 301.27 KiB | 414.00 KiB/s, done.\n", | ||
"Cloning into 'local-sentence-transformers'...\n", | ||
"remote: Enumerating objects: 49, done.\u001b[K\n", | ||
"remote: Counting objects: 100% (3/3), done.\u001b[K\n", | ||
"remote: Compressing objects: 100% (3/3), done.\u001b[K\n", | ||
"remote: Total 49 (delta 0), reused 0 (delta 0), pack-reused 46\u001b[K\n", | ||
"Unpacking objects: 100% (49/49), 316.57 KiB | 311.00 KiB/s, done.\n", | ||
"Filtering content: 100% (3/3), 260.15 MiB | 16.47 MiB/s, done.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"!git clone https://huggingface.co/martin-ha/toxic-comment-model local-toxicity-model\n", | ||
"!git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 local-sentence-transformers" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The `martin-ha/toxic-comment-model` is the model currently used in `toxicity`, and `sentence-transformers/all-MiniLM-L6-v2` is used to generate embeddings in both `themes` and `input_output_modules`. We can pass the local paths when initializing the modules:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langkit import themes\n", | ||
"from langkit import toxicity\n", | ||
"from langkit import input_output\n", | ||
"\n", | ||
"from langkit import LangKitConfig\n", | ||
"\n", | ||
"local_config = LangKitConfig(toxicity_model_path=\"local-toxicity-model\",\n", | ||
" transformer_name=\"local-sentence-transformers\")\n", | ||
"\n", | ||
"toxicity.init(config=local_config)\n", | ||
"themes.init(config=local_config)\n", | ||
"input_output.init(config=local_config)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If, for example, we want a local version for the `llm_metrics` module, we also need to import `textstat`, `regexes`, and `sentiment`. `regexes` and `textstat` are lightweight models and don't require external artifacts, so we can use them in a network restricted environment. `sentiment`, however, downloads artifacts from the internet, so let's replace it with `vader_sentiment`, which will yield the same results as `sentiment`, with the benefit of not requiring downloading artifacts at runtime." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langkit import regexes\n", | ||
"from langkit import vader_sentiment\n", | ||
"from langkit import textstat" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now, we should have an equivalent version of `llm_metrics` that doesn't require internet access. Let's check the results for a toy example:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"{'prompt': 'I like you. I love you',\n", | ||
" 'response': 'thanks!',\n", | ||
" 'prompt.jailbreak_similarity': 0.2522321939468384,\n", | ||
" 'response.refusal_similarity': 0.1535428911447525,\n", | ||
" 'prompt.toxicity': 0.006519913673400879,\n", | ||
" 'response.toxicity': 0.0011597275733947754,\n", | ||
" 'response.relevance_to_prompt': 0.23008441925048828,\n", | ||
" 'prompt.has_patterns': None,\n", | ||
" 'response.has_patterns': None,\n", | ||
" 'prompt.vader_sentiment': 0.7717,\n", | ||
" 'response.vader_sentiment': 0.4926,\n", | ||
" 'prompt.flesch_reading_ease': 119.19,\n", | ||
" 'response.flesch_reading_ease': 121.22,\n", | ||
" 'prompt.automated_readability_index': -6.7,\n", | ||
" 'response.automated_readability_index': 12.0,\n", | ||
" 'prompt.aggregate_reading_level': 1.0,\n", | ||
" 'response.aggregate_reading_level': 0.0,\n", | ||
" 'prompt.syllable_count': 6,\n", | ||
" 'response.syllable_count': 1,\n", | ||
" 'prompt.lexicon_count': 6,\n", | ||
" 'response.lexicon_count': 1,\n", | ||
" 'prompt.sentence_count': 2,\n", | ||
" 'response.sentence_count': 1,\n", | ||
" 'prompt.character_count': 17,\n", | ||
" 'response.character_count': 7,\n", | ||
" 'prompt.letter_count': 16,\n", | ||
" 'response.letter_count': 6,\n", | ||
" 'prompt.polysyllable_count': 0,\n", | ||
" 'response.polysyllable_count': 0,\n", | ||
" 'prompt.monosyllable_count': 6,\n", | ||
" 'response.monosyllable_count': 1,\n", | ||
" 'prompt.difficult_words': 0,\n", | ||
" 'response.difficult_words': 0}" | ||
] | ||
}, | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from whylogs.experimental.core.udf_schema import udf_schema\n", | ||
"from langkit import extract\n", | ||
"\n", | ||
"text_schema = udf_schema()\n", | ||
"result = extract({\"prompt\":\"I like you. I love you\",\"response\":\"thanks!\"},schema=text_schema)\n", | ||
"\n", | ||
"result" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "langkit-rNdo63Yk-py3.8", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.