
Commit

Add preference system (#17)
Fixes #13
svilupp authored Apr 19, 2024
1 parent c3d2c44 commit 6cbbfd6
Showing 18 changed files with 898 additions and 63 deletions.
4 changes: 2 additions & 2 deletions Artifacts.toml
@@ -84,7 +84,7 @@ lazy = true

[["tidier__nomicembedtext-0-Bool".download]]
sha256 = "b3f604f382a7191c657c77017e9d3e455e6eacc1fa6a59e6202e8a83498d4269"
url = "https://github.com/svilupp/AIHelpMeArtifacts/raw/main/artifacts/tidier__v20240407__nomicembedtext-1024-Bool__v1.0.tar.gz"
url = "https://github.com/svilupp/AIHelpMeArtifacts/raw/main/artifacts/tidier__v20240407__nomicembedtext-0-Bool__v1.0.tar.gz"

["makie__nomicembedtext-0-Float32"]
git-tree-sha1 = "4b0ff243278fcbda1491db94e4082c1050a44d80"
@@ -100,4 +100,4 @@ lazy = true

[["makie__nomicembedtext-0-Bool".download]]
sha256 = "c62a0d3d27fc417498d2449b59c58d18dba34164094a08206b51e88d05fa7fa7"
url = "https://github.com/svilupp/AIHelpMeArtifacts/raw/main/artifacts/makie__v20240330__nomicembedtext-1024-Bool__v1.0.tar.gz"
url = "https://github.com/svilupp/AIHelpMeArtifacts/raw/main/artifacts/makie__v20240330__nomicembedtext-0-Bool__v1.0.tar.gz"
15 changes: 12 additions & 3 deletions CHANGELOG.md
@@ -7,11 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- (Preliminary) Knowledge packs available for Julia docs (`:julia`), Tidier ecosystem (`:tidier`), Makie ecosystem (`:makie`). Load with `load_index!(:julia)` or several with `load_index!([:julia, :tidier])`.

### Fixed

## [0.1.0]

### Added
- (Preliminary) Knowledge packs available for Julia docs (`:julia`), Tidier ecosystem (`:tidier`), and Makie ecosystem (`:makie`). Load with `load_index!(:julia)` or several with `load_index!([:julia, :tidier])`.
- Preferences.jl-based persistence for chat model, embedding model, embedding dimension, and which knowledge packs to load on start. See `AIHelpMe.PREFERENCES` for more details.
- Precompilation statements to improve TTFX.
- First Q&A evaluation dataset in folder `evaluations/`.

### Changed
- Bumped up PromptingTools to v0.20 (brings new RAG capabilities, pretty-printing, etc.)
- Changed default model to be GPT-4 Turbo to improve answer quality
- Bumped up PromptingTools to v0.21 (brings new RAG capabilities, pretty-printing, etc.)
- Changed default model to be GPT-4 Turbo to improve the answer quality (you can quickly change to "gpt3t" if you want something simple)
- Documentation moved to Vitepress

### Fixed
4 changes: 2 additions & 2 deletions Project.toml
@@ -1,7 +1,7 @@
name = "AIHelpMe"
uuid = "01402e1f-dc83-4213-a98b-42887d758baa"
authors = ["J S <[email protected]> and contributors"]
version = "0.0.1-DEV"
version = "0.1.0"

[deps]
HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
@@ -25,7 +25,7 @@ LazyArtifacts = "<0.0.1, 1"
LinearAlgebra = "<0.0.1, 1"
PrecompileTools = "1"
Preferences = "1"
PromptingTools = "=0.20.1"
PromptingTools = "0.21"
REPL = "1"
SHA = "0.7"
Serialization = "<0.0.1, 1"
16 changes: 13 additions & 3 deletions README.md
@@ -121,6 +121,9 @@ All setup should take less than 5 minutes!
> [!TIP]
> Your results will significantly improve if you enable re-ranking of the context to be provided to the model (eg, `aihelp(..., rerank=true)`) or change the pipeline to `update_pipeline!(:silver)`. It requires setting up a Cohere API key, but it's free for community use.
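
For illustration, here is a minimal sketch of both options (the question string is just a hypothetical placeholder; it assumes your Cohere API key is already configured):

```julia
using AIHelpMe: aihelp, update_pipeline!

# Option A: re-rank the context for a single query only
aihelp("How do I add a method to an existing function?"; rerank=true)

# Option B: switch the whole pipeline to the re-ranking preset
update_pipeline!(:silver)
```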

> [!TIP]
> Do you want to safely execute the generated code? Use `AICode` from `PromptingTools.Experimental.AgentTools`. It can execute the code in a scratch module and catch errors if they happen (eg, apply it directly to an `AIMessage` response like `AICode(msg)`).

Noticed some weird answers? Please let us know! See the section "Help Us Improve and Debug" in the Advanced section of the docs!

## How to Obtain API Keys
@@ -158,7 +161,7 @@ f(Int8(2))
# we get: ERROR: MethodError: no method matching f(::Int8)
# Help is here:
aihelp"What does this error mean? $err" # Note the $err to interpolate the stacktrace
aihelp"What does this error mean? \$err" # Note the $err to interpolate the stacktrace
```

```plaintext
@@ -221,15 +224,22 @@ A: Not at the moment. It might be possible in the future, as PromptingTools.jl s
**Q: Why do we need Cohere API Key?**
A: Cohere's API is used to re-rank the best matching snippets from the documentation. It's free to use in limited quantities (ie, ~thousand requests per month), which should be enough for most users. Re-ranking improves the quality and accuracy of the answers.

**Q: Why do we need Tavily API Key?**
A: Tavily's API is used for web search results to augment your answers. It's free to use in limited quantities (ie, ~thousand requests per month), which should be enough for most users. Web search improves the quality and accuracy of the answers.

**Q: Can we use Ollama (locally-hosted) models?**
A: Yes, see the Advanced section in the docs.

## Future Directions

AIHelpMe is continuously evolving. Future updates may include:
- Tools to trace the provenance of answers (ie, where did the answer come from?).
- GUI interface (I'm looking at you, Stipple.jl!)
- Better tools to trace the provenance of answers (ie, where did the answer come from?).
- Creation of a gold standard Q&A dataset for evaluation.
- Refinement of the RAG ingestion pipeline for optimized chunk processing and deduplication.
- Introduction of context filtering to focus on specific modules.
- Transition to a more sophisticated multi-turn conversation design.
- Enhancement of the marginal information provided by the RAG context.
- Expansion of content sources beyond docstrings, potentially including documentation sites and community resources like Discourse or Slack posts.

Please note that this is merely a pre-release to gauge the interest in this project.
Please note that this is merely a pre-release - we still have a long way to go...
2 changes: 1 addition & 1 deletion docs/Project.toml
@@ -4,4 +4,4 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365"

[compat]
DocumenterVitepress = "0.0.7"
DocumenterVitepress = "0.0.18"
113 changes: 104 additions & 9 deletions docs/src/advanced.md
@@ -1,26 +1,26 @@
# Advanced

## Using Ollama Models
AIHelpMe can use Ollama models (locally-hosted models), but the knowledge packs are available for only one embedding model: "nomic-embed-text"!

You must set `model_embedding="nomic-embed-text"` and `truncate_dimension=0` (maximum dimension available) for everything to work correctly!
AIHelpMe can use Ollama (locally-hosted) models, but the knowledge packs are available for only one embedding model: "nomic-embed-text"!

You must set `model_embedding="nomic-embed-text"` and `embedding_dimension=0` (maximum dimension available) for everything to work correctly!

Example:

```julia
using PromptingTools: register_model!, OllamaSchema
using AIHelpMe: update_pipeline!, load_index!

# register model names with the Ollama schema
register_model!(; name="mistral:7b-instruct-v0.2-q4_K_M",schema=OllamaSchema())
register_model!(; name="nomic-embed-text",schema=OllamaSchema())
# register model names with the Ollama schema - if needed!
# eg, register_model!(; name="mistral:7b-instruct-v0.2-q4_K_M",schema=OllamaSchema())

# you can use whichever chat model you like!
update_pipeline!(:bronze; model_chat = "mistral:7b-instruct-v0.2-q4_K_M",model_embedding="nomic-embed-text", truncate_dimension=0)
# you can use whichever chat model you like! Llama 3 8b is the best trade-off right now and it's already known to PromptingTools.
update_pipeline!(:bronze; model_chat = "llama3",model_embedding="nomic-embed-text", embedding_dimension=0)


# You must download the corresponding knowledge packs via `load_index!` (because you changed the embedding model)
load_index!(:julia) # or whichever other packs you want!
load_index!()
```

Let's ask a question:
@@ -35,6 +35,27 @@ PromptingTools.AIMessage("In Julia, you can create a named tuple by enclosing ke
...continues
```

You can use the Preference setting mechanism (`?PREFERENCES`) to change the default settings.
```julia
AIHelpMe.set_preferences!("MODEL_CHAT" => "llama3", "MODEL_EMBEDDING" => "nomic-embed-text", "EMBEDDING_DIMENSION" => 0)
```

## Code Execution

Use `AICode` on the generated answers.

For example:
```julia
using PromptingTools.Experimental.AgentTools: AICode

msg = aihelp"Write a code to sum up 1+1"
cb = AICode(msg)
# Output: AICode(Success: True, Parsed: True, Evaluated: True, Error Caught: N/A, StdOut: True, Code: 1 Lines)
```

If you want to access the extracted code, simply use `cb.code`. If there is an error, you can see it in `cb.error`.

See the docstrings to learn more about it!

## Extending the Knowledge Base

@@ -56,10 +77,84 @@ To use your newly created index as the main source for queries, execute `load_in

The main index for queries is held in the global variable `AIHelpMe.MAIN_INDEX[]`.
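
For illustration, a minimal sketch of swapping in a custom index (`my_index` is a hypothetical, previously built index, eg, from `update_index`):

```julia
import AIHelpMe
using AIHelpMe: load_index!

# make the custom index the main source for subsequent `aihelp` queries
load_index!(my_index)

AIHelpMe.MAIN_INDEX[]   # the currently active index lives in this global reference
```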

## Save Your Preferences

You can now leverage the amazing Preferences.jl mechanism to save your default choices of the chat model, embedding model, embedding dimension, and which knowledge packs to load on start.

For example, if you wanted to switch to an Ollama-based pipeline, you could persist the configuration with:

```julia
AIHelpMe.set_preferences!("MODEL_CHAT" => "llama3", "MODEL_EMBEDDING" => "nomic-embed-text", "EMBEDDING_DIMENSION" => 0)
```
This will create a file `LocalPreferences.toml` in your project directory that will remember these choices across sessions.

See `AIHelpMe.PREFERENCES` for more details.

## Debugging Poor Answers

We're using a Retrieval-Augmented Generation (RAG) pipeline, which consists of two major steps (plus an optional third):

1. Retrieve the "context" (the best-matching snippets from the knowledge packs) for your question.
2. Generate an "answer" from that context.
3. (Optional) Generate a refined answer (only some pipelines will have this).

If you're not getting good answers (or not the answers you would expect), you need to understand whether the problem is in step 1 or step 2.

First, get the `RAGResult` (the full detail of the pipeline and its intermediate steps):
```julia
using AIHelpMe: pprint, last_result

# This lets you inspect how the last answer was generated
result = last_result()

# You can pretty-print the result
pprint(result)
```

**Checking the Context**
Check the `result.context` - is it using the right snippets ("knowledge") from the knowledge packs?
Alternatively, you can add it directly to the pretty-printed answer:
```julia
pprint(result; add_context=true, add_scores=false)
```

How is the context produced?

It's a function of your pipeline settings (see below) and the knowledge packs loaded (`AIHelpMe.LOADED_PACKS`).

A quick way to diagnose the context pipeline:

- See a quick overview of the embedding configuration with `AIHelpMe.get_config_key()` (it captures the embedding model, dimension, and element type).
- See the pipeline configuration (which steps) with `AIHelpMe.RAG_CONFIG[]`.
- See the individual kwargs provided to the pipeline with `AIHelpMe.RAG_KWARGS[]`.
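
A hedged sketch of what that inspection might look like in the REPL (the exact output depends on your configuration):

```julia
import AIHelpMe

AIHelpMe.get_config_key()   # eg, embedding model + dimension + element type in one string
AIHelpMe.RAG_CONFIG[]       # which steps the pipeline runs
AIHelpMe.RAG_KWARGS[]       # the keyword arguments passed to those steps
```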

The simplest fix for "poor" context is to enable reranking (`rerank=true`).

Other angles to consider:
- Are you sure the source information is present in the knowledge pack? Could it be added (see `?update_index`)?
- Was the question phrased poorly? Can we ask the same in a more sensible way?
- Try different models, embedding dimensions, or element types (Bool is less precise than Float32, etc.).

**Checking the First Answer**

Check the original answer (`result.answer`) - is it correct?

Assuming the context was good but the answer is not, we need to find out whether the problem lies with the model or with the prompt template.
- Is it that you're using a weak chat model? Try different models. Does it improve?
- Is the prompt template not suitable for your questions? Experiment with tweaking the prompt template.
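
For instance, a quick sketch of trying a different chat model and re-asking the same question (the model alias "gpt3t" and the question are just examples; use whatever you prefer):

```julia
using AIHelpMe: aihelp, update_pipeline!

# swap only the chat model; the rest of the pipeline stays as configured
update_pipeline!(:bronze; model_chat = "gpt3t")
aihelp("How do I create a named tuple in Julia?")
```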

**Checking the Refined Answer**

You always see the final answer by default (`result.final_answer`).
If it differs from `result.answer`, the `refiner` step in the RAG pipeline has been used.
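
A tiny sketch of how you might check this (reusing `last_result` from above):

```julia
result = last_result()
if result.final_answer != result.answer
    println("The refiner changed the answer - debug the refinement step as well.")
end
```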

Debug it in the same way as the first answer:
- Is the problem in the context provided to the refiner?
- Or in the refiner's answer itself? If so, is it caused by the model or by the prompt template?

## Help Us Improve and Debug

It would be incredibly helpful if you could share examples when the pipeline fails.
It would be incredibly helpful if you could share examples when the RAG pipeline fails.
We're particularly interested in cases where the answer you get is wrong because of the "bad" context provided.

Let's say you ran a question and got a wrong answer (or some other aspect is worth reporting).
5 changes: 4 additions & 1 deletion docs/src/faq.md
@@ -10,7 +10,6 @@ Each query incurs only a fraction of a cent, depending on the length and chosen

## Can I use the Cohere Trial API Key for commercial projects?
No, a trial key is only for testing purposes. But it takes only a few clicks to switch to Production API. The cost is only $1 per 1000 searches (!!!) and has many other benefits.
Alternatively, set a different `rerank_strategy` in `aihelp` calls to avoid using Cohere API.

## How accurate are the answers?
As with any other Generative AI answers, it depends, and you should always double-check.
@@ -24,3 +23,7 @@ Cohere's API is used to re-rank the best matching snippets from the documentatio
## Why do we need Tavily API Key?
Tavily is used for the web search results to augment your answers. It's free to use in limited quantities.

## Can we use Ollama (locally-hosted) models?
Yes! See the [Using Ollama Models](@ref) section.


2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -45,7 +45,7 @@ AI models, while powerful, often produce inaccurate or outdated information, kno

AIHelpMe addresses these challenges by incorporating "knowledge packs" filled with preprocessed, up-to-date Julia information. This ensures that you receive not only faster but also more reliable and contextually accurate coding assistance.

With AIHelpMe, you benefit from enhanced AI reliability tailored specifically to Julia’s unique environment, leading to better, more informed coding decisions.
Most importantly, AIHelpMe is designed to be uniquely yours! You can customize the RAG pipeline however you want, bring in any additional knowledge (eg, your currently loaded packages), and use it to get more accurate answers on something that's not even public!

## Getting Started

8 changes: 7 additions & 1 deletion docs/src/introduction.md
@@ -4,10 +4,11 @@ Welcome to AIHelpMe.jl, your go-to for getting answers to your Julia coding ques

AIHelpMe is a simple wrapper around the RAG functionality in PromptingTools.

It provides two extras:
It provides three extras:

- (hopefully) a simpler interface to handle RAG configurations (there are thousands of possible configurations)
- pre-computed embeddings for key “knowledge” in the Julia ecosystem (we refer to them as “knowledge packs”)
- ability to quickly incorporate any additional knowledge (eg, your currently loaded packages) into the "assistant"

> [!CAUTION]
> This is only a prototype! We have not tuned it yet, so your mileage may vary! Always check your results from LLMs!
@@ -137,10 +138,15 @@ All setup should take less than 5 minutes!
> [!TIP]
> Your results will significantly improve if you enable re-ranking of the context to be provided to the model (eg, `aihelp(..., rerank=true)`) or change the pipeline to `update_pipeline!(:silver)`. It requires setting up a Cohere API key, but it's free for community use.

> [!TIP]
> Do you want to safely execute the generated code? Use `AICode` from `PromptingTools.Experimental.AgentTools`. It can execute the code in a scratch module and catch errors if they happen (eg, apply it directly to an `AIMessage` response like `AICode(msg)`).

Noticed some weird answers? Please let us know! See [Help Us Improve and Debug](@ref).

If you want to use locally-hosted models, see the [Using Ollama Models](@ref) section.

If you want to customize your setup, see `AIHelpMe.PREFERENCES`.

## How to Obtain API Keys

### OpenAI API Key:

2 comments on commit 6cbbfd6

@svilupp
Owner Author


@JuliaRegistrator


Registration pull request created: JuliaRegistries/General/105262

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

```
@JuliaRegistrator register

Release notes:

## Breaking changes

- blah
```

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

```
git tag -a v0.1.0 -m "<description of version>" 6cbbfd6a36b7f9b366c6969005cc36a06d951204
git push origin v0.1.0
```
