Multiple completions (n) (#79)
svilupp authored Feb 22, 2024
1 parent 84f1054 commit 463a830
Showing 21 changed files with 897 additions and 110 deletions.
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -9,7 +9,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Added initial support for Google Gemini models for `aigenerate` (requires environment variable `GOOGLE_API_KEY` and package [GoogleGenAI.jl](https://github.com/tylerjthomas9/GoogleGenAI.jl) to be loaded).
- Added a utility to compare any two string sequences (and other iterators)`length_longest_common_subsequence`. It can be used to fuzzy match strings (eg, detecting context/sources in an AI-generated response or fuzzy matching AI response to some preset categories). See the docstring for more information `?length_longest_common_subsequence`.
- Rewrite of `aiclassify` to classify into an arbitrary list of categories (including with descriptions). It's a quick and easy option for "routing" and similar use cases, as it exploits the logit bias trick and outputs only 1 token. Currently, only `OpenAISchema` is supported. See `?aiclassify` for more information.
- Initial support for multiple completions in one request for OpenAI-compatible API servers. Set via API kwarg `n=5` and it will request 5 completions in one request, saving the network communication time and paying the prompt tokens only once. It's useful for majority voting, diversity, or challenging agentic workflows.
- Added new fields to `AIMessage` and `DataMessage` types to simplify tracking in complex applications. Added fields:
- `cost` - the cost of the query (summary per call, so count only once if you requested multiple completions in one call)
  - `log_prob` - summary log probability of the generated sequence; set API kwarg `logprobs=true` to receive it
- `run_id` - ID of the AI API call
- `sample_id` - ID of the sample in the batch if you requested multiple completions, otherwise `sample_id==nothing` (they will have the same `run_id`)
- `finish_reason` - the reason why the AI stopped generating the sequence (eg, "stop", "length") to provide more visibility for the user

### Fixed

184 changes: 183 additions & 1 deletion docs/src/frequently_asked_questions.md
@@ -39,6 +39,21 @@ Resources:

Pro tip: Always set the spending limits!

## Getting an error "ArgumentError: api_key cannot be empty" despite having set `OPENAI_API_KEY`?

Quick fix: just provide kwarg `api_key` with your key to the `aigenerate` function (and other `ai*` functions).

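A minimal sketch (the key string below is a placeholder; any `ai*` function accepts the same kwarg):

```julia
using PromptingTools

# "sk-..." is a placeholder; use your actual OpenAI API key
msg = aigenerate("Say hi!"; api_key = "sk-...")
```
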
This error is thrown when the OpenAI API key is not available in 1) local preferences or 2) environment variables (`ENV["OPENAI_API_KEY"]`).

First, check whether you can access the key by running `get(ENV, "OPENAI_API_KEY", nothing)` in the Julia REPL. If it returns `nothing`, the key is not set.

If the key is set but you still get the error, there was a rare bug in earlier versions: if you first precompiled PromptingTools without the API key, it would remember that and "compile away" the `get(ENV,...)` function call. If you're experiencing this bug on the latest version of PromptingTools, please open an issue on GitHub.

The solution is to force a new precompilation. You can do any of the following:
1) Force precompilation (run `Pkg.precompile()` in the Julia REPL)
2) Update the PromptingTools package (runs precompilation automatically)
3) Delete your compiled cache in the `.julia` depot (usually `.julia/compiled/v1.10/PromptingTools`). You can do it manually in the file explorer or via the Julia REPL: `rm(expanduser("~/.julia/compiled/v1.10/PromptingTools"), recursive=true, force=true)` (note that `rm` does not expand `~` by itself)

## Setting OpenAI Spending Limits

OpenAI allows you to set spending limits directly on your account dashboard to prevent unexpected costs.
@@ -149,4 +164,171 @@ There are three ways how you can customize your workflows (especially when you u

1) Import the functions/types you need explicitly at the top (eg, `using PromptingTools: OllamaSchema`)
2) Register your model and its associated schema (`PT.register_model!(; name="123", schema=PT.OllamaSchema())`). You won't have to specify the schema anymore, only the model name (see the sketch after this list). See [Working with Ollama](#working-with-ollama) for more information.
3) Override your default model (`PT.MODEL_CHAT`) and schema (`PT.PROMPT_SCHEMA`). It can be done persistently with Preferences, eg, `PT.set_preferences!("PROMPT_SCHEMA" => "OllamaSchema", "MODEL_CHAT"=>"llama2")`.
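
A minimal sketch of option 2 (assuming an Ollama-served model named `llama2`; the name is illustrative):

```julia
using PromptingTools
const PT = PromptingTools

# Register the model once with its schema...
PT.register_model!(; name = "llama2", schema = PT.OllamaSchema())

# ...afterwards, the model name alone is enough; the schema is looked up for you
msg = aigenerate("Say hi!"; model = "llama2")
```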

## How to Have Multi-turn Conversations?

Let's say you would like to respond to a model's response. How do you do it?

1) With the `ai""` macro
The simplest way, if you used the `ai""` macro, is to send a reply with the `ai!""` macro. It will use the last response as the conversation.
```julia
ai"Hi! I'm John"

ai!"What's my name?"
# Return: "Your name is John."
```

2) With the `aigenerate` function
You can use the `conversation` keyword argument to pass the previous conversation (in all `ai*` functions). It will prepend the past `conversation` before sending the new request to the model.

To get the conversation, set `return_all=true` and store the whole conversation thread (not just the last message) in a variable. Then, use it as a keyword argument in the next call.

```julia
conversation = aigenerate("Hi! I'm John"; return_all=true)
@info last(conversation) # display the response

# follow-up (notice that we provide past messages as the `conversation` kwarg)
conversation = aigenerate("What's my name?"; return_all=true, conversation)

## [ Info: Tokens: 50 @ Cost: $0.0 in 1.0 seconds
## 5-element Vector{PromptingTools.AbstractMessage}:
## PromptingTools.SystemMessage("Act as a helpful AI assistant")
## PromptingTools.UserMessage("Hi! I'm John")
## AIMessage("Hello John! How can I assist you today?")
## PromptingTools.UserMessage("What's my name?")
## AIMessage("Your name is John.")
```
Notice that the last message is the response to the second request, but with `return_all=true` we can see the whole conversation from the beginning.

## Explain What Happens Under the Hood

4 Key Concepts/Objects:
- Schemas -> object of type `AbstractPromptSchema` that determines which methods are called and, hence, what providers/APIs are used
- Prompts -> the information you want to convey to the AI model
- Messages -> the basic unit of communication between the user and the AI model (eg, `UserMessage` vs `AIMessage`)
- Prompt Templates -> re-usable "prompts" with placeholders that you can replace with your inputs at the time of making the request

When you call `aigenerate`, roughly the following happens: `render` -> `UserMessage`(s) -> `render` -> `OpenAI.create_chat` -> ... -> `AIMessage`.

We'll do a deep dive into an example at the end.

### Schemas

For your "message" to reach an AI model, it needs to be formatted and sent to the right place.

We leverage multiple dispatch on the "schemas" to pick the right logic.
All schemas are subtypes of `AbstractPromptSchema`, and there are many subtypes, eg, `OpenAISchema <: AbstractOpenAISchema <: AbstractPromptSchema`.

For example, if you provide `schema = OpenAISchema()`, the system knows that:
- it will have to format any user inputs to OpenAI's "message specification" (a vector of dictionaries, see their API documentation). Function `render(OpenAISchema(),...)` will take care of the rendering.
- it will have to send the message to OpenAI's API. We will use the amazing `OpenAI.jl` package to handle the communication. (A minimal sketch of providing the schema explicitly follows this list.)
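
A minimal sketch of providing the schema explicitly (you rarely need to; it is normally inferred from the `model` name):

```julia
using PromptingTools
const PT = PromptingTools

# Explicitly dispatch on the OpenAI schema; the default model (PT.MODEL_CHAT) is used
msg = aigenerate(PT.OpenAISchema(), "What is the capital of France?")
```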

### Prompts

A prompt is, loosely, the information you want to convey to the AI model. It can be a question, a statement, or a command. It can have instructions or some context, eg, previous conversation.

You need to remember that Large Language Models (LLMs) are **stateless**. They don't remember the previous conversation/request, so you need to provide the whole history/context every time (similar to how REST APIs work).

Prompts that we send to the LLMs are effectively a sequence of messages (`<:AbstractMessage`).

### Messages

Messages are the basic unit of communication between the user and the AI model.

There are 5 main types of messages (`<:AbstractMessage`); a short sketch of composing them by hand follows the list:

- `SystemMessage` - this contains information about the "system", eg, how it should behave, format its output, etc. (eg, `You're a world-class Julia programmer. You write brief and concise code.`)
- `UserMessage` - the information "from the user", ie, your question/statement/task
- `UserMessageWithImages` - the same as `UserMessage`, but with images (URLs or Base64-encoded images)
- `AIMessage` - the response from the AI model, when the "output" is text
- `DataMessage` - the response from the AI model, when the "output" is data, eg, embeddings with `aiembed` or user-defined structs with `aiextract`
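
A minimal sketch of composing messages by hand and passing them as the prompt (assuming a vector of messages is accepted as a prompt, as templates are under the hood):

```julia
using PromptingTools
const PT = PromptingTools

# Build the conversation explicitly from message types...
conversation = [
    PT.SystemMessage("You are a terse assistant."),
    PT.UserMessage("What is 2 + 2?"),
]

# ...and pass the whole vector as the prompt
msg = aigenerate(conversation)
```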

### Prompt Templates

We want to have re-usable "prompts", so we provide you with a system to retrieve pre-defined prompts with placeholders (eg, `{{name}}`) that you can replace with your inputs at the time of making the request.

"AI Templates" as we call them (`AITemplate`) are usually a vector of `SystemMessage` and a `UserMessage` with specific purpose/task.

For example, the template `:AssistantAsk` is defined loosely as:

```julia
template = [SystemMessage("You are a world-class AI assistant. Your communication is brief and concise. You're precise and answer only when you're confident in the high quality of your answer."),
UserMessage("# Question\n\n{{ask}}")]
```

Notice that we have a placeholder `ask` (`{{ask}}`) that you can replace with your question without having to re-write the generic system instructions.

When you provide a Symbol (eg, `:AssistantAsk`) to `ai*` functions, thanks to multiple dispatch, they recognize it as an `AITemplate(:AssistantAsk)` and look it up.

You can discover all available templates with `aitemplates("some keyword")` or see the details of a specific template with `aitemplates(:AssistantAsk)`.

### Walkthrough Example

```julia
using PromptingTools
const PT = PromptingTools

# Let's say this is our ask
msg = aigenerate(:AssistantAsk; ask="What is the capital of France?")

# it is effectively the same as:
msg = aigenerate(PT.OpenAISchema(), PT.AITemplate(:AssistantAsk); ask="What is the capital of France?", model="gpt3t")
```

There is no `model` provided, so we use the default `PT.MODEL_CHAT` (effectively GPT3.5-Turbo). Then we look it up in `PT.MODEL_REGISTRY` and use the associated schema for it (`OpenAISchema` in this case).

The next step is to render the template: replace the placeholders and format the messages for the OpenAI API.

```julia
# Let's remember our schema
schema = PT.OpenAISchema()
ask = "What is the capital of France?"
```

First, we obtain the template (no placeholder replacement yet) and "expand" it:
```julia
template_rendered = PT.render(schema, AITemplate(:AssistantAsk); ask)
```

```plaintext
2-element Vector{PromptingTools.AbstractChatMessage}:
PromptingTools.SystemMessage("You are a world-class AI assistant. Your communication is brief and concise. You're precise and answer only when you're confident in the high quality of your answer.")
PromptingTools.UserMessage{String}("# Question\n\n{{ask}}", [:ask], :usermessage)
```

Second, we replace the placeholders:
```julia
rendered_for_api = PT.render(schema, template_rendered; ask)
```

```plaintext
2-element Vector{Dict{String, Any}}:
Dict("role" => "system", "content" => "You are a world-class AI assistant. Your communication is brief and concise. You're precise and answer only when you're confident in the high quality of your answer.")
Dict("role" => "user", "content" => "# Question\n\nWhat is the capital of France?")
```

Notice that the placeholders are only replaced in the second step. The final output here is a vector of messages with "role" and "content" keys, which is the format required by the OpenAI API.

As a side note, under the hood, this second call actually happens in two steps:

- replace the placeholders `messages_rendered = PT.render(PT.NoSchema(), template_rendered; ask)` -> returns a vector of Messages!
- then, we convert the messages to the format required by the provider/schema `PT.render(schema, messages_rendered)` -> returns the OpenAI formatted messages


Next, we send the above `rendered_for_api` to the OpenAI API and get the response back.

```julia
using OpenAI
r = OpenAI.create_chat(api_key, model, rendered_for_api)
```

The last step is to take the JSON response from the API and convert it to the `AIMessage` object.

```julia
# simplification for educational purposes
msg = AIMessage(; content = r.response[:choices][1][:message][:content])
```
In practice, there are more fields we extract, so we define a utility for it: `PT.response_to_message`. This matters especially because, with the parameter `n`, you can request multiple AI responses at once, and we want to re-use our response-processing logic.
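
A hedged sketch of the multi-sample path (assuming `n` is forwarded via `api_kwargs` and that, with `return_all=true`, each sample arrives as its own `AIMessage` sharing a `run_id` and differing in `sample_id`):

```julia
using PromptingTools

# Request 3 completions in a single call; the prompt tokens are paid only once
conversation = aigenerate("Write a one-line haiku about Julia.";
    api_kwargs = (; n = 3), return_all = true)

# Collect the AI samples (same run_id, distinct sample_id values)
samples = filter(m -> m isa AIMessage, conversation)
```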

That's it! I hope you've learned something new about how PromptingTools.jl works under the hood.
2 changes: 2 additions & 0 deletions src/Experimental/AgentTools/lazy_types.jl
@@ -62,6 +62,8 @@ This can be used to "reply" to previous message / continue the stored conversati
success::Union{Nothing, Bool} = nothing
error::Union{Nothing, Exception} = nothing
end
## main sample
## samples

function AICall(func::F, args...; kwargs...) where {F <: Function}
@assert length(args)<=2 "AICall takes at most 2 positional arguments (provided: $(length(args)))"
13 changes: 13 additions & 0 deletions src/Experimental/RAGTools/types.jl
@@ -8,6 +8,19 @@ abstract type AbstractChunkIndex <: AbstractDocumentIndex end
# More advanced index would be: HybridChunkIndex

# Stores document chunks and their embeddings
"""
    ChunkIndex

Main struct for storing document chunks and their embeddings. It also stores tags and sources for each chunk.

# Fields
- `id::Symbol`: unique identifier of each index (to ensure we're using the right index with `CandidateChunks`)
- `chunks::Vector{<:AbstractString}`: underlying document chunks / snippets
- `embeddings::Union{Nothing, Matrix{<:Real}}`: for semantic search
- `tags::Union{Nothing, AbstractMatrix{<:Bool}}`: for exact search, filtering, etc. This is often a sparse matrix indicating which chunks have the given `tag` (see `tags_vocab` for the position lookup)
- `tags_vocab::Union{Nothing, Vector{<:AbstractString}}`: vocabulary for the `tags` matrix (each column in `tags` is one item in `tags_vocab` and rows are the chunks)
- `sources::Vector{<:AbstractString}`: sources of the chunks
"""
@kwdef struct ChunkIndex{
T1 <: AbstractString,
T2 <: Union{Nothing, Matrix{<:Real}},
13 changes: 13 additions & 0 deletions src/llm_interface.jl
@@ -250,3 +250,16 @@ function aiscan(prompt; model = MODEL_CHAT, kwargs...)
schema = get(MODEL_REGISTRY, model, (; schema = PROMPT_SCHEMA)).schema
aiscan(schema, prompt; model, kwargs...)
end

"Utility to facilitate unwrapping of HTTP response to a message type `MSG` provided. Designed to handle multi-sample completions."
function response_to_message(schema::AbstractPromptSchema,
MSG::Type{T},
choice,
resp;
return_type = nothing,
model_id::AbstractString = "",
time::Float64 = 0.0,
run_id::Integer = rand(Int16),
sample_id::Union{Nothing, Integer} = nothing) where {T}
throw(ArgumentError("Response unwrapping not implemented for $(typeof(schema)) and $MSG"))
end
18 changes: 13 additions & 5 deletions src/llm_ollama.jl
@@ -2,6 +2,8 @@
# - llm_ollama.jl works by providing the messages format to /api/chat
# - llm_ollama_managed.jl works by providing 1 system prompt and 1 user prompt to /api/generate
#
# TODO: switch to OpenAI-compatible endpoint!
#
## Schema dedicated to [Ollama's models](https://ollama.ai/), which also manages the prompt templates
#
## Rendering of conversation history for the Ollama API (similar to OpenAI but not for the images)
@@ -157,10 +159,14 @@ function aigenerate(prompt_schema::AbstractOllamaSchema, prompt::ALLOWED_PROMPT_
http_kwargs,
api_kwargs...)

tokens_prompt = get(resp.response, :prompt_eval_count, 0)
tokens_completion = get(resp.response, :eval_count, 0)
msg = AIMessage(; content = resp.response[:message][:content] |> strip,
status = Int(resp.status),
tokens = (get(resp.response, :prompt_eval_count, 0),
get(resp.response, :eval_count, 0)),
cost = call_cost(tokens_prompt, tokens_completion, model_id),
## not coming through yet anyway
## finish_reason = get(resp.response, :finish_reason, nothing),
tokens = (tokens_prompt, tokens_completion),
elapsed = time)
## Reporting
verbose && @info _report_stats(msg, model_id)
@@ -184,7 +190,7 @@ function aiembed(prompt_schema::AbstractOllamaSchema, args...; kwargs...)
end

"""
aiscan([prompt_schema::AbstractOllamaSchema,] prompt::ALLOWED_PROMPT_TYPE;
image_url::Union{Nothing, AbstractString, Vector{<:AbstractString}} = nothing,
image_path::Union{Nothing, AbstractString, Vector{<:AbstractString}} = nothing,
attach_to_latest::Bool = true,
@@ -314,10 +320,12 @@ function aiscan(prompt_schema::AbstractOllamaSchema, prompt::ALLOWED_PROMPT_TYPE
system = nothing, messages = conv_rendered, endpoint = "chat", model = model_id,
http_kwargs,
api_kwargs...)
tokens_prompt = get(resp.response, :prompt_eval_count, 0)
tokens_completion = get(resp.response, :eval_count, 0)
msg = AIMessage(; content = resp.response[:message][:content] |> strip,
status = Int(resp.status),
tokens = (get(resp.response, :prompt_eval_count, 0),
get(resp.response, :eval_count, 0)),
cost = call_cost(tokens_prompt, tokens_completion, model_id),
tokens = (tokens_prompt, tokens_completion),
elapsed = time)
## Reporting
verbose && @info _report_stats(msg, model_id)
8 changes: 6 additions & 2 deletions src/llm_ollama_managed.jl
@@ -214,10 +214,12 @@ function aigenerate(prompt_schema::AbstractOllamaManagedSchema, prompt::ALLOWED_
time = @elapsed resp = ollama_api(prompt_schema, conv_rendered.prompt;
conv_rendered.system, endpoint = "generate", model = model_id, http_kwargs,
api_kwargs...)
tokens_prompt = get(resp.response, :prompt_eval_count, 0)
tokens_completion = get(resp.response, :eval_count, 0)
msg = AIMessage(; content = resp.response[:response] |> strip,
status = Int(resp.status),
tokens = (get(resp.response, :prompt_eval_count, 0),
get(resp.response, :eval_count, 0)),
cost = call_cost(tokens_prompt, tokens_completion, model_id),
tokens = (tokens_prompt, tokens_completion),
elapsed = time)
## Reporting
verbose && @info _report_stats(msg, model_id)
@@ -326,6 +328,7 @@ function aiembed(prompt_schema::AbstractOllamaManagedSchema,
msg = DataMessage(;
content = postprocess(resp.response[:embedding]),
status = Int(resp.status),
cost = call_cost(0, 0, model_id),
tokens = (0, 0), # token counts are not provided for embeddings
elapsed = time)
## Reporting
@@ -356,6 +359,7 @@ function aiembed(prompt_schema::AbstractOllamaManagedSchema,
msg = DataMessage(;
content = mapreduce(x -> x.content, hcat, messages),
status = mapreduce(x -> x.status, max, messages),
cost = mapreduce(x -> x.cost, +, messages),
tokens = (0, 0),# not tracked for embeddings in Ollama
elapsed = sum(x -> x.elapsed, messages))
## Reporting