
Anthropic prompt caching #924

Merged
merged 16 commits into from
Dec 9, 2024
Changes from 4 commits
282 changes: 256 additions & 26 deletions THIRD_PARTY_LICENSES.md

Large diffs are not rendered by default.

10 changes: 8 additions & 2 deletions docs/src/content/docs/reference/scripts/context.md
@@ -200,13 +200,19 @@

### Prompt Caching

You can use `cacheControl: "ephemeral"` to specify that a prompt section may be cached
for a short amount of time, enabling the prompt caching optimizations that various LLM providers support (each differently).

```js "cacheControl"
$`...`.cacheControl("ephemeral")
```

```js "ephemeral: true"
def("FILE", env.files, { ephemeral: true })
```

Read more about [prompt caching](/genaiscript/reference/scripts/prompt-caching).

Check warning on line 215 in docs/src/content/docs/reference/scripts/context.md

GitHub Actions / build: The documentation for prompt caching is duplicated between `context.md` and a new file `prompt-caching.mdx`. This redundancy can be avoided by linking to the existing content or consolidating it in one place.

generated by pr-docs-review-commit content-structure
### Safety: Prompt Injection detection

You can schedule a check for prompt injection/jailbreak attempts with your configured [content safety](/genaiscript/reference/scripts/content-safety) provider.
37 changes: 37 additions & 0 deletions docs/src/content/docs/reference/scripts/prompt-caching.mdx
@@ -0,0 +1,37 @@
---
title: Prompt Caching
sidebar:
order: 80
---

Prompt caching is a feature that can reduce processing time and costs for repetitive prompts.
Various LLM providers support it, but their implementations vary.

- OpenAI implements an automatic [cache prefix](https://openai.com/index/api-prompt-caching/).
- Anthropic supports setting [cache breakpoints](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching).

## `ephemeral`

You can mark a `def` section or a `$` template with `cacheControl` set to `"ephemeral"` to enable prompt caching optimization. This signals that it
is acceptable for the LLM provider to cache the prompt for a short amount of time.

```js
def("FILE", env.files, { cacheControl: "ephemeral" })
```

```js
$`Some very cool prompt`.cacheControl("ephemeral")
```

## LLM provider support

Check warning on line 26 in docs/src/content/docs/reference/scripts/prompt-caching.mdx

GitHub Actions / build: The section on `ephemeral` is duplicated between `context.md` and `prompt-caching.mdx`. This redundancy can be avoided by linking to the existing content or consolidating it in one place.

In most cases, the `ephemeral` hint is ignored by LLM providers. However, the following providers support it.

### OpenAI, Azure OpenAI

[Prompt caching](https://platform.openai.com/docs/guides/prompt-caching) of the prompt prefix
is automatically enabled by OpenAI.
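Because OpenAI caches by prefix, placing the stable content (instructions, shared context) before the variable content maximizes cache hits. A minimal sketch of this idea (illustrative only; the prompt text and helper are made up, not GenAIScript output):

```javascript
// Sketch: keep the stable system text first so consecutive requests
// share an identical, cacheable prefix; only the trailing user content varies.
const stablePrefix = "You are a code reviewer. Apply the project style guide."
const buildMessages = (userInput) => [
    { role: "system", content: stablePrefix },
    { role: "user", content: userInput },
]

const a = buildMessages("Review fileA.ts")
const b = buildMessages("Review fileB.ts")
// Identical leading message => the provider can reuse the cached prefix.
console.log(a[0].content === b[0].content) // true
```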

### Anthropic

The `ephemeral` hint is translated into a `cache_control: { ... }` field on the message content block.
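For illustration, a `def` section marked `ephemeral` might translate into a message shaped like the following. This is a sketch based on Anthropic's documented prompt-caching format; the field names follow their API, but the text content is hypothetical:

```javascript
// Hypothetical translated message: the content block carries a
// cache_control breakpoint, so the prompt prefix up to and including
// this block may be cached briefly by the provider.
const message = {
    role: "user",
    content: [
        {
            type: "text",
            text: "FILE:\n...large, stable file contents...",
            cache_control: { type: "ephemeral" },
        },
    ],
}
console.log(message.content[0].cache_control.type) // "ephemeral"
```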

Check warning on line 37 in docs/src/content/docs/reference/scripts/prompt-caching.mdx

GitHub Actions / build: The frontmatter in `prompt-caching.mdx` is not necessary since it's a standalone document and does not need to be included in the sidebar or have a title. This can be removed to simplify the file.

GitHub Actions / build: The section on LLM provider support is duplicated between `context.md` and `prompt-caching.mdx`. This redundancy can be avoided by linking to the existing content or consolidating it in one place.

generated by pr-docs-review-commit content-structure

364 changes: 188 additions & 176 deletions docs/yarn.lock

Large diffs are not rendered by default.

217 changes: 109 additions & 108 deletions packages/cli/package.json
@@ -1,110 +1,111 @@
{
"name": "genaiscript",
"version": "1.82.0",
"main": "built/genaiscript.cjs",
"type": "commonjs",
"bin": {
"genaiscript": "built/genaiscript.cjs"
},
"files": [
"built/genaiscript.cjs"
],
"publisher": "Microsoft",
"repository": {
"type": "git",
"url": "git+https://github.com/microsoft/genaiscript.git"
},
"homepage": "https://microsoft.github.io/genaiscript",
"keywords": [
"genai",
"ai",
"agentic",
"agent",
"cli",
"prompt",
"llm",
"generative ai",
"gpt4",
"chatgpt",
"ollama",
"llamacpp",
"chatgpt"
],
"description": "A CLI for GenAIScript, a generative AI scripting framework.",
"license": "MIT",
"dependencies": {
"@azure/identity": "^4.5.0",
"@inquirer/prompts": "^7.1.0",
"@modelcontextprotocol/sdk": "^1.0.3",
"@octokit/plugin-paginate-rest": "^11.3.6",
"@octokit/plugin-retry": "^7.1.2",
"@octokit/plugin-throttling": "^9.3.2",
"@octokit/rest": "^21.0.2",
"dockerode": "^4.0.2",
"gpt-tokenizer": "^2.7.0",
"html-to-text": "^9.0.5",
"jimp": "^1.6.0",
"mammoth": "^1.8.0",
"mathjs": "^14.0.0",
"tabletojson": "^4.1.5",
"tsx": "^4.19.2",
"turndown": "^7.2.0",
"turndown-plugin-gfm": "^1.0.2",
"typescript": "5.7.2",
"vectra": "^0.9.0",
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.2/xlsx-0.20.2.tgz"
},
"optionalDependencies": {
"@huggingface/transformers": "^3.1.1",
"@lvce-editor/ripgrep": "^1.4.0",
"pdfjs-dist": "4.9.124",
"playwright": "^1.49.0",
"skia-canvas": "^2.0.0",
"tree-sitter-wasms": "^0.1.11",
"web-tree-sitter": "0.22.2"
},
"engines": {
"node": ">=20.0.0"
},
"peerDependencies": {
"promptfoo": "0.100.3"
},
"devDependencies": {
"@types/diff": "^6.0.0",
"@types/dockerode": "^3.3.32",
"@types/fs-extra": "^11.0.4",
"@types/memorystream": "^0.3.4",
"@types/node": "^22.10.1",
"@types/papaparse": "^5.3.15",
"@types/prompts": "^2.4.9",
"@types/replace-ext": "^2.0.2",
"@types/ws": "^8.5.13",
"commander": "^12.1.0",
"diff": "^7.0.0",
"dotenv": "^16.4.7",
"es-toolkit": "^1.29.0",
"esbuild": "^0.24.0",
"execa": "^9.5.1",
"fs-extra": "^11.2.0",
"glob": "^11.0.0",
"memorystream": "^0.3.1",
"node-sarif-builder": "^3.2.0",
"octokit": "^4.0.2",
"openai": "^4.75.0",
"pretty-bytes": "^6.1.1",
"replace-ext": "^2.0.0",
"ws": "^8.18.0",
"zx": "^8.2.4"
},
"scripts": {
"compile": "esbuild src/main.ts --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk && node ../../scripts/patch-cli.mjs",
"compile-debug": "esbuild src/main.ts --sourcemap --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk",
"postcompile": "node built/genaiscript.cjs info help > ../../docs/src/content/docs/reference/cli/commands.md",
"vis:treemap": "npx --yes esbuild-visualizer --metadata esbuild.meta.json --filename esbuild.treemap.html",
"vis:network": "npx --yes esbuild-visualizer --metadata esbuild.meta.json --filename esbuild.network.html --template network",
"go": "yarn compile && node built/genaiscript.cjs",
"test": "node --import tsx --test src/**.test.ts",
"typecheck": "tsc -p src",
"lint": "npx --yes publint"
}
"name": "genaiscript",
"version": "1.82.0",
"main": "built/genaiscript.cjs",
"type": "commonjs",
"bin": {
"genaiscript": "built/genaiscript.cjs"
},
"files": [
"built/genaiscript.cjs"
],
"publisher": "Microsoft",
"repository": {
"type": "git",
"url": "git+https://github.com/microsoft/genaiscript.git"
},
"homepage": "https://microsoft.github.io/genaiscript",
"keywords": [
"genai",
"ai",
"agentic",
"agent",
"cli",
"prompt",
"llm",
"generative ai",
"gpt4",
"chatgpt",
"ollama",
"llamacpp",
"chatgpt"
],
"description": "A CLI for GenAIScript, a generative AI scripting framework.",
"license": "MIT",
"dependencies": {
"@anthropic-ai/sdk": "^0.32.1",
"@azure/identity": "^4.5.0",
"@inquirer/prompts": "^7.1.0",
"@modelcontextprotocol/sdk": "^1.0.3",
"@octokit/plugin-paginate-rest": "^11.3.6",
"@octokit/plugin-retry": "^7.1.2",
"@octokit/plugin-throttling": "^9.3.2",
"@octokit/rest": "^21.0.2",
"dockerode": "^4.0.2",
"gpt-tokenizer": "^2.7.0",
"html-to-text": "^9.0.5",
"jimp": "^1.6.0",
"mammoth": "^1.8.0",
"mathjs": "^14.0.0",
"tabletojson": "^4.1.5",
"tsx": "^4.19.2",
"turndown": "^7.2.0",
"turndown-plugin-gfm": "^1.0.2",
"typescript": "5.7.2",
"vectra": "^0.9.0",
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.2/xlsx-0.20.2.tgz"
},
"optionalDependencies": {
"@huggingface/transformers": "^3.1.2",
"@lvce-editor/ripgrep": "^1.4.0",
"pdfjs-dist": "4.9.124",
"playwright": "^1.49.0",
"skia-canvas": "^2.0.0",
"tree-sitter-wasms": "^0.1.11",
"web-tree-sitter": "0.22.2"
},
"engines": {
"node": ">=20.0.0"
},
"peerDependencies": {
"promptfoo": "0.100.3"
},
"devDependencies": {
"@types/diff": "^6.0.0",
"@types/dockerode": "^3.3.32",
"@types/fs-extra": "^11.0.4",
"@types/memorystream": "^0.3.4",
"@types/node": "^22.10.1",
"@types/papaparse": "^5.3.15",
"@types/prompts": "^2.4.9",
"@types/replace-ext": "^2.0.2",
"@types/ws": "^8.5.13",
"commander": "^12.1.0",
"diff": "^7.0.0",
"dotenv": "^16.4.7",
"es-toolkit": "^1.29.0",
"esbuild": "^0.24.0",
"execa": "^9.5.1",
"fs-extra": "^11.2.0",
"glob": "^11.0.0",
"memorystream": "^0.3.1",
"node-sarif-builder": "^3.2.0",
"octokit": "^4.0.2",
"openai": "^4.76.0",
"pretty-bytes": "^6.1.1",
"replace-ext": "^2.0.0",
"ws": "^8.18.0",
"zx": "^8.2.4"
},
"scripts": {
"compile": "esbuild src/main.ts --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk && node ../../scripts/patch-cli.mjs",
"compile-debug": "esbuild src/main.ts --sourcemap --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk",
"postcompile": "node built/genaiscript.cjs info help > ../../docs/src/content/docs/reference/cli/commands.md",
"vis:treemap": "npx --yes esbuild-visualizer --metadata esbuild.meta.json --filename esbuild.treemap.html",
"vis:network": "npx --yes esbuild-visualizer --metadata esbuild.meta.json --filename esbuild.network.html --template network",
"go": "yarn compile && node built/genaiscript.cjs",
"test": "node --import tsx --test src/**.test.ts",
"typecheck": "tsc -p src",
"lint": "npx --yes publint"
}
}
6 changes: 3 additions & 3 deletions packages/core/package.json
@@ -21,7 +21,7 @@
"@anthropic-ai/sdk": "^0.32.1",
"@azure/identity": "^4.5.0",
"@huggingface/jinja": "^0.3.2",
"@huggingface/transformers": "^3.1.1",
"@huggingface/transformers": "^3.1.2",
"@modelcontextprotocol/sdk": "^1.0.3",
"@octokit/plugin-paginate-rest": "^11.3.6",
"@octokit/plugin-retry": "^7.1.2",
@@ -54,7 +54,7 @@
"gpt-tokenizer": "^2.7.0",
"html-escaper": "^3.0.3",
"html-to-text": "^9.0.5",
"https-proxy-agent": "^7.0.5",
"https-proxy-agent": "^7.0.6",
"ignore": "^6.0.2",
"inflection": "^3.0.0",
"ini": "^5.0.0",
@@ -70,7 +70,7 @@
"minisearch": "^7.1.1",
"mustache": "^4.2.0",
"object-inspect": "^1.13.3",
"openai": "^4.75.0",
"openai": "^4.76.0",
"p-limit": "^6.1.0",
"parse-diff": "^0.11.1",
"prettier": "^3.4.2",
20 changes: 12 additions & 8 deletions packages/core/src/anthropic.ts
@@ -5,7 +5,7 @@ import { parseModelIdentifier } from "./models"
import { NotSupportedError, serializeError } from "./error"
import { estimateTokens } from "./tokens"
import { resolveTokenEncoder } from "./encoders"
import Anthropic from "@anthropic-ai/sdk"
import type { Anthropic } from "@anthropic-ai/sdk"

import {
ChatCompletionResponse,
@@ -22,6 +22,7 @@ import {
} from "./chattypes"

import { logError } from "./util"
import { resolveHttpProxyAgent } from "./proxy"

const convertFinishReason = (
stopReason: Anthropic.Message["stop_reason"]
@@ -63,7 +64,7 @@

const convertMessages = (
messages: ChatCompletionMessageParam[]
): Array<Anthropic.Messages.MessageParam> => {
): Array<Anthropic.Beta.PromptCaching.PromptCachingBetaMessageParam> => {
return messages.map(convertSingleMessage)
}

@@ -92,7 +93,7 @@

const convertToolCallMessage = (
msg: ChatCompletionAssistantMessageParam
): Anthropic.Messages.MessageParam => {
): Anthropic.Beta.PromptCaching.PromptCachingBetaMessageParam => {
return {
role: "assistant",
content: msg.tool_calls.map((tool) => ({
@@ -106,7 +107,7 @@

const convertToolResultMessage = (
msg: ChatCompletionToolMessageParam
): Anthropic.Messages.MessageParam => {
): Anthropic.Beta.PromptCaching.PromptCachingBetaMessageParam => {
return {
role: "user",
content: [
@@ -124,7 +125,7 @@
| ChatCompletionSystemMessageParam
| ChatCompletionAssistantMessageParam
| ChatCompletionUserMessageParam
): Anthropic.Messages.MessageParam => {
): Anthropic.Beta.PromptCaching.PromptCachingBetaMessageParam => {
const role = msg.role === "assistant" ? "assistant" : "user"
if (Array.isArray(msg.content)) {
return {
@@ -158,7 +159,7 @@

const convertImageUrlBlock = (
block: ChatCompletionContentPartImage
): Anthropic.Messages.ImageBlockParam => {
): Anthropic.Beta.PromptCaching.PromptCachingBetaImageBlockParam => {
return {
type: "image",
source: {
@@ -199,9 +200,12 @@ export const AnthropicChatCompletion: ChatCompletionHandler = async (
const { model } = parseModelIdentifier(req.model)
const { encode: encoder } = await resolveTokenEncoder(model)

const { default: Anthropic } = await import("@anthropic-ai/sdk")
const httpAgent = resolveHttpProxyAgent()
const anthropic = new Anthropic({
baseURL: cfg.base,
apiKey: cfg.token,
httpAgent,
})

trace.itemValue(`url`, `[${anthropic.baseURL}](${anthropic.baseURL})`)
@@ -215,14 +219,14 @@
const toolCalls: ChatCompletionToolCall[] = []

try {
const stream = anthropic.messages.stream({
const stream = anthropic.beta.promptCaching.messages.stream({
model,
tools: convertTools(req.tools),
messages,
max_tokens: req.max_tokens || ANTHROPIC_MAX_TOKEN,
temperature: req.temperature,
top_p: req.top_p,
stream: true,
tools: convertTools(req.tools),
...headers,
})
