
Add visionModel support #866

Merged · 2 commits · Nov 16, 2024
1 change: 1 addition & 0 deletions docs/src/components/BuiltinTools.mdx
@@ -45,4 +45,5 @@ import { LinkCard } from '@astrojs/starlight/components';
<LinkCard title="user_input_confirm" description="Ask the user to confirm a message." href="/genaiscript/reference/scripts/system#systemuser_input" />
<LinkCard title="user_input_select" description="Ask the user to select an option." href="/genaiscript/reference/scripts/system#systemuser_input" />
<LinkCard title="user_input_text" description="Ask the user to input text." href="/genaiscript/reference/scripts/system#systemuser_input" />
<LinkCard title="vision_ask_image" description="Use vision model to run a query on an image" href="/genaiscript/reference/scripts/system#systemvision_ask_image" />

5 changes: 3 additions & 2 deletions docs/src/content/docs/reference/cli/commands.md
@@ -16,8 +16,9 @@
Runs a GenAIScript against files.

Options:
-m, --model <string> model for the run
-sm, --small-model <string> small model for the run
-m, --model <string> 'large' model alias (default)
-sm, --small-model <string> 'small' alias model
-vm, --vision-model <string> 'vision' alias model

Check notice on line 21 in docs/src/content/docs/reference/cli/commands.md (GitHub Actions / build): The use of aliases for model options ('large', 'small', 'vision') should be clearly explained in the documentation to ensure users understand their purpose and how they differ from each other.

Review comment (generated by pr-docs-review-commit, alias_clarity): The use of 'large', 'small', and 'vision' as model aliases could benefit from additional clarification or context to ensure users understand what these terms specifically refer to in the context of the tool.

-lp, --logprobs enable reporting token probabilities
-tlp, --top-logprobs <number> number of top logprobs (1 to 5)
-ef, --excluded-files <string...> excluded files
66 changes: 66 additions & 0 deletions docs/src/content/docs/reference/scripts/system.mdx
@@ -3350,6 +3350,72 @@ defTool(
`````


### `system.vision_ask_image`

Vision Ask Image

Register tool that uses vision model to run a query on an image

- tool `vision_ask_image`: Use vision model to run a query on an image

`````js wrap title="system.vision_ask_image"
system({
title: "Vision Ask Image",
description:
"Register tool that uses vision model to run a query on an image",
})

defTool(
"vision_ask_image",
"Use vision model to run a query on an image",
{
type: "object",
properties: {
image: {
type: "string",
description: "Image URL or workspace relative filepath",
},
query: {
type: "string",
description: "Query to run on the image",
},
hd: {
type: "boolean",
description: "Use high definition image",
},
},
required: ["image", "query"],
},
async (args) => {
const { image, query, hd } = args
const res = await runPrompt(
(_) => {
_.defImages(image, {
autoCrop: true,
detail: hd ? "high" : "low",
maxWidth: hd ? 1024 : 512,
maxHeight: hd ? 1024 : 512,
})
_.$`Answer this query about the images:`
_.def("QUERY", query)
},
{
model: "vision",
system: [
"system",
"system.assistant",
"system.safety_jailbreak",
"system.safety_harmful_content",
],
}
)
return res
}
)

`````

Review comment (generated by pr-docs-review-commit, tool_description): The description for the system.vision_ask_image tool could be expanded to provide more context or examples of usage, which would enhance user understanding of its functionality.


### `system.zero_shot_cot`

Zero-shot Chain Of Thought
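The `hd` parameter of the new tool drives both the vision `detail` hint and the resize bounds handed to `defImages`. A minimal sketch of that mapping, mirroring the ternaries in the tool body (the helper name `imageOptionsForHd` is illustrative, not part of GenAIScript):

```javascript
// Sketch: derive image-encoding options from the tool's `hd` flag.
function imageOptionsForHd(hd) {
    return {
        autoCrop: true,
        detail: hd ? "high" : "low", // vision API detail hint
        maxWidth: hd ? 1024 : 512,   // resize bound before encoding
        maxHeight: hd ? 1024 : 512,
    }
}

console.log(imageOptionsForHd(true).detail)    // "high"
console.log(imageOptionsForHd(false).maxWidth) // 512
```

Smaller low-detail images keep token cost down for routine queries, while `hd: true` preserves fine detail when the query needs it.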
5 changes: 3 additions & 2 deletions packages/cli/src/cli.ts
@@ -96,8 +96,9 @@ export async function cli() {
.command("run")
.description("Runs a GenAIScript against files.")
.arguments("<script> [files...]")
.option("-m, --model <string>", "model for the run")
.option("-sm, --small-model <string>", "small model for the run")
.option("-m, --model <string>", "'large' model alias (default)")
.option("-sm, --small-model <string>", "'small' alias model")
.option("-vm, --vision-model <string>", "'vision' alias model")
.option("-lp, --logprobs", "enable reporting token probabilities")
.option(
"-tlp, --top-logprobs <number>",
2 changes: 2 additions & 0 deletions packages/cli/src/nodehost.ts
@@ -30,6 +30,7 @@ import {
AZURE_AI_INFERENCE_TOKEN_SCOPES,
MODEL_PROVIDER_AZURE_SERVERLESS_OPENAI,
DOT_ENV_FILENAME,
DEFAULT_VISION_MODEL,
} from "../../core/src/constants"
import { tryReadText } from "../../core/src/fs"
import {
@@ -141,6 +142,7 @@ export class NodeHost implements RuntimeHost {
readonly defaultModelOptions = {
model: DEFAULT_MODEL,
smallModel: DEFAULT_SMALL_MODEL,
visionModel: DEFAULT_VISION_MODEL,
temperature: DEFAULT_TEMPERATURE,
}
readonly defaultEmbeddingsModelOptions = {
2 changes: 2 additions & 0 deletions packages/cli/src/run.ts
@@ -206,6 +206,8 @@ export async function runScript(
if (options.model) host.defaultModelOptions.model = options.model
if (options.smallModel)
host.defaultModelOptions.smallModel = options.smallModel
if (options.visionModel)
host.defaultModelOptions.visionModel = options.visionModel

const fail = (msg: string, exitCode: number, url?: string) => {
logError(url ? `${msg} (see ${url})` : msg)
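The pattern in run.ts overrides a host default only when the corresponding CLI flag was supplied. A reduced, self-contained sketch of that behavior (`applyModelFlags` and the plain-object host are illustrative stand-ins for the real `RuntimeHost`):

```javascript
// Sketch: copy CLI model flags onto host defaults, keeping the
// built-in default whenever a flag is absent.
function applyModelFlags(host, options) {
    if (options.model) host.defaultModelOptions.model = options.model
    if (options.smallModel)
        host.defaultModelOptions.smallModel = options.smallModel
    if (options.visionModel)
        host.defaultModelOptions.visionModel = options.visionModel
    return host
}

const host = {
    defaultModelOptions: {
        model: "openai:gpt-4o",
        smallModel: "openai:gpt-4o-mini",
        visionModel: "openai:gpt-4o",
    },
}
applyModelFlags(host, { visionModel: "github:gpt-4o" })
// Only visionModel changes; model and smallModel keep their defaults.
```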
5 changes: 5 additions & 0 deletions packages/cli/src/test.ts
@@ -65,6 +65,7 @@ function parseModelSpec(m: string): ModelOptions {
return {
model: values["m"],
smallModel: values["s"],
visionModel: values["v"],
temperature: normalizeFloat(values["t"]),
topP: normalizeFloat(values["p"]),
}
@@ -120,11 +121,14 @@ export async function runPromptScriptTests(
testDelay?: string
model?: string
smallModel?: string
visionModel?: string
}
): Promise<PromptScriptTestRunResponse> {
if (options.model) host.defaultModelOptions.model = options.model
if (options.smallModel)
host.defaultModelOptions.smallModel = options.smallModel
if (options.visionModel)
host.defaultModelOptions.visionModel = options.visionModel

const scripts = await listTests({ ids, ...(options || {}) })
if (!scripts.length)
Expand Down Expand Up @@ -163,6 +167,7 @@ export async function runPromptScriptTests(
cli,
model: info.model,
smallModel: info.smallModel,
visionModel: info.visionModel,
models: options.models?.map(parseModelSpec),
provider: "provider.mjs",
testProvider,
15 changes: 10 additions & 5 deletions packages/core/src/chat.ts
@@ -345,11 +345,12 @@
logWarn(
`tool: ${tool.spec.name} response too long (${toolContentTokens} tokens), truncating ${maxToolContentTokens} tokens`
)
toolContent = truncateTextToTokens(
    toolContent,
    maxToolContentTokens,
    encoder
) + "... (truncated)"
toolContent =
    truncateTextToTokens(
        toolContent,
        maxToolContentTokens,
        encoder
    ) + "... (truncated)"

Check failure on line 349 in packages/core/src/chat.ts (GitHub Actions / build): The assignment of `toolContent` to itself is unnecessary and can be removed. It seems like a formatting issue. (pelikhan marked this conversation as resolved.)
}
trace.fence(toolContent, "markdown")
toolResult.push(toolContent)
@@ -735,8 +736,8 @@
): GenerationOptions {
    const res = {
        ...options,
        ...(runOptions || {}),
        model:
            runOptions?.model ??
            options?.model ??
            host.defaultModelOptions.model,

Check failure on line 740 in packages/core/src/chat.ts (GitHub Actions / build): The `toChatCompletionUserMessage` function is called with an empty string as the message content. This might lead to unintended empty messages being sent.

@@ -744,6 +745,10 @@
runOptions?.smallModel ??
options?.smallModel ??
host.defaultModelOptions.smallModel,
visionModel:
runOptions?.visionModel ??
options?.visionModel ??
host.defaultModelOptions.visionModel,
temperature:
runOptions?.temperature ?? host.defaultModelOptions.temperature,
embeddingsModel:
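Each model option in `GenerationOptions` resolves through the same nullish-coalescing chain: per-run value, then script option, then host default. A self-contained sketch for the vision case (the `resolveVisionModel` helper is illustrative):

```javascript
// Sketch: precedence of visionModel sources, mirroring chat.ts.
// `??` falls through only on null/undefined, so runOptions win when set.
function resolveVisionModel(runOptions, options, hostDefaults) {
    return (
        runOptions?.visionModel ??
        options?.visionModel ??
        hostDefaults.visionModel
    )
}

const defaults = { visionModel: "openai:gpt-4o" }
console.log(resolveVisionModel({ visionModel: "azure:gpt-4o" }, {}, defaults)) // "azure:gpt-4o"
console.log(resolveVisionModel(undefined, { visionModel: "github:gpt-4o" }, defaults)) // "github:gpt-4o"
console.log(resolveVisionModel(undefined, undefined, defaults)) // "openai:gpt-4o"
```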
3 changes: 3 additions & 0 deletions packages/core/src/connection.ts
@@ -68,6 +68,9 @@ export async function parseDefaultsFromEnv(env: Record<string, string>) {
if (env.GENAISCRIPT_DEFAULT_SMALL_MODEL)
host.defaultModelOptions.smallModel =
env.GENAISCRIPT_DEFAULT_SMALL_MODEL
if (env.GENAISCRIPT_DEFAULT_VISION_MODEL)
host.defaultModelOptions.visionModel =
env.GENAISCRIPT_DEFAULT_VISION_MODEL
const t = normalizeFloat(env.GENAISCRIPT_DEFAULT_TEMPERATURE)
if (!isNaN(t)) host.defaultModelOptions.temperature = t
if (env.GENAISCRIPT_DEFAULT_EMBEDDINGS_MODEL)
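`parseDefaultsFromEnv` only overwrites a default when the variable is set and non-empty. A reduced sketch of that behavior (plain objects instead of the real host; `GENAISCRIPT_DEFAULT_MODEL` is assumed by analogy with the small/vision variables shown in the diff):

```javascript
// Sketch: apply GENAISCRIPT_DEFAULT_* environment overrides,
// mirroring parseDefaultsFromEnv in connection.ts.
function applyEnvDefaults(defaults, env) {
    if (env.GENAISCRIPT_DEFAULT_MODEL)
        defaults.model = env.GENAISCRIPT_DEFAULT_MODEL
    if (env.GENAISCRIPT_DEFAULT_SMALL_MODEL)
        defaults.smallModel = env.GENAISCRIPT_DEFAULT_SMALL_MODEL
    if (env.GENAISCRIPT_DEFAULT_VISION_MODEL)
        defaults.visionModel = env.GENAISCRIPT_DEFAULT_VISION_MODEL
    return defaults
}

const d = applyEnvDefaults(
    { model: "openai:gpt-4o", smallModel: "openai:gpt-4o-mini", visionModel: "openai:gpt-4o" },
    { GENAISCRIPT_DEFAULT_VISION_MODEL: "azure:gpt-4o" }
)
// d.visionModel is now "azure:gpt-4o"; the other defaults are unchanged.
```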
9 changes: 9 additions & 0 deletions packages/core/src/constants.ts
@@ -53,6 +53,7 @@ export const RETRIEVAL_PERSIST_DIR = "retrieval"
export const HIGHLIGHT_LENGTH = 4000
export const SMALL_MODEL_ID = "small"
export const LARGE_MODEL_ID = "large"
export const VISION_MODEL_ID = "vision"
export const DEFAULT_MODEL = "openai:gpt-4o"
export const DEFAULT_MODEL_CANDIDATES = [
"azure:gpt-4o",
@@ -62,6 +63,14 @@
"github:gpt-4o",
"client:gpt-4",
]
export const DEFAULT_VISION_MODEL = "openai:gpt-4o"
export const DEFAULT_VISION_MODEL_CANDIDATES = [
"azure:gpt-4o",
"azure_serverless:gpt-4o",
DEFAULT_MODEL,
"anthropic:claude-2",
"github:gpt-4o",
]
export const DEFAULT_SMALL_MODEL = "openai:gpt-4o-mini"
export const DEFAULT_SMALL_MODEL_CANDIDATES = [
"azure:gpt-4o-mini",
14 changes: 12 additions & 2 deletions packages/core/src/genaiscript-api-provider.mjs
@@ -22,8 +22,17 @@ class GenAIScriptApiProvider {
}

async callApi(prompt, context) {
const { model, smallModel, temperature, top_p, cache, version, cli, quiet } =
this.config
const {
model,
smallModel,
visionModel,
temperature,
top_p,
cache,
version,
cli,
quiet,
} = this.config
const { vars, logger } = context
try {
let files = vars.files // string or string[]
@@ -52,6 +61,7 @@
if (quiet) args.push("--quiet")
if (model) args.push("--model", model)
if (smallModel) args.push("--small-model", smallModel)
if (visionModel) args.push("--vision-model", visionModel)
if (temperature !== undefined)
args.push("--temperature", temperature)
if (top_p !== undefined) args.push("--top_p", top_p)
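The promptfoo provider forwards each configured model alias to the CLI only when it is present in the config. A trimmed sketch of that argument assembly (`buildModelArgs` is an illustrative name for logic inlined in `callApi`):

```javascript
// Sketch: translate provider config into genaiscript CLI flags,
// mirroring genaiscript-api-provider.mjs.
function buildModelArgs({ model, smallModel, visionModel } = {}) {
    const args = []
    if (model) args.push("--model", model)
    if (smallModel) args.push("--small-model", smallModel)
    if (visionModel) args.push("--vision-model", visionModel)
    return args
}

console.log(buildModelArgs({ visionModel: "openai:gpt-4o" }))
// [ '--vision-model', 'openai:gpt-4o' ]
```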
53 changes: 53 additions & 0 deletions packages/core/src/genaisrc/system.vision_ask_image.genai.mjs
@@ -0,0 +1,53 @@
system({
title: "Vision Ask Image",
description:
"Register tool that uses vision model to run a query on an image",
})

defTool(
"vision_ask_image",
"Use vision model to run a query on an image",
{
type: "object",
properties: {
image: {
type: "string",
description: "Image URL or workspace relative filepath",
},
query: {
type: "string",
description: "Query to run on the image",
},
hd: {
type: "boolean",
description: "Use high definition image",
},
},
required: ["image", "query"],
},
async (args) => {
const { image, query, hd } = args
const res = await runPrompt(
(_) => {
_.defImages(image, {
autoCrop: true,
detail: hd ? "high" : "low",
maxWidth: hd ? 1024 : 512,
maxHeight: hd ? 1024 : 512,
})
_.$`Answer this query about the images:`
_.def("QUERY", query)
},
{
model: "vision",
system: [
"system",
"system.assistant",
"system.safety_jailbreak",
"system.safety_harmful_content",
],
}
)
return res
}
)
2 changes: 1 addition & 1 deletion packages/core/src/host.ts
@@ -125,7 +125,7 @@ export interface Host {

// read a secret from the environment or a .env file
defaultModelOptions: Required<
Pick<ModelOptions, "model" | "smallModel" | "temperature">
Pick<ModelOptions, "model" | "smallModel" | "visionModel" | "temperature">
>
defaultEmbeddingsModelOptions: Required<
Pick<EmbeddingsModelOptions, "embeddingsModel">
4 changes: 3 additions & 1 deletion packages/core/src/image.ts
@@ -15,11 +15,13 @@ export async function imageEncodeForLLM(
) {
// Dynamically import the Jimp library and its alignment enums
const { Jimp, HorizontalAlign, VerticalAlign } = await import("jimp")
const { autoCrop, maxHeight, maxWidth } = options
let { autoCrop, maxHeight, maxWidth } = options

// If the URL is a string, resolve it to a data URI
if (typeof url === "string") url = await resolveFileDataUri(url)

// https://platform.openai.com/docs/guides/vision/calculating-costs#managing-images

// Return the URL if no image processing is required
if (
typeof url === "string" &&
8 changes: 8 additions & 0 deletions packages/core/src/models.ts
@@ -3,10 +3,12 @@ import {
DEFAULT_EMBEDDINGS_MODEL_CANDIDATES,
DEFAULT_MODEL_CANDIDATES,
DEFAULT_SMALL_MODEL_CANDIDATES,
DEFAULT_VISION_MODEL_CANDIDATES,
LARGE_MODEL_ID,
MODEL_PROVIDER_LLAMAFILE,
MODEL_PROVIDER_OPENAI,
SMALL_MODEL_ID,
VISION_MODEL_ID,
} from "./constants"
import { errorMessage } from "./error"
import { LanguageModelConfiguration, host } from "./host"
@@ -107,6 +109,12 @@ export async function resolveModelConnectionInfo(
host.defaultModelOptions.smallModel,
...DEFAULT_SMALL_MODEL_CANDIDATES,
]
} else if (m === VISION_MODEL_ID) {
m = undefined
candidates ??= [
host.defaultModelOptions.visionModel,
...DEFAULT_VISION_MODEL_CANDIDATES,
]
} else if (m === LARGE_MODEL_ID) {
m = undefined
candidates ??= [
Expand Down
Loading
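`resolveModelConnectionInfo` turns the `vision` alias into an ordered candidate list: the host default first, then the built-in fallbacks from constants.ts. A condensed sketch (the `candidatesForAlias` helper is illustrative; the real resolver also probes each candidate's connection info before settling on one):

```javascript
// Sketch: expand a model alias into candidates, mirroring models.ts.
const DEFAULT_VISION_MODEL_CANDIDATES = [
    "azure:gpt-4o",
    "azure_serverless:gpt-4o",
    "openai:gpt-4o",
    "anthropic:claude-2",
    "github:gpt-4o",
]

function candidatesForAlias(alias, hostDefaults) {
    if (alias === "vision")
        return [hostDefaults.visionModel, ...DEFAULT_VISION_MODEL_CANDIDATES]
    return [alias] // not an alias: use the model id as-is
}

console.log(candidatesForAlias("vision", { visionModel: "openai:gpt-4o" })[0])
// "openai:gpt-4o"
```

Listing the host default first means a user- or env-configured vision model always wins over the hard-coded fallbacks.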