zod support #963

Merged
merged 10 commits into from
Dec 21, 2024
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -90,7 +90,7 @@ $`Analyze FILE and extract data to JSON using the ${schema} schema.`

### 📋 Data Schemas

Define, validate, and repair data using [schemas](https://microsoft.github.io/genaiscript/reference/scripts/schemas).
Define, validate, and repair data using [schemas](https://microsoft.github.io/genaiscript/reference/scripts/schemas). Zod support is built in.

```js
const data = defSchema("MY_DATA", { type: "array", items: { ... } })
157 changes: 88 additions & 69 deletions docs/src/content/docs/reference/scripts/schemas.mdx
@@ -1,17 +1,16 @@
---
title: Data Schemas
sidebar:
order: 6
order: 6
description: Learn how to define and use data schemas for structured output in
JSON/YAML with LLM, including validation and repair techniques.
JSON/YAML with LLM, including validation and repair techniques.
keywords: data schemas, JSON schema, YAML validation, LLM structured output,
schema repair
schema repair
genaiscript:
model: openai:gpt-3.5-turbo

model: openai:gpt-3.5-turbo
---
import { Card } from '@astrojs/starlight/components';

import { Card } from "@astrojs/starlight/components"

It is possible to force the LLM to generate data that conforms to a specific schema.
This technique works reasonably well and GenAIScript also provides automatic validation "just in case".
@@ -32,11 +31,17 @@ const schema = defSchema("CITY_SCHEMA", {
description: "A city with population and elevation information.",
properties: {
name: { type: "string", description: "The name of the city." },
population: { type: "number", description: "The population of the city." },
url: { type: "string", description: "The URL of the city's Wikipedia page." }
population: {
type: "number",
description: "The population of the city.",
},
url: {
type: "string",
description: "The URL of the city's Wikipedia page.",
},
},
required: ["name", "population", "url"]
}
required: ["name", "population", "url"],
},
})

$`Generate data using JSON compliant with ${schema}.`
@@ -47,9 +52,9 @@ $`Generate data using JSON compliant with ${schema}.`
<details open>
<summary>👤 user</summary>


````markdown wrap
CITY_SCHEMA:

```typescript-schema
// A list of cities with population and elevation information.
type CITY_SCHEMA = Array<{
@@ -61,46 +66,64 @@ type CITY_SCHEMA = Array<{
url: string,
}>
```

Generate data using JSON compliant with CITY_SCHEMA.
````


</details>


<details open>
<summary>🤖 assistant</summary>


````markdown wrap
File ./data.json:

```json schema=CITY_SCHEMA
[
{
"name": "New York",
"population": 8398748,
"url": "https://en.wikipedia.org/wiki/New_York_City"
},
{
"name": "Los Angeles",
"population": 3990456,
"url": "https://en.wikipedia.org/wiki/Los_Angeles"
},
{
"name": "Chicago",
"population": 2705994,
"url": "https://en.wikipedia.org/wiki/Chicago"
}
{
"name": "New York",
"population": 8398748,
"url": "https://en.wikipedia.org/wiki/New_York_City"
},
{
"name": "Los Angeles",
"population": 3990456,
"url": "https://en.wikipedia.org/wiki/Los_Angeles"
},
{
"name": "Chicago",
"population": 2705994,
"url": "https://en.wikipedia.org/wiki/Chicago"
}
]
```
````


</details>

{/* genaiscript output end */}

### Native zod support

A [Zod](https://zod.dev/) type can be passed to `defSchema`; it is automatically converted to a JSON schema.
GenAIScript also exports the `z` object from Zod for convenience.

```js
// import from genaiscript
import { z } from "genaiscript/runtime"
// or directly from zod
// import { z } from "zod"
// create schema using zod
const CitySchema = z.array(
z.object({
name: z.string(),
population: z.number(),
url: z.string(),
})
)
// JSON schema to constrain the output of the tool.
const schema = defSchema("CITY_SCHEMA", CitySchema)
```

### Prompt encoding

@@ -111,12 +134,12 @@ from TypeChat, the schema is converted TypeScript types before being injected in
// A list of cities with population and elevation information.
type CITY_SCHEMA = Array<{
// The name of the city.
name: string,
name: string
// The population of the city.
population: number,
population: number
// The URL of the city's Wikipedia page.
url: string,
}>
url: string
}>
```

You can change this behavior by using the `{ format: "json" }` option.
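
To make the TypeScript rendering described above more concrete, here is a minimal sketch of how a JSON schema could be turned into a type declaration. It is not GenAIScript's actual renderer: `MiniSchema`, `renderType`, and `renderSchema` are hypothetical names, and only the `string`, `number`, `array`, and `object` types used by `CITY_SCHEMA` are handled.

```typescript
// Hypothetical, simplified schema shape (an assumption for this sketch):
// it covers only the JSON schema features used by CITY_SCHEMA.
type MiniSchema =
    | { type: "string" | "number"; description?: string }
    | { type: "array"; description?: string; items: MiniSchema }
    | {
          type: "object"
          description?: string
          properties: Record<string, MiniSchema>
          required?: string[]
      }

// Renders the type expression for a schema, indenting nested object fields.
function renderType(schema: MiniSchema, indent = ""): string {
    if (schema.type === "array") {
        return `Array<${renderType(schema.items, indent)}>`
    }
    if (schema.type === "object") {
        const inner = indent + "    "
        const required = schema.required ?? []
        const lines: string[] = []
        for (const key of Object.keys(schema.properties)) {
            const prop = schema.properties[key]
            // Descriptions become comments so the LLM sees them next to the field.
            if (prop.description) lines.push(`${inner}// ${prop.description}`)
            const optional = required.indexOf(key) >= 0 ? "" : "?"
            lines.push(`${inner}${key}${optional}: ${renderType(prop, inner)},`)
        }
        return `{\n${lines.join("\n")}\n${indent}}`
    }
    // Remaining cases are the primitive types "string" and "number".
    return schema.type
}

// Renders a full named declaration, with the top-level description as a comment.
function renderSchema(name: string, schema: MiniSchema): string {
    const header = schema.description ? `// ${schema.description}\n` : ""
    return `${header}type ${name} = ${renderType(schema)}`
}
```

Calling `renderSchema("CITY_SCHEMA", schema)` on a schema shaped like the one above produces a declaration in the same style as the rendered prompt. Properties not listed in `required` are marked optional with `?`, which is a design choice of this sketch.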
@@ -134,50 +157,46 @@ in the output folder as well.
<details>
<summary>schema CITY_SCHEMA</summary>

- source:
- source:

```json
{
"type": "array",
"description": "A list of cities with population and elevation information.",
"items": {
"type": "object",
"description": "A city with population and elevation information.",
"properties": {
"name": {
"type": "string",
"description": "The name of the city."
},
"population": {
"type": "number",
"description": "The population of the city."
},
"url": {
"type": "string",
"description": "The URL of the city's Wikipedia page."
}
},
"required": [
"name",
"population",
"url"
]
}
"type": "array",
"description": "A list of cities with population and elevation information.",
"items": {
"type": "object",
"description": "A city with population and elevation information.",
"properties": {
"name": {
"type": "string",
"description": "The name of the city."
},
"population": {
"type": "number",
"description": "The population of the city."
},
"url": {
"type": "string",
"description": "The URL of the city's Wikipedia page."
}
},
"required": ["name", "population", "url"]
}
}
```
- prompt (rendered as typescript):

- prompt (rendered as typescript):

```ts
// A list of cities with population and elevation information.
type CITY_SCHEMA = Array<{
// The name of the city.
name: string,
name: string
// The population of the city.
population: number,
population: number
// The URL of the city's Wikipedia page.
url: string,
}>

url: string
}>
```

</details>
@@ -199,15 +218,15 @@ GenAIScript automatically validates the payload against the schema.

:::tip

Not all data formats are equal! Some data formats like JSON introduce ambiguity
Not all data formats are equal! Some data formats like JSON introduce ambiguity
and can confuse the LLM.
[Read more...](https://betterprogramming.pub/yaml-vs-json-which-is-more-efficient-for-language-models-5bc11dd0f6df).

:::

## Repair

GenAIScript will automatically try to repair the data by issuing additional messages
GenAIScript will automatically try to repair the data by issuing additional messages
back to the LLM with the parsing output.
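
The repair idea can be sketched as a small validate-and-retry loop. This is a hypothetical illustration, not GenAIScript's internal implementation: `Ask`, `Validate`, `generateWithRepair`, and `validateJsonArray` are names invented for this sketch, and the model call is synchronous here only to keep the example short (a real LLM call would be asynchronous).

```typescript
// Hypothetical types for this sketch (assumptions, not GenAIScript APIs).
// A validator returns an error message, or undefined when the text is valid.
type Validate = (text: string) => string | undefined
// A stand-in for the LLM: receives the conversation so far, returns an answer.
type Ask = (messages: string[]) => string

function generateWithRepair(
    prompt: string,
    ask: Ask,
    validate: Validate,
    maxRepairs = 2
): { text: string; valid: boolean; attempts: number } {
    const messages = [prompt]
    let text = ask(messages)
    let attempts = 1
    let error = validate(text)
    while (error !== undefined && attempts <= maxRepairs) {
        // Send the invalid answer and the parsing error back to the model.
        messages.push(text, `The previous answer is invalid: ${error}. Please fix it.`)
        text = ask(messages)
        attempts++
        error = validate(text)
    }
    return { text, valid: error === undefined, attempts }
}

// Example validator: the output must parse as a JSON array.
const validateJsonArray: Validate = (text) => {
    try {
        const data: unknown = JSON.parse(text)
        return Array.isArray(data) ? undefined : "expected a JSON array"
    } catch {
        return "not parseable as JSON"
    }
}
```

The loop stops as soon as the validator accepts the output, or after `maxRepairs` extra round trips, so a model that never produces valid data cannot loop forever.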

## Runtime Validation
@@ -216,4 +235,4 @@ Use `parsers.validateJSON` to validate JSON when running the script.

```js
const validation = parsers.validateJSON(schema, json)
```
```
2 changes: 1 addition & 1 deletion packages/cli/README.md
@@ -94,7 +94,7 @@ $`Analyze FILE and extract data to JSON using the ${schema} schema.`

### 📋 Data Schemas

Define, validate, and repair data using [schemas](https://microsoft.github.io/genaiscript/reference/scripts/schemas).
Define, validate, and repair data using [schemas](https://microsoft.github.io/genaiscript/reference/scripts/schemas). Zod support is built in.

```js
const data = defSchema("MY_DATA", { type: "array", items: { ... } })
10 changes: 6 additions & 4 deletions packages/cli/package.json
@@ -67,7 +67,9 @@
"turndown-plugin-gfm": "^1.0.2",
"typescript": "5.7.2",
"vectra": "^0.9.0",
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.2/xlsx-0.20.2.tgz"
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.2/xlsx-0.20.2.tgz",
"zod": "^3.24.1",
"zod-to-json-schema": "^3.24.1"
},
"optionalDependencies": {
"@huggingface/transformers": "^3.2.1",
@@ -112,11 +114,11 @@
"zx": "^8.2.4"
},
"scripts": {
"compile:runtime": "tsc src/runtime.ts --skipLibCheck --outDir built --declaration --target es2020 --moduleResolution node && mv built/runtime.js built/runtime.mjs",
"compile:runtime": "tsc src/runtime.ts --skipLibCheck --outDir built --declaration --target es2020 --moduleResolution node --module esnext && mv built/runtime.js built/runtime.mjs",
"compile:api": "esbuild src/api.ts --outfile=built/api.mjs",
"compile:cli": "esbuild src/main.ts --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk --external:@anthropic-ai/bedrock-sdk --external:es-toolkit && node ../../scripts/patch-cli.mjs",
"compile:cli": "esbuild src/main.ts --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk --external:@anthropic-ai/bedrock-sdk --external:es-toolkit --external:zod --external:zod-to-json-schema && node ../../scripts/patch-cli.mjs",
"compile": "yarn compile:api && yarn compile:runtime && yarn compile:cli",
"compile-debug": "esbuild src/main.ts --sourcemap --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk --external:@anthropic-ai/bedrock-sdk --external:es-toolkit",
"compile-debug": "esbuild src/main.ts --sourcemap --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk --external:@anthropic-ai/bedrock-sdk --external:es-toolkit --external:zod --external:zod-to-json-schema",
"postcompile": "node built/genaiscript.cjs info help > ../../docs/src/content/docs/reference/cli/commands.md",
"vis:treemap": "npx --yes esbuild-visualizer --metadata esbuild.meta.json --filename esbuild.treemap.html",
"vis:network": "npx --yes esbuild-visualizer --metadata esbuild.meta.json --filename esbuild.network.html --template network",
10 changes: 8 additions & 2 deletions packages/cli/src/runtime.ts
@@ -1,9 +1,15 @@
/**
* GenAIScript supporting runtime
*/
import { delay as esDelay } from "es-toolkit"
import { delay as _delay } from "es-toolkit"
import { z as zod } from "zod"

/**
* A helper function to delay the execution of the script
*/
export const delay: (ms: number) => Promise<void> = esDelay
export const delay: (ms: number) => Promise<void> = _delay

/**
* Zod schema generator
*/
export const z = zod
5 changes: 4 additions & 1 deletion packages/core/src/promptdom.ts
@@ -41,6 +41,7 @@ import { jinjaRenderChatMessage } from "./jinja"
import { runtimeHost } from "./host"
import { hash } from "./crypto"
import { startMcpServer } from "./mcp"
import { tryZodToJsonSchema } from "./zod"

// Definition of the PromptNode interface which is an essential part of the code structure.
export interface PromptNode extends ContextExpansionOptions {
@@ -361,11 +362,13 @@ export function createImageNode(
// Function to create a schema node.
export function createSchemaNode(
name: string,
value: JSONSchema,
value: JSONSchema | ZodTypeLike,
options?: DefSchemaOptions
): PromptSchemaNode {
assert(!!name)
assert(value !== undefined)
// auto zod conversion
value = tryZodToJsonSchema(value as ZodTypeLike) ?? (value as JSONSchema)
return { type: "schema", name, value, options }
}

4 changes: 3 additions & 1 deletion packages/core/src/types/prompt_template.d.ts
@@ -2604,10 +2604,12 @@ interface McpServerConfig {

type McpServersConfig = Record<string, Omit<McpServerConfig, "id" | "options">>

type ZodTypeLike = { _def: any, safeParse: any, refine: any }

interface ChatGenerationContext extends ChatTurnGenerationContext {
defSchema(
name: string,
schema: JSONSchema,
schema: JSONSchema | ZodTypeLike,
options?: DefSchemaOptions
): string
defImages(
2 changes: 1 addition & 1 deletion packages/core/src/types/prompt_type.d.ts
@@ -239,7 +239,7 @@ declare function fetchText(
*/
declare function defSchema(
name: string,
schema: JSONSchema,
schema: JSONSchema | ZodTypeLike,
options?: DefSchemaOptions
): string

19 changes: 19 additions & 0 deletions packages/core/src/zod.ts
@@ -0,0 +1,19 @@
import { zodToJsonSchema as _zodToJsonSchema } from "zod-to-json-schema"

/**
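* Converts a Zod schema to a JSON schema.
* @param z candidate Zod type, detected by duck typing on `_def`, `refine` and `safeParse`
* @param options additional options forwarded to `zodToJsonSchema`
* @returns the converted JSON schema, or `undefined` if `z` is not a Zod type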
*/
export function tryZodToJsonSchema(
z: ZodTypeLike,
options?: object
): JSONSchema | undefined {
if (!z || !z._def || !z.refine || !z.safeParse) return undefined
const schema = _zodToJsonSchema(z as any, {
target: "openAi",
...(options || {}),
})
return structuredClone(schema) as JSONSchema
}
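
The pattern used by `createSchemaNode` and `tryZodToJsonSchema` (detect a Zod type by duck typing, convert it, otherwise fall back to the value as a plain JSON schema) can be sketched in isolation as follows. This is a self-contained illustration with assumed names: `JSONSchemaLike`, `isZodTypeLike`, `tryConvert`, and `normalizeSchema` are hypothetical, and the `convert` callback stands in for `zod-to-json-schema`, which is not imported here.

```typescript
// Hypothetical stand-in types for this sketch (not the project's declarations).
type JSONSchemaLike = Record<string, unknown>
type ZodTypeLike = { _def: unknown; safeParse: unknown; refine: unknown }

// Duck-typing check: an object counts as Zod-like if it exposes the three
// members the guard in tryZodToJsonSchema looks for.
function isZodTypeLike(value: unknown): value is ZodTypeLike {
    if (typeof value !== "object" || value === null) return false
    const v = value as Record<string, unknown>
    return !!v._def && !!v.refine && !!v.safeParse
}

// Returns the converted schema, or undefined when the value is not Zod-like.
// `convert` is an assumed callback standing in for the real converter.
function tryConvert(
    value: unknown,
    convert: (z: ZodTypeLike) => JSONSchemaLike
): JSONSchemaLike | undefined {
    return isZodTypeLike(value) ? convert(value) : undefined
}

// Accepts either kind of input and always yields a JSON schema,
// mirroring the `tryZodToJsonSchema(value) ?? value` fallback.
function normalizeSchema(
    value: JSONSchemaLike | ZodTypeLike,
    convert: (z: ZodTypeLike) => JSONSchemaLike
): JSONSchemaLike {
    return tryConvert(value, convert) ?? (value as JSONSchemaLike)
}
```

Because detection relies on member names rather than `instanceof`, it does not require the caller's copy of Zod to be the same module instance as the one bundled with the tool. The trade-off is that any object exposing `_def`, `refine`, and `safeParse` is treated as a Zod type.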