zod support (#963)
pelikhan authored Dec 21, 2024
1 parent a194e36 commit 1c906df
Showing 11 changed files with 137 additions and 86 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -90,7 +90,7 @@ $`Analyze FILE and extract data to JSON using the ${schema} schema.`

### 📋 Data Schemas

Define, validate, and repair data using [schemas](https://microsoft.github.io/genaiscript/reference/scripts/schemas).
Define, validate, and repair data using [schemas](https://microsoft.github.io/genaiscript/reference/scripts/schemas). Zod support builtin.

```js
const data = defSchema("MY_DATA", { type: "array", items: { ... } })
157 changes: 88 additions & 69 deletions docs/src/content/docs/reference/scripts/schemas.mdx
@@ -1,17 +1,16 @@
---
title: Data Schemas
sidebar:
order: 6
order: 6
description: Learn how to define and use data schemas for structured output in
JSON/YAML with LLM, including validation and repair techniques.
JSON/YAML with LLM, including validation and repair techniques.
keywords: data schemas, JSON schema, YAML validation, LLM structured output,
schema repair
schema repair
genaiscript:
model: openai:gpt-3.5-turbo

model: openai:gpt-3.5-turbo
---
import { Card } from '@astrojs/starlight/components';

import { Card } from "@astrojs/starlight/components"

It is possible to force the LLM to generate data that conforms to a specific schema.
This technique works reasonably well and GenAIScript also provides automatic validation "just in case".
@@ -32,11 +31,17 @@ const schema = defSchema("CITY_SCHEMA", {
description: "A city with population and elevation information.",
properties: {
name: { type: "string", description: "The name of the city." },
population: { type: "number", description: "The population of the city." },
url: { type: "string", description: "The URL of the city's Wikipedia page." }
population: {
type: "number",
description: "The population of the city.",
},
url: {
type: "string",
description: "The URL of the city's Wikipedia page.",
},
},
required: ["name", "population", "url"]
}
required: ["name", "population", "url"],
},
})

$`Generate data using JSON compliant with ${schema}.`
@@ -47,9 +52,9 @@ $`Generate data using JSON compliant with ${schema}.`
<details open>
<summary>👤 user</summary>


````markdown wrap
CITY_SCHEMA:

```typescript-schema
// A list of cities with population and elevation information.
type CITY_SCHEMA = Array<{
@@ -61,46 +66,64 @@ type CITY_SCHEMA = Array<{
url: string,
}>
```

Generate data using JSON compliant with CITY_SCHEMA.
````


</details>


<details open>
<summary>🤖 assistant</summary>


````markdown wrap
File ./data.json:

```json schema=CITY_SCHEMA
[
{
"name": "New York",
"population": 8398748,
"url": "https://en.wikipedia.org/wiki/New_York_City"
},
{
"name": "Los Angeles",
"population": 3990456,
"url": "https://en.wikipedia.org/wiki/Los_Angeles"
},
{
"name": "Chicago",
"population": 2705994,
"url": "https://en.wikipedia.org/wiki/Chicago"
}
{
"name": "New York",
"population": 8398748,
"url": "https://en.wikipedia.org/wiki/New_York_City"
},
{
"name": "Los Angeles",
"population": 3990456,
"url": "https://en.wikipedia.org/wiki/Los_Angeles"
},
{
"name": "Chicago",
"population": 2705994,
"url": "https://en.wikipedia.org/wiki/Chicago"
}
]
```
````


</details>

{/* genaiscript output end */}

### Native zod support

A [Zod](https://zod.dev/) type can be passed in `defSchema` and it will be automatically converted to JSON schema.
The GenAIScript also exports the `z` object from Zod for convenience.

```js
// import from genaiscript
import { z } from "genaiscript/runtime"
// or directly from zod
// import { z } from "zod"
// create schema using zod
const CitySchema = z.array(
z.object({
name: z.string(),
population: z.number(),
url: z.string(),
})
)
// JSON schema to constrain the output of the tool.
const schema = defSchema("CITY_SCHEMA", CitySchema)
```

### Prompt encoding

@@ -111,12 +134,12 @@ from TypeChat, the schema is converted TypeScript types before being injected in
// A list of cities with population and elevation information.
type CITY_SCHEMA = Array<{
// The name of the city.
name: string,
name: string
// The population of the city.
population: number,
population: number
// The URL of the city's Wikipedia page.
url: string,
}>
url: string
}>
```
You can change this behavior by using the `{ format: "json" }` option.
@@ -134,50 +157,46 @@ in the output folder as well.
<details>
<summary>schema CITY_SCHEMA</summary>

- source:
- source:

```json
{
"type": "array",
"description": "A list of cities with population and elevation information.",
"items": {
"type": "object",
"description": "A city with population and elevation information.",
"properties": {
"name": {
"type": "string",
"description": "The name of the city."
},
"population": {
"type": "number",
"description": "The population of the city."
},
"url": {
"type": "string",
"description": "The URL of the city's Wikipedia page."
}
},
"required": [
"name",
"population",
"url"
]
}
"type": "array",
"description": "A list of cities with population and elevation information.",
"items": {
"type": "object",
"description": "A city with population and elevation information.",
"properties": {
"name": {
"type": "string",
"description": "The name of the city."
},
"population": {
"type": "number",
"description": "The population of the city."
},
"url": {
"type": "string",
"description": "The URL of the city's Wikipedia page."
}
},
"required": ["name", "population", "url"]
}
}
```
- prompt (rendered as typescript):

- prompt (rendered as typescript):

```ts
// A list of cities with population and elevation information.
type CITY_SCHEMA = Array<{
// The name of the city.
name: string,
name: string
// The population of the city.
population: number,
population: number
// The URL of the city's Wikipedia page.
url: string,
}>

url: string
}>
```
</details>
@@ -199,15 +218,15 @@ GenAIScript automatically validates the payload against the schema.

:::tip

Not all data formats are equal! Some data formats like JSON introduce ambiguity
Not all data formats are equal! Some data formats like JSON introduce ambiguity
and can confuse the LLM.
[Read more...](https://betterprogramming.pub/yaml-vs-json-which-is-more-efficient-for-language-models-5bc11dd0f6df).

:::

## Repair

GenAIScript will automatically try to repair the data by issues additional messages
GenAIScript will automatically try to repair the data by issues additional messages
back to the LLM with the parsing output.

## Runtime Validation
@@ -216,4 +235,4 @@ Use `parsers.validateJSON` to validate JSON when running the script.

```js
const validation = parsers.validateJSON(schema, json)
```
```
2 changes: 1 addition & 1 deletion packages/cli/README.md
@@ -94,7 +94,7 @@ $`Analyze FILE and extract data to JSON using the ${schema} schema.`

### 📋 Data Schemas

Define, validate, and repair data using [schemas](https://microsoft.github.io/genaiscript/reference/scripts/schemas).
Define, validate, and repair data using [schemas](https://microsoft.github.io/genaiscript/reference/scripts/schemas). Zod support builtin.

```js
const data = defSchema("MY_DATA", { type: "array", items: { ... } })
10 changes: 6 additions & 4 deletions packages/cli/package.json
@@ -67,7 +67,9 @@
"turndown-plugin-gfm": "^1.0.2",
"typescript": "5.7.2",
"vectra": "^0.9.0",
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.2/xlsx-0.20.2.tgz"
"xlsx": "https://cdn.sheetjs.com/xlsx-0.20.2/xlsx-0.20.2.tgz",
"zod": "^3.24.1",
"zod-to-json-schema": "^3.24.1"
},
"optionalDependencies": {
"@huggingface/transformers": "^3.2.1",
@@ -112,11 +114,11 @@
"zx": "^8.2.4"
},
"scripts": {
"compile:runtime": "tsc src/runtime.ts --skipLibCheck --outDir built --declaration --target es2020 --moduleResolution node && mv built/runtime.js built/runtime.mjs",
"compile:runtime": "tsc src/runtime.ts --skipLibCheck --outDir built --declaration --target es2020 --moduleResolution node --module esnext && mv built/runtime.js built/runtime.mjs",
"compile:api": "esbuild src/api.ts --outfile=built/api.mjs",
"compile:cli": "esbuild src/main.ts --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk --external:@anthropic-ai/bedrock-sdk --external:es-toolkit && node ../../scripts/patch-cli.mjs",
"compile:cli": "esbuild src/main.ts --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk --external:@anthropic-ai/bedrock-sdk --external:es-toolkit --external:zod --external:zod-to-json-schema && node ../../scripts/patch-cli.mjs",
"compile": "yarn compile:api && yarn compile:runtime && yarn compile:cli",
"compile-debug": "esbuild src/main.ts --sourcemap --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk --external:@anthropic-ai/bedrock-sdk --external:es-toolkit",
"compile-debug": "esbuild src/main.ts --sourcemap --metafile=./esbuild.meta.json --bundle --platform=node --target=node20 --outfile=built/genaiscript.cjs --external:tsx --external:esbuild --external:get-tsconfig --external:resolve-pkg-maps --external:dockerode --external:pdfjs-dist --external:web-tree-sitter --external:tree-sitter-wasms --external:promptfoo --external:typescript --external:@lvce-editor/ripgrep --external:gpt-3-encoder --external:mammoth --external:xlsx --external:mathjs --external:@azure/identity --external:gpt-tokenizer --external:playwright --external:@inquirer/prompts --external:jimp --external:turndown --external:turndown-plugin-gfm --external:vectra --external:tabletojson --external:html-to-text --external:@octokit/rest --external:@octokit/plugin-throttling --external:@octokit/plugin-retry --external:@octokit/plugin-paginate-rest --external:skia-canvas --external:@huggingface/transformers --external:@modelcontextprotocol/sdk --external:@anthropic-ai/sdk --external:@anthropic-ai/bedrock-sdk --external:es-toolkit --external:zod --external:zod-to-json-schema",
"postcompile": "node built/genaiscript.cjs info help > ../../docs/src/content/docs/reference/cli/commands.md",
"vis:treemap": "npx --yes esbuild-visualizer --metadata esbuild.meta.json --filename esbuild.treemap.html",
"vis:network": "npx --yes esbuild-visualizer --metadata esbuild.meta.json --filename esbuild.network.html --template network",
10 changes: 8 additions & 2 deletions packages/cli/src/runtime.ts
@@ -1,9 +1,15 @@
/**
* GenAIScript supporting runtime
*/
import { delay as esDelay } from "es-toolkit"
import { delay as _delay } from "es-toolkit"
import { z as zod } from "zod"

/**
* A helper function to delay the execution of the script
*/
export const delay: (ms: number) => Promise<void> = esDelay
export const delay: (ms: number) => Promise<void> = _delay

/**
* Zod schema generator
*/
export const z = zod
5 changes: 4 additions & 1 deletion packages/core/src/promptdom.ts
@@ -41,6 +41,7 @@ import { jinjaRenderChatMessage } from "./jinja"
import { runtimeHost } from "./host"
import { hash } from "./crypto"
import { startMcpServer } from "./mcp"
import { tryZodToJsonSchema } from "./zod"

// Definition of the PromptNode interface which is an essential part of the code structure.
export interface PromptNode extends ContextExpansionOptions {
@@ -361,11 +362,13 @@ export function createImageNode(
// Function to create a schema node.
export function createSchemaNode(
name: string,
value: JSONSchema,
value: JSONSchema | ZodTypeLike,
options?: DefSchemaOptions
): PromptSchemaNode {
assert(!!name)
assert(value !== undefined)
// auto zod conversion
value = tryZodToJsonSchema(value as ZodTypeLike) ?? (value as JSONSchema)
return { type: "schema", name, value, options }
}
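
Reviewer note: the new line in `createSchemaNode` leans on `??` so that a failed conversion (an `undefined` result) falls back to the value the caller passed in. The sketch below illustrates only that control flow. `stubConvert`, `normalize`, and `JSONSchemaLike` are hypothetical names introduced here, and the stub stands in for `tryZodToJsonSchema`; it is not the real converter.

```typescript
// Hypothetical local alias; the real code uses the global JSONSchema type.
type JSONSchemaLike = Record<string, unknown>

// Hypothetical stand-in for tryZodToJsonSchema: it returns undefined unless
// the value carries a `_def` member, and otherwise returns a fixed schema.
function stubConvert(value: unknown): JSONSchemaLike | undefined {
    const v = value as { _def?: unknown } | null | undefined
    return v && v._def !== undefined
        ? { type: "string", converted: true }
        : undefined
}

// Same shape as the line added in the diff: try the conversion first and
// keep the original value when the converter declines.
function normalize(value: unknown): JSONSchemaLike {
    return stubConvert(value) ?? (value as JSONSchemaLike)
}

const plain: JSONSchemaLike = { type: "number" }
const passedThrough = normalize(plain) // same object, untouched
const converted = normalize({ _def: {} }) // replaced by the stub's output
```

Under these assumptions a plain JSON schema object is returned by reference, so existing callers of `defSchema` should see no change in behavior.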

4 changes: 3 additions & 1 deletion packages/core/src/types/prompt_template.d.ts
@@ -2604,10 +2604,12 @@ interface McpServerConfig {

type McpServersConfig = Record<string, Omit<McpServerConfig, "id" | "options">>

type ZodTypeLike = { _def: any, safeParse: any, refine: any }

interface ChatGenerationContext extends ChatTurnGenerationContext {
defSchema(
name: string,
schema: JSONSchema,
schema: JSONSchema | ZodTypeLike,
options?: DefSchemaOptions
): string
defImages(
2 changes: 1 addition & 1 deletion packages/core/src/types/prompt_type.d.ts
@@ -239,7 +239,7 @@ declare function fetchText(
*/
declare function defSchema(
name: string,
schema: JSONSchema,
schema: JSONSchema | ZodTypeLike,
options?: DefSchemaOptions
): string

19 changes: 19 additions & 0 deletions packages/core/src/zod.ts
@@ -0,0 +1,19 @@
import { zodToJsonSchema as _zodToJsonSchema } from "zod-to-json-schema"

/**
* Converts a Zod schema to a JSON schema
* @param z
* @param options
* @returns
*/
export function tryZodToJsonSchema(
z: ZodTypeLike,
options?: object
): JSONSchema {
if (!z || !z._def || !z.refine || !z.safeParse) return undefined
const schema = _zodToJsonSchema(z as any, {
target: "openAi",
...(options || {}),
})
return structuredClone(schema) as JSONSchema
}
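
Reviewer note: the guard in `tryZodToJsonSchema` detects Zod types by duck typing, checking for `_def`, `refine`, and `safeParse`, rather than using `instanceof`. The sketch below isolates that check so its behavior on a few inputs is easy to see. `looksLikeZod`, `fakeZod`, and `plainJsonSchema` are hypothetical names used only for illustration, and `fakeZod` is a hand-built object, not a real Zod schema.

```typescript
// Local copy of the structural type declared in prompt_template.d.ts.
type ZodTypeLike = { _def: any; safeParse: any; refine: any }

// Mirrors the truthiness checks of the guard in the diff, expressed as a
// type predicate instead of an early return.
function looksLikeZod(value: unknown): value is ZodTypeLike {
    const v = value as Partial<ZodTypeLike> | null | undefined
    return !!v && !!v._def && !!v.refine && !!v.safeParse
}

// A plain JSON schema has none of the three members.
const plainJsonSchema = { type: "object", properties: {} }

// Hand-built object with the three members the guard looks for.
const fakeZod: ZodTypeLike = {
    _def: { typeName: "ZodString" },
    safeParse: () => ({ success: true }),
    refine: () => undefined,
}

const results = {
    plain: looksLikeZod(plainJsonSchema), // false
    zodLike: looksLikeZod(fakeZod), // true
    missing: looksLikeZod(undefined), // false
}
```

One consequence of this design is that any object exposing those three truthy members is treated as a Zod type, whichever package or version produced it.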