Commit
Enhance token handling with new utilities for counting and truncating logs
pelikhan committed Oct 7, 2024
1 parent 11bfc09 commit e8c217b
Showing 22 changed files with 518 additions and 5 deletions.
27 changes: 27 additions & 0 deletions docs/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 6 additions & 2 deletions docs/src/content/docs/reference/scripts/system.mdx
@@ -1041,8 +1041,12 @@ defTool(
         let log = await github.downloadWorkflowJobLog(job_id, {
             llmify: true,
         })
-        if (parsers.tokens(log) > 1000)
-            log = "...(truncated, tool long)...\n" + log.slice(-3000)
+        if ((await tokenizers.count(log)) > 1000) {
+            log = await tokenizers.truncate(log, 1000, { last: true })
+            const annotations = await parsers.annotations(log)
+            if (annotations.length > 0)
+                log += "\n\n" + YAML.stringify(annotations)
+        }

Check failure on line 1049 in docs/src/content/docs/reference/scripts/system.mdx (GitHub Actions / build):
The functions 'tokenizers.count' and 'tokenizers.truncate' are used incorrectly or do not exist. The correct usage should be verified.

         return log
     }
 )
27 changes: 27 additions & 0 deletions genaisrc/genaiscript.d.ts


27 changes: 27 additions & 0 deletions packages/auto/genaiscript.d.ts


27 changes: 27 additions & 0 deletions packages/core/src/genaisrc/genaiscript.d.ts


8 changes: 6 additions & 2 deletions packages/core/src/genaisrc/system.github_actions.genai.mjs
@@ -114,8 +114,12 @@ defTool(
         let log = await github.downloadWorkflowJobLog(job_id, {
             llmify: true,
         })
-        if (parsers.tokens(log) > 1000)
-            log = "...(truncated, tool long)...\n" + log.slice(-3000)
+        if ((await tokenizers.count(log)) > 1000) {
+            log = await tokenizers.truncate(log, 1000, { last: true })
+            const annotations = await parsers.annotations(log)
+            if (annotations.length > 0)
+                log += "\n\n" + YAML.stringify(annotations)
+        }
         return log
     }
 )
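
The removed lines checked a token count but trimmed by characters (`log.slice(-3000)`), so the size of the trimmed log in tokens depended on its content. The sketch below illustrates that mismatch. It assumes a crude whitespace-based estimate; `estimateTokenCount` and `sliceLastChars` are hypothetical stand-ins for illustration, not GenAIScript APIs.

```typescript
// Hypothetical stand-in for a tokenizer (assumption):
// one token per whitespace-separated chunk.
function estimateTokenCount(text: string): number {
    return text.split(/\s+/).filter(Boolean).length
}

// The approach in the removed lines: keep a fixed number of trailing characters.
function sliceLastChars(text: string, maxChars: number): string {
    return text.slice(-maxChars)
}

// Two logs of 4000 characters each, with very different token densities.
const dense = "a ".repeat(2000) // many short tokens
const sparse = "x".repeat(4000) // one long token

const denseTail = sliceLastChars(dense, 3000)
const sparseTail = sliceLastChars(sparse, 3000)

// Both tails have 3000 characters, but their estimated token counts differ widely.
console.log(denseTail.length, estimateTokenCount(denseTail))
console.log(sparseTail.length, estimateTokenCount(sparseTail))
```

Under this estimate the dense tail still exceeds a 1000-token budget while the sparse tail uses a single token, which is the kind of gap a token-based truncation avoids.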
21 changes: 20 additions & 1 deletion packages/core/src/globals.ts
@@ -16,10 +16,13 @@ import { readText } from "./fs"
 import { logVerbose } from "./util"
 import { GitHubClient } from "./github"
 import { GitClient } from "./git"
+import { estimateTokens, truncateTextToTokens } from "./tokens"
+import { resolveTokenEncoder } from "./encoders"
+import { runtimeHost } from "./host"
 
 /**
  * This file defines global utilities and installs them into the global context.
- * It includes functions to parse and stringify various data formats, handle errors,
+ * It includes functions to parse and stringify various data formats, handle errors,
  * and manage GitHub and Git clients. The utilities are frozen to prevent modification.
  */

@@ -118,6 +121,22 @@ export function installGlobals() {
     // Instantiate Git client
     glb.git = new GitClient()
 
+    glb.tokenizers = Object.freeze<Tokenizers>({
+        count: async (text, options) => {
+            const encoder = await resolveTokenEncoder(
+                options?.model || runtimeHost.defaultModelOptions.model
+            )
+            const c = await estimateTokens(text, encoder)
+            return c
+        },
+        truncate: async (text, maxTokens, options) => {
+            const encoder = await resolveTokenEncoder(
+                options?.model || runtimeHost.defaultModelOptions.model
+            )
+            return await truncateTextToTokens(text, maxTokens, encoder, options)
+        },
+    })
+
     /**
      * Asynchronous function to fetch text from a URL or file.
      * Handles both HTTP(S) URLs and local workspace files.
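
The hunk above resolves an encoder from `options?.model`, falls back to a default model, and exposes the result on a frozen object. A minimal, self-contained sketch of that pattern follows. The `Encoder` type, the `encoders` registry, `DEFAULT_MODEL`, `pickEncoder`, and `tokenizersSketch` are all hypothetical names introduced for illustration; the real `resolveTokenEncoder` and `estimateTokens` are not reproduced here.

```typescript
// Hypothetical encoder type (assumption): maps text to a list of token ids.
type Encoder = (text: string) => number[]

// Hypothetical registry standing in for the real encoder resolution.
const encoders: Record<string, Encoder> = {
    // one token per character
    "char-model": (text) => text.split("").map((c) => c.charCodeAt(0)),
    // one token per whitespace-separated word
    "word-model": (text) =>
        text.split(/\s+/).filter(Boolean).map((w) => w.length),
}

// Assumed default, playing the role of the host's default model option.
const DEFAULT_MODEL = "word-model"

// Picks the encoder for the requested model, or the default when none is given.
function pickEncoder(model?: string): Encoder {
    const name = model || DEFAULT_MODEL
    const encoder = encoders[name]
    if (!encoder) throw new Error(`unknown model: ${name}`)
    return encoder
}

// Frozen so that scripts cannot replace the utilities at runtime.
const tokenizersSketch = Object.freeze({
    count: async (
        text: string,
        options?: { model?: string }
    ): Promise<number> => pickEncoder(options?.model)(text).length,
})

// Uses the default model: three words, so three tokens in this sketch.
tokenizersSketch.count("build failed fast").then((n) => console.log(n))
```

Passing `{ model: "char-model" }` would count characters instead, which shows how the same call can give different results per model.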
22 changes: 22 additions & 0 deletions packages/core/src/types/prompt_template.d.ts
@@ -1028,6 +1028,28 @@ interface CSVParseOptions {
     headers?: string[]
 }
 
+interface Tokenizers {
+    /**
+     * Estimates the number of tokens in the content. May not be accurate
+     * @param model
+     * @param text
+     */
+    count(text: string, options?: { model: string }): Promise<number>
+
+    /**
+     * Truncates the text to a given number of tokens, approximation.
+     * @param model
+     * @param text
+     * @param maxTokens
+     * @param options
+     */
+    truncate(
+        text: string,
+        maxTokens: number,
+        options?: { model?: string; last?: boolean }
+    ): Promise<string>
+}
+
 interface Parsers {
     /**
      * Parses text as a JSON5 payload
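
The `truncate` signature above takes a `last` flag. Since `{ last: true }` replaced `log.slice(-3000)` in the tool, one plausible reading is that it keeps the end of the text rather than the beginning. The sketch below shows that head-versus-tail behavior under this assumption, using whitespace-separated chunks as tokens; `truncateToTokens` is a hypothetical helper, not the library's `truncateTextToTokens`.

```typescript
interface TruncateOptions {
    last?: boolean
}

// Hypothetical helper (assumption): treats each whitespace-separated chunk as
// one token and keeps either the first or the last `maxTokens` of them.
function truncateToTokens(
    text: string,
    maxTokens: number,
    options?: TruncateOptions
): string {
    if (maxTokens <= 0) return ""
    // Record where each token starts and ends so original spacing is preserved.
    const spans: { start: number; end: number }[] = []
    const re = /\S+/g
    let m: RegExpExecArray | null
    while ((m = re.exec(text)) !== null)
        spans.push({ start: m.index, end: m.index + m[0].length })
    // Nothing to do when the text already fits the budget.
    if (spans.length <= maxTokens) return text
    return options?.last
        ? text.slice(spans[spans.length - maxTokens].start) // keep the tail
        : text.slice(0, spans[maxTokens - 1].end) // keep the head
}

const sample = "one two three four five"
console.log(truncateToTokens(sample, 2)) // one two
console.log(truncateToTokens(sample, 2, { last: true })) // four five
```

For job logs the tail is usually the useful part, since failures tend to be reported near the end.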
5 changes: 5 additions & 0 deletions packages/core/src/types/prompt_type.d.ts
@@ -216,6 +216,11 @@ declare var github: GitHub
  */
 declare var git: Git
 
+/**
+ * Computation around tokens
+ */
+declare var tokenizers: Tokenizers
+
 /**
  * Fetches a given URL and returns the response.
  * @param url
27 changes: 27 additions & 0 deletions packages/sample/genaisrc/blog/genaiscript.d.ts


27 changes: 27 additions & 0 deletions packages/sample/genaisrc/genaiscript.d.ts


27 changes: 27 additions & 0 deletions packages/sample/genaisrc/node/genaiscript.d.ts

