Add search-and-transform guide and enhance transformation caching in …
…script
pelikhan committed Sep 18, 2024
1 parent 48beaa3 commit 6b07e94
Showing 2 changed files with 125 additions and 1 deletion.
115 changes: 115 additions & 0 deletions docs/src/content/docs/guides/search-and-transform.mdx
@@ -0,0 +1,115 @@
---
title: Search And Transform
description: Learn how to search and transform data in your data sources.
sidebar:
    order: 20
---

import { Code } from "@astrojs/starlight/components"
import source from "../../../../../packages/vscode/genaisrc/st.genai.mts?raw"

This script is an evolution of the "search and replace" feature found in text editors,
where the "replace" step is performed by an LLM transformation.

It can be useful to batch apply text transformations that are not easily done with
regular expressions.

For example, when GenAIScript added the ability to pass a single command string to
the `exec` command, we needed to convert all scripts using `host.exec("cmd", ["arg0", "arg1", "arg2"])`
to ``host.exec(`cmd arg0 arg1 arg2`)``.

```sh wrap
genaiscript st --vars pattern='host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)' transform='Convert the call to a single string command shell in TypeScript'
```
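
Before running the batch, it can help to check the pattern against a few sample lines. The sketch below is only an illustration in plain JavaScript (the `samples` array is made up for this example). It suggests that the pattern picks up the array form of `host.exec` and leaves calls that already use a single string untouched.

```js
// The pattern from the command above, written as a JavaScript regex literal.
const pattern = /host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)/g

// Hypothetical sample lines, not taken from a real code base.
const samples = [
    'await host.exec("git", ["diff"])', // array form: should match
    "await host.exec(`git diff`)", // single string: should not match
]

// String.prototype.match returns null when nothing matches,
// so fall back to an empty array.
const matched = samples.map((s) => s.match(pattern) ?? [])
console.log(matched)
```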

Here are some examples of transformations where the LLM correctly handled variables.

- concatenate the arguments of a function call into a single string

```diff wrap
- const { stdout } = await host.exec("git", ["diff"])
+ const { stdout } = await host.exec(`git diff`)
```

- concatenate the arguments and use the `${}` syntax to interpolate variables

```diff wrap
- const { stdout: commits } = await host.exec("git", [
- "log",
- "--author",
- author,
- "--until",
- until,
- "--format=oneline",
- ])
+ const { stdout: commits } = await host.exec(`git log --author ${author} --until ${until} --format=oneline`)
```

## Search

The search step uses [workspace.grep](/genaiscript/reference/scripts/files),
which efficiently searches for a pattern in files (it is the same search engine
that powers the Visual Studio Code search).

```js "workspace.grep"
const { pattern, glob } = env.vars
const patternRx = new RegExp(pattern, "g")
const { files } = await workspace.grep(patternRx, glob)
```

## Compute Transforms

The second step is to apply the regular expression to the file content
and pre-compute the LLM transformation of each match using an [inline prompt](/genaiscript/reference/scripts/inline-prompts).

```js
const { transform } = env.vars
...
const patches = {} // map of match -> transformed
for (const file of files) {
    const { content } = await workspace.readText(file.filename)
    for (const match of content.matchAll(patternRx)) {
        const res = await runPrompt(
            (ctx) => {
                ctx.$`
## Task
Your task is to transform the MATCH with the following TRANSFORM.
Return the transformed text.
- do NOT add enclosing quotes.
## Context
`
                ctx.def("MATCHED", match[0])
                ctx.def("TRANSFORM", transform)
            },
            { label: match[0], system: [], cache: "search-and-transform" }
        )
...
```

The LLM sometimes wraps the answer in a fenced code block, so we use the content of the first fence when there is one and fall back to the raw text otherwise.

```js
        ...
        const transformed = res.fences?.[0]?.content ?? res.text
        patches[match[0]] = transformed
```
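
The `??` fallback in the snippet above can be exercised in isolation. This is a minimal sketch that assumes a result object shaped like `{ text, fences }`; the `pickTransformed` helper and the sample objects are hypothetical and not part of GenAIScript.

```js
// Assumed result shape: { text: string, fences?: { content: string }[] }.
// pickTransformed is a hypothetical helper for illustration only.
function pickTransformed(res) {
    // The second `?.` guards against an empty `fences` array,
    // where `fences[0]` is undefined.
    return res.fences?.[0]?.content ?? res.text
}

// No fence in the answer: the raw text is used.
console.log(pickTransformed({ text: "host.exec(`git diff`)", fences: [] }))

// Fenced answer: the content of the first fence wins over the raw text.
console.log(
    pickTransformed({
        text: "Here is the result, wrapped in a fence.",
        fences: [{ content: "host.exec(`git diff`)" }],
    })
)
```
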

## Transform

Finally, with the transforms pre-computed, we apply a final regex replace to
patch the old file content with the transformed strings.

```js
    const newContent = content.replace(
        patternRx,
        (match) => patches[match] ?? match
    )
    await workspace.writeText(file.filename, newContent)
}
```
```
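
The two passes above can be sketched without GenAIScript. The callback given to `String.prototype.replace` is not awaited, which is one reason to compute the asynchronous transforms first and keep the replace step a plain lookup. In this sketch, `searchAndTransform` and the `stub` transform are hypothetical stand-ins for the script and the LLM call, and a `Map` is used instead of a plain object as an illustrative choice.

```js
// Hypothetical helper: `transformMatch` stands in for the LLM call.
async function searchAndTransform(content, patternRx, transformMatch) {
    // Pass 1: compute each distinct match once. A Map avoids clashes
    // with inherited object keys such as "constructor".
    const patches = new Map()
    for (const match of content.matchAll(patternRx)) {
        if (patches.has(match[0])) continue
        patches.set(match[0], await transformMatch(match[0]))
    }
    // Pass 2: synchronous replace that only looks up precomputed results.
    return content.replace(patternRx, (m) => patches.get(m) ?? m)
}

// Usage with a stub transform that counts how often it is called.
let calls = 0
const stub = async (text) => {
    calls++
    return text.toUpperCase()
}
searchAndTransform("foo bar foo", /foo/g, stub).then((out) =>
    console.log(out, calls)
) // prints "FOO bar FOO 1"
```

The repeated `foo` is transformed only once, which mirrors the way the script reuses a transformation for identical matches.
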

## Full source

<Code code={source} wrap={true} lang="ts" title="st.genai.mts" />
11 changes: 10 additions & 1 deletion packages/vscode/genaisrc/st.genai.mts
@@ -26,10 +26,16 @@ const patternRx = new RegExp(pattern, "g")
 if (!transform) cancel("transform is missing")

 const { files } = await workspace.grep(patternRx, glob)
+// cached computed transformations
+const patches = {}
 for (const file of files) {
     console.log(file.filename)
     const { content } = await workspace.readText(file.filename)
-    const patches = {}
+
+    // skip binary files
+    if (!content) continue
+
+    // compute transforms
     for (const match of content.matchAll(patternRx)) {
         console.log(` ${match[0]}`)
         if (patches[match[0]]) continue
@@ -56,10 +62,13 @@ for (const file of files) {
         console.log(` ${match[0]} -> ${transformed ?? "?"}`)
     }

+    // apply transforms
     const newContent = content.replace(
         patternRx,
         (match) => patches[match] ?? match
     )

+    // save results if file content is modified
+    if (content !== newContent)
     await workspace.writeText(file.filename, newContent)
 }
