Add search-and-transform guide and enhance transformation caching in …
…script
Showing 2 changed files with 125 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
--- | ||
title: Search And Transform | ||
description: Learn how to search and transform data in your data sources. | ||
sidebar: | ||
order: 20 | ||
--- | ||
|
||
import { Code } from "@astrojs/starlight/components" | ||
import source from "../../../../../packages/vscode/genaisrc/st.genai.mjs?raw" | ||
|
||
This script is an evolution of the "search and replace" feature of text editors, | ||
where the "replace" step is performed by an LLM transformation. | ||
|
||
It can be useful to batch apply text transformations that are not easily done with | ||
regular expressions. | ||
|
||
For example, when GenAIScript added the ability to pass a single command string to | ||
the `exec` command, we needed to convert all scripts using `host.exec("cmd", ["arg0", "arg1", "arg2"])` | ||
to ``host.exec(`cmd arg0 arg1 arg2`)``. | ||
|
||
```sh wrap | ||
genaiscript st --vars pattern='host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)' transform='Convert the call to a single string command shell in TypeScript' | ||
``` | ||
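
To see what the `pattern` value in the command above selects, here is a minimal sketch that runs the same regular expression over a small sample in plain JavaScript. The `sample` text is hypothetical and exists only for illustration; it is not taken from the script.

```js
// Same pattern as in the command above, written as a JavaScript regex literal.
const pattern = /host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)/g

// Hypothetical sample source, for illustration only.
const sample = [
    'const a = await host.exec("git", ["diff"])', // old two-argument form
    "const b = await host.exec(`git status`)", // already a single command string
].join("\n")

// Collect the text of every match.
const matches = [...sample.matchAll(pattern)].map((m) => m[0])
console.log(matches) // [ 'host.exec("git", ["diff"])' ]
```

Only the two-argument form is selected. Because `[^\]]+` also matches line breaks, an argument array spread over several lines is captured as a single match.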
|
||
Here are some examples of transformations where the LLM correctly handled variables. | ||
|
||
- concatenate the arguments of a function call into a single string | ||
|
||
```diff wrap | ||
- const { stdout } = await host.exec("git", ["diff"]) | ||
+ const { stdout } = await host.exec(`git diff`) | ||
``` | ||
|
||
- concatenate the arguments and use the `${}` syntax to interpolate variables | ||
|
||
```diff wrap | ||
- const { stdout: commits } = await host.exec("git", [ | ||
- "log", | ||
- "--author", | ||
- author, | ||
- "--until", | ||
- until, | ||
- "--format=oneline", | ||
- ]) | ||
+ const { stdout: commits } = await host.exec(`git log --author ${author} --until ${until} --format=oneline`) | ||
``` | ||
|
||
## Search | ||
|
||
The search step uses [workspace.grep](/genaiscript/reference/scripts/files), | ||
which efficiently searches for a pattern in files (this is the same search engine | ||
that powers the Visual Studio Code search). | ||
|
||
```js "workspace.grep" | ||
const { pattern, glob } = env.vars | ||
const patternRx = new RegExp(pattern, "g") | ||
const { files } = await workspace.grep(patternRx, glob) | ||
``` | ||
|
||
## Compute Transforms | ||
|
||
The second step is to apply the regular expression to the file content | ||
and pre-compute the LLM transformation of each match using an [inline prompt](/genaiscript/reference/scripts/inline-prompts). | ||
|
||
```js | ||
const { transform } = env.vars | ||
... | ||
const patches = {} // map of match -> transformed | ||
for (const file of files) { | ||
const { content } = await workspace.readText(file.filename) | ||
for (const match of content.matchAll(patternRx)) { | ||
const res = await runPrompt( | ||
(ctx) => { | ||
ctx.$` | ||
## Task | ||
Your task is to transform the MATCH with the following TRANSFORM. | ||
Return the transformed text. | ||
- do NOT add enclosing quotes. | ||
## Context | ||
` | ||
ctx.def("MATCH", match[0]) | ||
ctx.def("TRANSFORM", transform) | ||
}, | ||
{ label: match[0], system: [], cache: "search-and-transform" } | ||
) | ||
... | ||
``` | ||
The LLM sometimes wraps its answer in a code fence, so we take the content of the first fence when there is one and fall back to the raw text otherwise.
```js | ||
... | ||
// optional chaining on [0] as well, in case the fence list is empty
const transformed = res.fences?.[0]?.content ?? res.text
patches[match[0]] = transformed | ||
``` | ||
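
The snippet above only looks at code fences; enclosing quotes are discouraged by the prompt but not stripped. If the model still returns them, a post-processing step could look like the sketch below. `unwrapAnswer` is a hypothetical helper, not part of the script or of GenAIScript, and it removes a single layer of wrapping only.

```js
// Three backticks, built at runtime to keep this example easy to embed in markdown.
const FENCE = "`".repeat(3)
// Matches a whole answer of the form: fence + optional language, newline, body, newline, fence.
const fencedRx = new RegExp("^" + FENCE + "[^\\n]*\\n([\\s\\S]*?)\\n" + FENCE + "$")
// Matches a whole answer enclosed in one pair of identical quotes.
const quotedRx = /^(["'])([\s\S]*)\1$/

// Hypothetical helper: strips one surrounding code fence or one pair of quotes.
// Caveat: an answer such as "a" + "b" starts and ends with a quote and would be
// mangled, so this is only safe when such answers are not expected.
function unwrapAnswer(text) {
    const trimmed = text.trim()
    const fenced = fencedRx.exec(trimmed)
    if (fenced) return fenced[1]
    const quoted = quotedRx.exec(trimmed)
    if (quoted) return quoted[2]
    return trimmed
}

console.log(unwrapAnswer('"host.exec(`git diff`)"')) // host.exec(`git diff`)
```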
## Transform | ||
Finally, with the transforms pre-computed, we apply a final regex replace to | ||
patch the old file content with the transformed strings. | ||
```js | ||
const newContent = content.replace( | ||
patternRx, | ||
(match) => patches[match] ?? match | ||
) | ||
await workspace.writeText(file.filename, newContent) | ||
} | ||
``` | ||
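
The replace step can be tried in isolation with plain JavaScript. In this minimal sketch the `patches` map and the `content` string are hypothetical stand-ins for the LLM results and the file text; only the `replace` call mirrors the snippet above.

```js
const patternRx = /host\.exec\s*\([^,]+,\s*\[[^\]]+\]\s*\)/g

// Hypothetical pre-computed transforms (in the script these come from the LLM).
const patches = {
    'host.exec("git", ["diff"])': "host.exec(`git diff`)",
}

// Hypothetical file content with two matches, only one of which has a patch.
const content = [
    'await host.exec("git", ["diff"])',
    'await host.exec("npm", ["test"])',
].join("\n")

// Each match is looked up in the map; matches without a patch are left unchanged.
const newContent = content.replace(patternRx, (match) => patches[match] ?? match)
console.log(newContent)
```

Keying the map on the matched text means identical matches share one transformation, and the `?? match` fallback leaves the file untouched wherever no transformation was recorded.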
## Full source | ||
<Code code={source} wrap={true} lang="ts" title="st.genai.mts" /> |