Merge remote-tracking branch 'origin/main' into azuredevopscomment

microsoft · Jul 16, 2024 · 0958a16
2 parents 6cd9783 + 5f78023
commit 0958a16
Showing 63 changed files with 3,000 additions and 9,443 deletions.
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -20,6 +20,6 @@
     "features": {
         "ghcr.io/devcontainers/features/docker-in-docker:2": {},
         "ghcr.io/devcontainers/features/azure-cli:1.2.5": {},
-        "ghcr.io/devcontainers/features/python:1.6.1": {}
+        "ghcr.io/devcontainers/features/python:1.6.2": {}
     }
 }
diff --git a/THIRD_PARTY_LICENSES.md b/THIRD_PARTY_LICENSES.md
diff --git a/demo/genaisrc/genaiscript.d.ts b/demo/genaisrc/genaiscript.d.ts
diff --git a/docs/genaisrc/genaiscript.d.ts b/docs/genaisrc/genaiscript.d.ts
diff --git a/docs/src/content/docs/reference/cli/commands.md b/docs/src/content/docs/reference/cli/commands.md
@@ -41,6 +41,7 @@ Options:
   -mdr, --max-data-repairs <number>          maximum data repairs
   -mtc, --max-tool-calls <number>            maximum tool calls for the run
   -se, --seed <number>                       seed for the run
+  -em, --embeddings-model <string>           embeddings model for the run
   --no-cache                                 disable LLM result cache
   -cn, --cache-name <name>                   custom cache file name
   --cs, --csv-separator <string>             csv separator (default: "\t")
@@ -222,35 +223,13 @@ Options:
   -h, --help                           display help for command
 
 Commands:
-  index [options] <file...>            Index a set of documents
   search [options] <query> [files...]  Search using vector embeddings
                                        similarity
-  clear [options]                      Clear index to force re-indexing
   fuzz [options] <query> [files...]    Search using string distance
   code
   help [command]                       display help for command
 ```
 
-### `retrieval index`
-
-```
-Usage: genaiscript retrieval index [options] <file...>
-
-Index a set of documents
-
-Arguments:
-  file                               Files to index
-
-Options:
-  -ef, --excluded-files <string...>  excluded files
-  -n, --name <string>                index name
-  -cs, --chunk-size <number>         chunk size
-  -co, --chunk-overlap <number>      chunk overlap
-  -m, --model <string>               model for embeddings
-  -t, --temperature <number>         LLM temperature
-  -h, --help                         display help for command
-```
-
 ### `retrieval search`
 
 ```
@@ -261,22 +240,9 @@ Search using vector embeddings similarity
 Options:
   -ef, --excluded-files <string...>  excluded files
   -tk, --top-k <number>              maximum number of results
-  -n, --name <string>                index name
   -h, --help                         display help for command
 ```
 
-### `retrieval clear`
-
-```
-Usage: genaiscript retrieval clear [options]
-
-Clear index to force re-indexing
-
-Options:
-  -n, --name <string>  index name
-  -h, --help           display help for command
-```
-
 ### `retrieval fuzz`
 
 ```

diff --git a/docs/src/content/docs/reference/scripts/files.md b/docs/src/content/docs/reference/scripts/files.md
@@ -71,6 +71,14 @@ const content = file.content
 
 It will automatically convert PDFs and DOCX files to text.
 
+### `readJSON`
+
+Reads the content of a file as JSON (using a [JSON5](https://json5.org/) parser)
+
+```ts
+const data = await workspace.readJSON("data.json")
+```
+
 ### `writeText`
 
 Writes text to a file, relative to the workspace root.

diff --git a/docs/src/content/docs/reference/scripts/retreival.md b/docs/src/content/docs/reference/scripts/retreival.md
@@ -6,18 +6,11 @@ description: Learn how to use GenAIScript's retrieval utilities for content sear
 keywords: RAG, content retrieval, search augmentation, indexing, web search
 ---
 
-GenAIScript provides various utilities to retrieve content and augment the prompt. This technique is typically referred as **RAG** (Retrieval-Augmentation-Generation) in the literature. GenAIScript uses [llamaindex-ts](https://ts.llamaindex.ai/api/classes/VectorIndexRetriever) which supports many vector database vendors.
-
-## Fuzz Search
-
-The `retrieve.fuzzSearch` performs a "traditional" fuzzy search to find the most similar documents to the prompt.
-
-```js
-const { files } = await retrieval.fuzzSearch("cat dog", env.files)
-```
+GenAIScript provides various utilities to retrieve content and augment the prompt. This technique is typically referred as **RAG** (Retrieval-Augmentation-Generation) in the literature.
 
 ## Vector Search
 
+GenAIScript provides tiny vector database based on [vectra](https://www.npmjs.com/package/vectra).
 The `retrieve.vectorSearch` performs a embeddings search to find the most similar documents to the prompt.
 
 ```js
@@ -27,26 +20,14 @@ def("RAG", files)
 
 The `files` variable contains a list of files, with concatenated fragments, that are most similar to the prompt. The `fragments` variable contains a list of fragments from the files that are most similar to the prompt.
 
-### Indexing
-
-By default, the retrieval uses [OpenAI text-embedding-ada-002](https://ts.llamaindex.ai/modules/embeddings/) embeddings. The first search might be slow as the files get indexed for the first time.
+## Fuzz Search
 
-You can index your project using the [CLI](/genaiscript/reference/cli).
+The `retrieve.fuzzSearch` performs a "traditional" fuzzy search to find the most similar documents to the prompt.
 
-```sh
-genaiscript retrieve index "src/**"
+```js
+const files = await retrieval.fuzzSearch("cat dog", env.files)
 ```
 
-:::tip
-
-You can simulate an indexing command in Visual Studio Code by right-clicking on a folder and selecting **Retrieval** > **Index**. Once indexed, you can test search using **Retrieval** > **Search**.
-
-:::
-
-### Indexing configuration
-
-You can control the chunk size, overlap and model used for index files. You can also create multiple indexes using the `indexName` option.
-
 ## Web Search
 
 The `retrieval.webSearch` performs a web search using a search engine API. You will need to provide API keys for the search engine you want to use.