Merge remote-tracking branch 'origin/main' into azuredevopscomment
pelikhan committed Jul 16, 2024
2 parents 6cd9783 + 5f78023 commit 0958a16
Showing 63 changed files with 3,000 additions and 9,443 deletions.
2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
@@ -20,6 +20,6 @@
"features": {
"ghcr.io/devcontainers/features/docker-in-docker:2": {},
"ghcr.io/devcontainers/features/azure-cli:1.2.5": {},
- "ghcr.io/devcontainers/features/python:1.6.1": {}
+ "ghcr.io/devcontainers/features/python:1.6.2": {}
}
}
7,506 changes: 1,702 additions & 5,804 deletions THIRD_PARTY_LICENSES.md

Large diffs are not rendered by default.

63 changes: 30 additions & 33 deletions demo/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default.

63 changes: 30 additions & 33 deletions docs/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default.

36 changes: 1 addition & 35 deletions docs/src/content/docs/reference/cli/commands.md
@@ -41,6 +41,7 @@ Options:
-mdr, --max-data-repairs <number> maximum data repairs
-mtc, --max-tool-calls <number> maximum tool calls for the run
-se, --seed <number> seed for the run
-em, --embeddings-model <string> embeddings model for the run
--no-cache disable LLM result cache
-cn, --cache-name <name> custom cache file name
--cs, --csv-separator <string> csv separator (default: "\t")
@@ -222,35 +223,13 @@ Options:
-h, --help display help for command
Commands:
index [options] <file...> Index a set of documents
search [options] <query> [files...] Search using vector embeddings
similarity
clear [options] Clear index to force re-indexing
fuzz [options] <query> [files...] Search using string distance
code
help [command] display help for command
```

### `retrieval index`

```
Usage: genaiscript retrieval index [options] <file...>
Index a set of documents
Arguments:
file Files to index
Options:
-ef, --excluded-files <string...> excluded files
-n, --name <string> index name
-cs, --chunk-size <number> chunk size
-co, --chunk-overlap <number> chunk overlap
-m, --model <string> model for embeddings
-t, --temperature <number> LLM temperature
-h, --help display help for command
```

### `retrieval search`

```
@@ -261,22 +240,9 @@ Search using vector embeddings similarity
Options:
-ef, --excluded-files <string...> excluded files
-tk, --top-k <number> maximum number of results
-n, --name <string> index name
-h, --help display help for command
```

### `retrieval clear`

```
Usage: genaiscript retrieval clear [options]
Clear index to force re-indexing
Options:
-n, --name <string> index name
-h, --help display help for command
```

### `retrieval fuzz`

```
8 changes: 8 additions & 0 deletions docs/src/content/docs/reference/scripts/files.md
@@ -71,6 +71,14 @@ const content = file.content

It will automatically convert PDFs and DOCX files to text.

### `readJSON`

Reads the content of a file as JSON (using a [JSON5](https://json5.org/) parser)

```ts
const data = await workspace.readJSON("data.json")
```

### `writeText`

Writes text to a file, relative to the workspace root.
31 changes: 6 additions & 25 deletions docs/src/content/docs/reference/scripts/retreival.md
@@ -6,18 +6,11 @@ description: Learn how to use GenAIScript's retrieval utilities for content sear
keywords: RAG, content retrieval, search augmentation, indexing, web search
---

GenAIScript provides various utilities to retrieve content and augment the prompt. This technique is typically referred as **RAG** (Retrieval-Augmentation-Generation) in the literature. GenAIScript uses [llamaindex-ts](https://ts.llamaindex.ai/api/classes/VectorIndexRetriever) which supports many vector database vendors.

## Fuzz Search

The `retrieve.fuzzSearch` performs a "traditional" fuzzy search to find the most similar documents to the prompt.

```js
const { files } = await retrieval.fuzzSearch("cat dog", env.files)
```
GenAIScript provides various utilities to retrieve content and augment the prompt. This technique is typically referred as **RAG** (Retrieval-Augmentation-Generation) in the literature.

## Vector Search

GenAIScript provides tiny vector database based on [vectra](https://www.npmjs.com/package/vectra).
The `retrieve.vectorSearch` performs a embeddings search to find the most similar documents to the prompt.

```js
@@ -27,26 +20,14 @@ def("RAG", files)

The `files` variable contains a list of files, with concatenated fragments, that are most similar to the prompt. The `fragments` variable contains a list of fragments from the files that are most similar to the prompt.

### Indexing

By default, the retrieval uses [OpenAI text-embedding-ada-002](https://ts.llamaindex.ai/modules/embeddings/) embeddings. The first search might be slow as the files get indexed for the first time.
## Fuzz Search

You can index your project using the [CLI](/genaiscript/reference/cli).
The `retrieve.fuzzSearch` performs a "traditional" fuzzy search to find the most similar documents to the prompt.

```sh
genaiscript retrieve index "src/**"
```js
const files = await retrieval.fuzzSearch("cat dog", env.files)
```

:::tip

You can simulate an indexing command in Visual Studio Code by right-clicking on a folder and selecting **Retrieval** > **Index**. Once indexed, you can test search using **Retrieval** > **Search**.

:::

### Indexing configuration

You can control the chunk size, overlap and model used for index files. You can also create multiple indexes using the `indexName` option.

## Web Search

The `retrieval.webSearch` performs a web search using a search engine API. You will need to provide API keys for the search engine you want to use.
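
Note on the Fuzz Search section in this hunk: the CLI help describes `fuzz` as "Search using string distance". For readers unfamiliar with the idea, here is a minimal sketch of ranking files by edit distance. It is not GenAIScript's implementation; `levenshtein`, `FileLike`, and `fuzzRank` are hypothetical names introduced only for illustration, and the scoring rule (sum of each query term's best word match) is an assumption.

```ts
// Illustrative only: NOT the GenAIScript implementation of retrieval.fuzzSearch.
// All names below (levenshtein, FileLike, fuzzRank) are hypothetical.

// Classic dynamic-programming Levenshtein edit distance, keeping one row at a time.
function levenshtein(a: string, b: string): number {
    let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j)
    for (let i = 1; i <= a.length; i++) {
        const curr: number[] = [i]
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1
            curr[j] = Math.min(
                prev[j] + 1, // deletion
                curr[j - 1] + 1, // insertion
                prev[j - 1] + cost // substitution (or match)
            )
        }
        prev = curr
    }
    return prev[b.length]
}

interface FileLike {
    filename: string
    content: string
}

// Score a file as the sum, over query terms, of the smallest distance
// between the term and any word in the file (capped at the term length).
// Lower scores are better matches.
function fuzzRank(query: string, files: FileLike[], topK = 3): FileLike[] {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean)
    const score = (f: FileLike): number => {
        const words = f.content.toLowerCase().split(/\W+/).filter(Boolean)
        return terms.reduce(
            (sum, t) =>
                sum + Math.min(t.length, ...words.map((w) => levenshtein(t, w))),
            0
        )
    }
    return files
        .map((f) => ({ f, s: score(f) }))
        .sort((x, y) => x.s - y.s)
        .slice(0, topK)
        .map((x) => x.f)
}
```

Calling `fuzzRank("cat dog", files, 1)` on a handful of in-memory files returns the one whose words are closest to both terms. A real implementation would likely use a dedicated fuzzy-matching library and handle large files more carefully.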