more docs
pelikhan committed Apr 29, 2024
1 parent 7b4fc1f commit b4a276d
Showing 2 changed files with 115 additions and 9 deletions.
122 changes: 115 additions & 7 deletions docs/src/content/docs/reference/scripts/tests.mdx
@@ -7,16 +7,17 @@ keywords: promptfoo, LLM testing, output quality, language model evaluation, scr
---

It is possible to define tests for the LLM scripts, to evaluate the output quality of the LLM
over time and ModuleResolutionKind.
over time and model types.

The tests are executed by [promptfoo](https://promptfoo.dev/), a tool
for evaluating LLM output quality.

## Defining tests

The tests are declared in the `script` function in your test. You may define one or many tests (array).
The tests are declared in the `script` function of your script.
You may define one or many tests (array).

```js title="proofreader.genai.js" wrap
```js title="proofreader.genai.js" wrap "tests"
script({
...,
tests: [{
@@ -28,11 +29,118 @@ scripts({
})
```

### rubrics
### `files`

### facts
`files` takes a file path or a list of file paths (relative to the workspace) and populates the `env.files`
variable while running the test. You can provide multiple files by passing an array of strings.

### Assertions and metrics
```js title="proofreader.genai.js" wrap "files"
script({
tests: {
files: "src/rag/testcode.ts",
...
}
})
```

### `rubrics`

`rubrics` checks if the LLM output matches given requirements,
using a language model to grade the output based on the rubric (see [llm-rubric](https://promptfoo.dev/docs/configuration/expected-outputs/model-graded/#examples-output-based)).
You can specify multiple rubrics by passing an array of strings.

```js title="proofreader.genai.js" wrap "rubrics"
script({
tests: {
rubrics: "is a report with a list of issues",
...,
}
})
```

### `facts`

`facts` checks the factual consistency of the output (see [factuality](https://promptfoo.dev/docs/guides/factuality-eval/)).
You can specify multiple facts by passing an array of strings.

> Given a completion A and a reference answer B, evaluates
> whether A is a subset of B, A is a superset of B, A and B are equivalent,
> A and B disagree, or A and B differ,
> but the differences don't matter from the perspective of factuality.

```js title="proofreader.genai.js" wrap "facts"
script({
tests: {
facts: `The report says that the input string should be validated before use.`,
...,
}
})
```

The assertions of tests are based on
### `asserts`

Other assertions are based on
[promptfoo assertions and metrics](https://promptfoo.dev/docs/configuration/expected-outputs/).

- `icontains` (`not-icontains`)
- `equals` (`not-equals`)
- `starts-with` (`not-starts-with`)

```js title="proofreader.genai.js" wrap "asserts"
script({
tests: {
facts: `The report says that the input string should be validated before use.`,
asserts: [
{
type: "icontains",
value: "issue",
},
],
},
})
```

- `contains-all` (`not-contains-all`)
- `contains-any` (`not-contains-any`)
- `icontains-all` (`not-icontains-all`)

```js title="proofreader.genai.js" wrap "asserts"
script({
tests: {
facts: `The report says that the input string should be validated before use.`,
asserts: [
{
type: "icontains-all",
value: ["issue", "fix"],
},
],
},
})
```
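
To make the meaning of these assertion types concrete, here is a minimal sketch of how they could be evaluated against an output string. It is a simplified model for illustration only, not promptfoo's implementation: `checkAssertion` is a hypothetical helper, and it assumes that a `not-` prefix negates the base check and that the `i` variants compare case-insensitively.

```js
// Illustrative only: a simplified model of the assertion semantics,
// NOT promptfoo's actual implementation. `checkAssertion` is a hypothetical name.
function checkAssertion(output, { type, value }) {
    // Assumption: a "not-" prefix inverts the result of the base assertion.
    const negate = type.startsWith("not-")
    const base = negate ? type.slice("not-".length) : type
    const lower = output.toLowerCase()
    let pass
    switch (base) {
        case "equals":
            pass = output === value
            break
        case "starts-with":
            pass = output.startsWith(value)
            break
        case "icontains": // case-insensitive substring match
            pass = lower.includes(value.toLowerCase())
            break
        case "contains-all": // every listed substring must be present
            pass = value.every((v) => output.includes(v))
            break
        case "contains-any": // at least one listed substring must be present
            pass = value.some((v) => output.includes(v))
            break
        case "icontains-all": // case-insensitive variant of contains-all
            pass = value.every((v) => lower.includes(v.toLowerCase()))
            break
        default:
            throw new Error(`unsupported assertion type: ${type}`)
    }
    return negate ? !pass : pass
}

// Sample output text, made up for this example.
const output =
    "Issue: the input string is not validated. Fix: validate it before use."

const results = [
    checkAssertion(output, { type: "icontains", value: "issue" }),
    checkAssertion(output, { type: "icontains-all", value: ["issue", "fix"] }),
    checkAssertion(output, { type: "not-starts-with", value: "Fix" }),
]
console.log(results) // [ true, true, true ]
```

The real checks run inside promptfoo; the sketch only shows why `value` is a string for the single-value types and an array for the `-all` and `-any` types.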

## Running tests

You can run tests from Visual Studio Code or using the [command line](/genaiscript/reference/cli).
In both cases, genaiscript generates a [promptfoo configuration file](https://promptfoo.dev/docs/configuration/guide)
and executes promptfoo on it.
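
As a rough sketch of what that generation step might involve, the snippet below flattens one test definition into a promptfoo-style list of assertions. This is an assumption, not the actual GenAIScript code: `toArray` and `toPromptfooAsserts` are hypothetical helpers, and the mapping of `rubrics` to `llm-rubric` and `facts` to `factuality` is inferred from the promptfoo pages linked in this document. The generated file may well be structured differently.

```js
// Hypothetical helper: accept a single value or an array of values.
function toArray(value) {
    if (value === undefined || value === null) return []
    return Array.isArray(value) ? value : [value]
}

// Hypothetical translation of one test definition into promptfoo-style assertions.
// Assumption: rubrics map to "llm-rubric" and facts map to "factuality".
function toPromptfooAsserts(test) {
    return [
        ...toArray(test.rubrics).map((value) => ({ type: "llm-rubric", value })),
        ...toArray(test.facts).map((value) => ({ type: "factuality", value })),
        ...toArray(test.asserts), // already in { type, value } form
    ]
}

const asserts = toPromptfooAsserts({
    rubrics: "is a report with a list of issues",
    facts: "The report says that the input string should be validated before use.",
    asserts: [{ type: "icontains", value: "issue" }],
})
console.log(JSON.stringify(asserts, null, 2))
```

Accepting either a string or an array in `toArray` mirrors the way `files`, `rubrics`, and `facts` are documented to take one value or many.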

### Visual Studio Code

- Open the script to test
- Right click in the editor and select **Run GenAIScript Tests** in the context menu
- The [promptfoo web view](https://promptfoo.dev/docs/usage/web-ui/) will automatically
open and refresh with the test results.

### Command line

Run the `test` command with the script file as argument.

```sh "test"
npx genaiscript test <scriptid>
```

You can specify additional models to test against by passing the `--models` option.

```sh '--models "ollama:phi3"'
npx genaiscript test <scriptid> --models "ollama:phi3"
```
2 changes: 0 additions & 2 deletions packages/core/src/types/prompt_template.d.ts
@@ -207,8 +207,6 @@ type PromptAssertion = {
| "not-equals"
| "starts-with"
| "not-starts-with"
| "llm-rubric"
| "factuality"
// The expected value
value: string
}