Support for inline validation tests (#387)
* define test structure
* helper
* emit structured output in promptfoo
* show in trace
* check for tests
* pass more options around
* logging
* towards running promptfoo
* bundling and testing
* building test cli
* still working through path and cwd issues
* more tuning of execution
* test description
* support for custom test provider
* updated testgen
* updated generated help
* add test all options
* support multiple models
* better logging
* simpler log
* added help on view
* typos
* parse, cleanup
* removing progress message
* added view option
Showing 31 changed files with 1,587 additions and 347 deletions.
@@ -0,0 +1,51 @@
---
title: Testing scripts
sidebar:
  order: 4.6
---
import providerSrc from "../../../../../packages/core/src/genaiscript-api-provider.mjs?raw"
import { Code } from '@astrojs/starlight/components';

You can declare tests and assertions in the `script` function
to validate the output of the script.

The tests are executed by [promptfoo](https://promptfoo.dev/),
a tool for evaluating LLM output quality.

## Declaring tests

Tests are declared as an array of objects under the `tests` key of the `script` function call.

```js title="proofreader.genai.js"
script({
    ...,
    tests: [{
        // file passed to the script as input
        files: "src/rag/testcode.ts",
        // criterion the output is graded against
        rubrics: "is a report with a list of issues",
        // statements the output is expected to contain
        facts: ["The report says that the input string should be validated before use."]
    }]
})
```

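To make the shape of a test declaration more concrete, here is a minimal sketch of how one entry of `tests` could be turned into promptfoo-style assertions. The helpers `toArray` and `toAssertions` are hypothetical and are not part of GenAIScript or promptfoo. The mapping of `rubrics` to the `llm-rubric` assertion type and of `facts` to `factuality` is an assumption made for illustration; this document does not confirm how the conversion is actually done.

```js
// Hypothetical helper (not a GenAIScript API): accept a single value or an array.
function toArray(value) {
    if (value === undefined || value === null) return []
    return Array.isArray(value) ? value : [value]
}

// Assumed mapping, for illustration only:
//   rubrics -> promptfoo "llm-rubric" assertions
//   facts   -> promptfoo "factuality" assertions
function toAssertions(test) {
    return [
        ...toArray(test.rubrics).map((value) => ({ type: "llm-rubric", value })),
        ...toArray(test.facts).map((value) => ({ type: "factuality", value })),
    ]
}

// Same test declaration as in the example above.
const test = {
    files: "src/rag/testcode.ts",
    rubrics: "is a report with a list of issues",
    facts: ["The report says that the input string should be validated before use."],
}

const assertions = toAssertions(test)
console.log(JSON.stringify(assertions, null, 2))
```

In this sketch a single string and an array of strings are treated the same way, which is why `rubrics` can be written as a plain string while `facts` is an array.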
## Running tests

You can use the CLI to run the tests for one or more scripts.

```sh
npx genaiscript test proofreader
```

If no script name is provided, all scripts with tests are tested.

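If you want to trigger the tests from a Node.js script (for example in a CI job), the command above can be wrapped as sketched below. `buildTestArgs` and `runTests` are hypothetical helpers, and the `RUN_GENAISCRIPT_TESTS` environment variable is made up for this example. The sketch assumes that several script names are passed as space-separated arguments and that the CLI reports failures through a non-zero exit code; neither is confirmed by this document.

```js
// Build the argument list for `npx genaiscript test [script...]`.
// Assumption: several script names are passed as space-separated arguments.
// With no script names, the CLI is expected to test all scripts that declare tests.
function buildTestArgs(scripts = []) {
    return ["genaiscript", "test", ...scripts]
}

// Hypothetical wrapper (not a GenAIScript API): run the CLI and return its exit code.
async function runTests(scripts = []) {
    const { spawnSync } = await import("node:child_process")
    const result = spawnSync("npx", buildTestArgs(scripts), { stdio: "inherit" })
    // status is null when the process could not be started or was killed by a signal
    return result.status ?? 1
}

const args = buildTestArgs(["proofreader"])
console.log(["npx", ...args].join(" "))

// Only spawn the CLI when explicitly requested, so the sketch is safe to run as-is.
if (process.env.RUN_GENAISCRIPT_TESTS === "1") {
    runTests(["proofreader"]).then((code) => {
        process.exitCode = code
    })
}
```

On Windows, `spawnSync("npx", ...)` may need the `shell: true` option to resolve the `npx` command.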
## Viewing results

You can explore the test results in the [promptfoo web UI](https://promptfoo.dev/docs/usage/web-ui).

```sh
npx promptfoo view
```

## Known limitations

Currently, promptfoo treats the script source as the prompt text. As a result, assertions
that also rely on the input text, such as `answer-relevance`, cannot be used.