file format eval #780

Merged
merged 6 commits into from
Oct 15, 2024
60 changes: 44 additions & 16 deletions docs/src/content/docs/reference/scripts/system.mdx
@@ -505,11 +505,12 @@ Generate changelog formatter edits
`````js wrap title="system.changelog"
system({
title: "Generate changelog formatter edits",
lineNumbers: true
lineNumbers: true,
})

$`## CHANGELOG file format

$`For partial updates of files, return one or more ChangeLogs (CLs) formatted as follows. Each CL must contain
For partial updates of large files, return one or more ChangeLogs (CLs) formatted as follows. Each CL must contain
one or more code snippet changes for a single file. There can be multiple CLs for a single file.
Each CL must start with a description of its changes. The CL must then list one or more pairs of
(OriginalCode, ChangedCode) code snippets. In each such pair, OriginalCode must list all consecutive
@@ -519,7 +520,8 @@ lines of code (again including the same few lines before and after the changes).
OriginalCode and ChangedCode must start at the same source code line number N. Each listed code line,
in both the OriginalCode and ChangedCode snippets, must be prefixed with [N] that matches the line
index N in the above snippets, and then be prefixed with exactly the same whitespace indentation as
the original snippets above. See also the following examples of the expected response format.
the original snippets above. Each OriginalCode must be paired with ChangedCode. Do NOT add multiple ChangedCode per OriginalCode.
See also the following examples of the expected response format.

CHANGELOG:
\`\`\`\`\`changelog
@@ -553,7 +555,14 @@ OriginalCode@23-23:
ChangedCode@23-23:
[23] <white space> <changed code line>
\`\`\`\`\`

## Choosing what file format to use

- If the file content is small (< 20 lines), use the FULL format.
- If the file content is large (> 50 lines), use CHANGELOG format.
- If the file content IS VERY LARGE, ALWAYS USE CHANGELOG to save tokens.
`

`````
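For illustration, the sketch below shows one way a single (OriginalCode, ChangedCode) pair could be applied to a file once it has been parsed. It is not the project's implementation: `parseNumberedLine` and `applyChange` are hypothetical names, and stripping exactly one space after the `[N]` prefix is an assumption about how indentation is carried.

```ts
// Hypothetical helper types and functions, for illustration only.
interface NumberedLine {
    index: number // the N in "[N]"
    content: string // the code after the prefix, indentation included
}

// Parse "[105] return 0" into { index: 105, content: "return 0" }.
// Assumption: exactly one separator space follows the closing bracket.
function parseNumberedLine(line: string): NumberedLine | undefined {
    const m = /^\[(\d+)\] ?(.*)$/.exec(line)
    if (!m) return undefined
    return { index: Number(m[1]), content: m[2] || "" }
}

// Replace the 1-based, inclusive range start..end of fileLines
// with the contents of the ChangedCode lines.
function applyChange(
    fileLines: string[],
    start: number,
    end: number,
    changed: string[]
): string[] {
    const contents = changed.map((l) => {
        const parsed = parseNumberedLine(l)
        return parsed ? parsed.content : l
    })
    return fileLines.slice(0, start - 1).concat(contents, fileLines.slice(end))
}
```

Because OriginalCode and ChangedCode start at the same line N, only the original range is needed to locate the edit; the changed chunk may be shorter or longer than the range it replaces.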


@@ -590,7 +599,7 @@ system({

$`## DIFF file format

The DIFF format should be used to generate diff changes on large files:
The DIFF format should be used to generate diff changes on large files with small number of changes:

- existing lines must start with their original line number: [<line number>] <line>
- deleted lines MUST start with - followed by the line number: - [<line number>] <deleted line>
@@ -608,7 +617,7 @@ The DIFF format should be used to generate diff changes on large files:
- only emit a couple unmodified lines before and after the changes
- keep the diffs AS SMALL AS POSSIBLE
- when reading files, ask for line numbers
- minimize the number of unmodified lines
- minimize the number of unmodified lines. DO NOT EMIT MORE THEN 2 UNMODIFIED LINES BEFORE AND AFTER THE CHANGES. Otherwise use the FILE file format.

The phrase "DO NOT EMIT MORE THEN 2 UNMODIFIED LINES" should be "DO NOT EMIT MORE THAN 2 UNMODIFIED LINES" to correct the grammatical error.

generated by pr-docs-review-commit wording

Typo in the documentation, "THEN" should be "THAN".

generated by pr-docs-review-commit typo


- do NOT generate diff for files that have no changes
- do NOT emit diff if lines are the same
@@ -623,36 +632,40 @@ The DIFF format should be used to generate diff changes on large files:
FOLLOW THE SYNTAX PRECISLY. THIS IS IMPORTANT.

Typo in the documentation, "PRECISLY" should be "PRECISELY".

generated by pr-docs-review-commit typo

DIFF ./file.ts:
\`\`\`diff
[original line number] <2 lines before changes (not the whole file)>
[original line number] line before changes
- [original line number] <deleted line>
+ <added line>
[original line number] <2 lines after changes (not the whole file)>
[original line number] line after changes
\`\`\`

DIFF ./file2.ts:
\`\`\`diff
[original line number] <2 lines before changes (not the whole file)>
[original line number] line before changes
- [original line number] <deleted line>
- [original line number] <delete line 2>
+ <added line>
+ <added line 2>
[original line number] <2 lines after changes (not the whole file)>
[original line number] line after changes
\`\`\`

DIFF ./file3.ts:
\`\`\`diff
[original line number] <2 lines before changes (not the whole file)>
[original line number] line before changes
+ <added line>
[original line number] <2 lines after changes (not the whole file)>
[original line number] line after changes
\`\`\`

DIFF ./file4.ts:
\`\`\`diff
[original line number] <2 lines before changes (not the whole file)>
[original line number] line before changes
- [original line number] <deleted line>
[original line number] <2 lines after changes (not the whole file)>
[original line number] line after changes
\`\`\`

The examples provided for the DIFF format are incorrect. They should include the actual line numbers and not placeholders like "[original line number]". The examples should be updated to reflect the actual usage of the DIFF format.

generated by pr-docs-review-commit example_format
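
As a concrete illustration of the DIFF format with real line numbers, here is a minimal sketch that applies one DIFF block to a file. `applyDiff` is a hypothetical helper, not the parser used in this repository, and it assumes well-formed input: unmodified lines as `[N] text`, deletions as `- [N] text`, and additions as `+ text`.

```ts
// Hypothetical sketch: apply a DIFF-format block to the lines of a file.
function applyDiff(fileLines: string[], diffLines: string[]): string[] {
    const deleted: { [line: number]: boolean } = {}
    // Added lines are keyed by the last original line number seen before them.
    const added: { [afterLine: number]: string[] } = {}
    let lastSeen = 0
    for (const raw of diffLines) {
        const del = /^-\s*\[(\d+)\]/.exec(raw)
        const add = /^\+ ?(.*)$/.exec(raw)
        const keep = /^\s*\[(\d+)\]/.exec(raw)
        if (del) {
            lastSeen = Number(del[1])
            deleted[lastSeen] = true
        } else if (add) {
            const bucket = added[lastSeen] || []
            bucket.push(add[1] || "")
            added[lastSeen] = bucket
        } else if (keep) {
            lastSeen = Number(keep[1])
        }
    }
    const out: string[] = (added[0] || []).slice()
    fileLines.forEach((line, i) => {
        const n = i + 1 // 1-based original line number
        if (!deleted[n]) out.push(line)
        for (const a of added[n] || []) out.push(a)
    })
    return out
}

// Example with actual line numbers instead of placeholders.
const sampleFile = [
    "const a = 1",
    "const b = 2",
    "console.log(a)",
    "console.log(b)",
    "export {}",
]
const sampleDiff = [
    "[2] const b = 2",
    "- [3] console.log(a)",
    "+ console.log(a + b)",
    "[4] console.log(b)",
]
const samplePatched = applyDiff(sampleFile, sampleDiff)
// samplePatched[2] is "console.log(a + b)"; all other lines are unchanged.
```

Only one unmodified line is emitted on each side of the change here, which stays within the limit of two that the prompt asks for.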


## Choosing what file format to use

- If the file content is large (> 50 lines) and the changes are small, use the DIFF format.
- In all other cases, use the FILE file format.
`

`````
@@ -688,9 +701,9 @@ system({
})

const folder = env.vars["outputFolder"] || "."
$`## Files
$`## FILE file format

When generating or updating files you will use the following syntax:`
When generating or updating files you should use the FILE file syntax preferably:`

def(`File ${folder}/file1.ts`, `What goes in\n${folder}/file1.ts.`, {
language: "typescript",
@@ -705,9 +718,24 @@ def(`File /path_to_file/file2.md`, `What goes in\n/path_to_file/file2.md.`, {
language: "markdown",
})

$`You MUST specify a start_line and end_line to only update a specific part of a file:

FILE ${folder}/file1.py:
\`\`\`python start_line=15 end_line=20
Replace line range 15-20 in \n${folder}/file1.py
\`\`\`

FILE ${folder}/file1.py:
\`\`\`python start_line=30 end_line=35
Replace line range 30-35 in \n${folder}/file1.py
\`\`\`

`

$`- Make sure to use precisely \`\`\` to guard file code sections.
- Always sure to use precisely \`\`\`\`\` to guard file markdown sections.

The instructions for guarding file code sections are inconsistent. The first bullet point says to use precisely "```" while the second says to use precisely "````". This should be clarified for consistency.

generated by pr-docs-review-commit consistency

Typo in the documentation, "Always sure" should be "Always be sure".

generated by pr-docs-review-commit typo

Typo in the documentation, "guard file markdown sections" should be "guard markdown file sections".

generated by pr-docs-review-commit typo

- Use full path of filename in code section header.`
- Use full path of filename in code section header.
- Use start_line, end_line for large files with small updates`

The phrase "Always sure to use precisely" is grammatically incorrect. It should be "Always be sure to use precisely" to correct the sentence structure.

generated by pr-docs-review-commit wording
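
The `start_line` / `end_line` attributes in the FILE examples above ride on the fence info string (for example `python start_line=15 end_line=20`). A minimal sketch of how such an info string could be read follows; `parseFenceInfo` and `lineRange` are hypothetical names, and the `key=value` attribute grammar is an assumption inferred from the examples rather than taken from the project's fence parser.

```ts
// Hypothetical sketch: read the language and key=value attributes
// from a fence info string such as "python start_line=15 end_line=20".
interface FenceInfo {
    language: string
    attrs: { [key: string]: string }
}

function parseFenceInfo(info: string): FenceInfo {
    const parts = info.trim().split(/\s+/)
    const language = parts[0] || ""
    const attrs: { [key: string]: string } = {}
    for (const part of parts.slice(1)) {
        const m = /^([A-Za-z_]\w*)=(.*)$/.exec(part)
        if (m) attrs[m[1] || ""] = m[2] || ""
    }
    return { language, attrs }
}

// Return the 1-based inclusive range, or undefined when the attributes
// are missing or do not form a valid range.
function lineRange(info: FenceInfo): { start: number; end: number } | undefined {
    const start = Number(info.attrs["start_line"])
    const end = Number(info.attrs["end_line"])
    if (!(start >= 1) || !(end >= start)) return undefined
    if (Math.floor(start) !== start || Math.floor(end) !== end) return undefined
    return { start, end }
}
```

Returning `undefined` for a missing range lets a caller fall back to replacing the whole file, which matches the plain FILE blocks that carry no attributes.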

if (folder !== ".")
$`When generating new files, place files in folder "${folder}".`
$`- If a file does not have changes, do not regenerate.
38 changes: 38 additions & 0 deletions packages/core/src/changelog.test.ts
@@ -241,4 +241,42 @@ ChangedCode@46-48:
console.log(res)
assert.equal(res[0].filename, "packages/core/src/cancellation.ts")
})

test("missing header", () => {
const source = `
ChangeLog:1@src/edits/su/fib.ts
Description: Implement the Fibonacci function and remove comments and empty lines.
OriginalCode@105-107:
[105] // TODO: implement fibonacci algorithm
[106] return 0 // BODY
[107] }
ChangedCode@105-107:
[105] if (n <= 1) return n;
[106] return fibonacci(n - 1) + fibonacci(n - 2);
[107] }
`
const res = parseChangeLogs(source)
console.log(res)
assert.equal(res[0].filename, "src/edits/su/fib.ts")
})

The test case "missing header" does not contain any assertions. This could lead to false positives where the test passes but the functionality is not working as expected.

generated by pr-review-commit missing_test_assertion


test("unbalanced fences", () => {
const source = `\`\`\`\`\`changelog
ChangeLog:1@src/edits/bigfibs/fib.py
Description: Implemented new_function, removed comments and empty lines.
OriginalCode@48-51:
[48] def new_function(sum):
[49] # TODO
[50] return 0
[51] # return (10 - (sum % 10)) % 10;
ChangedCode@48-50:
[48] def new_function(sum):
[49] return (10 - (sum % 10)) % 10
[50]
\`\`\`
`
const res = parseChangeLogs(source)
console.log(res)
assert.equal(res[0].filename, "src/edits/bigfibs/fib.py")
})
})
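
The two tests above exercise input with no fence at all and input whose opening fence (five backticks) does not match its closing fence (three backticks). A minimal sketch of the tolerant approach, under stated assumptions: `dropFenceLines` and `parseHeader` are hypothetical stand-ins, not the `unfence` helper or the `parseChangeLogs` function from this PR, and they only show the idea of discarding any line that begins with three or more backticks or dots before looking for the header.

```ts
// Hypothetical sketch: ignore blank lines and stray fence lines
// so that unbalanced or missing fences do not break header parsing.
function dropFenceLines(source: string): string[] {
    return source
        .split("\n")
        .filter((line) => line.trim() !== "" && !/^[`.]{3,}/.test(line))
}

// Parse "ChangeLog:<index>@<file>"; the error message delimits the
// offending line with pipes so stray whitespace is visible.
function parseHeader(line: string): { index: number; filename: string } {
    const m = /^ChangeLog:\s*(\d+)@(.*)$/i.exec(line)
    if (!m) throw new Error("missing ChangeLog header in |" + line + "|")
    return { index: Number(m[1]), filename: (m[2] || "").trim() }
}
```

Note that the character class also matches lines starting with three dots, so a code line that legitimately begins with `...` would be dropped as well; the sketch inherits that trade-off from the regular expression used in the diff.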
28 changes: 23 additions & 5 deletions packages/core/src/changelog.ts
@@ -3,6 +3,8 @@
* A changelog describes changes between original and modified code segments.
*/

import { unfence } from "./fence"

// Represents a chunk of code with a start and end line and its content.
export interface ChangeLogChunk {
start: number // Starting line number
@@ -31,19 +33,26 @@ export interface ChangeLog {
* @returns An array of parsed ChangeLog objects.
*/
export function parseChangeLogs(source: string): ChangeLog[] {
const lines = source.split("\n")
const lines = unfence(source, "changelog").split("\n")
const changelogs: ChangeLog[] = []

// Process each line to extract changelog information.
while (lines.length) {
if (!lines[0].trim()) {
lines.shift()
continue // Skip empty lines
continue
}

// skip stray fence lines (three or more backticks or dots)
if (/^[\`\.]{3,}/.test(lines[0])) {
lines.shift()
continue
}

// Parse the ChangeLog header line.
let m = /^ChangeLog:\s*(?<index>\d+)@(?<file>.*)$/i.exec(lines[0])
if (!m) throw new Error("missing ChangeLog header in " + lines[0])
let m = /^ChangeLog:\s*(?<index>\d+)@(?<file>.*)\s*$/i.exec(lines[0])
if (!m)
throw new Error("missing ChangeLog header in |" + lines[0] + "|")
const changelog: ChangeLog = {
index: parseInt(m.groups.index),
filename: m.groups.file.trim(),
@@ -66,6 +75,14 @@ export function parseChangeLogs(source: string): ChangeLog[] {
lines.shift()
continue
}

// skip stray fence lines (three or more backticks or dots)
if (/^[\`\.]{3,}/.test(lines[0])) {
// a fence line here means the current change has ended
lines.shift()
continue
}

// Attempt to parse a change.
const change = parseChange()
if (change) changelog.changes.push(change)
@@ -89,7 +106,8 @@ export function parseChangeLogs(source: string): ChangeLog[] {

lines.shift()
const changed = parseChunk(m)
return <ChangeLogChange>{ original, changed }
const res = <ChangeLogChange>{ original, changed }
return res
}

// Parses a chunk of code from the changelog.
16 changes: 12 additions & 4 deletions packages/core/src/genaisrc/system.changelog.genai.js
@@ -1,10 +1,11 @@
system({
title: "Generate changelog formatter edits",
lineNumbers: true
lineNumbers: true,
})

$`## CHANGELOG file format

$`For partial updates of files, return one or more ChangeLogs (CLs) formatted as follows. Each CL must contain
For partial updates of large files, return one or more ChangeLogs (CLs) formatted as follows. Each CL must contain
one or more code snippet changes for a single file. There can be multiple CLs for a single file.
Each CL must start with a description of its changes. The CL must then list one or more pairs of
(OriginalCode, ChangedCode) code snippets. In each such pair, OriginalCode must list all consecutive
@@ -14,7 +15,8 @@ lines of code (again including the same few lines before and after the changes).
OriginalCode and ChangedCode must start at the same source code line number N. Each listed code line,
in both the OriginalCode and ChangedCode snippets, must be prefixed with [N] that matches the line
index N in the above snippets, and then be prefixed with exactly the same whitespace indentation as
the original snippets above. See also the following examples of the expected response format.
the original snippets above. Each OriginalCode must be paired with ChangedCode. Do NOT add multiple ChangedCode per OriginalCode.
See also the following examples of the expected response format.

CHANGELOG:
\`\`\`\`\`changelog
@@ -48,4 +50,10 @@ OriginalCode@23-23:
ChangedCode@23-23:
[23] <white space> <changed code line>
\`\`\`\`\`
`

## Choosing what file format to use

- If the file content is small (< 20 lines), use the FULL format.
- If the file content is large (> 50 lines), use CHANGELOG format.
- If the file content IS VERY LARGE, ALWAYS USE CHANGELOG to save tokens.
`
24 changes: 14 additions & 10 deletions packages/core/src/genaisrc/system.diff.genai.js
@@ -5,7 +5,7 @@ system({

$`## DIFF file format

The DIFF format should be used to generate diff changes on large files:
The DIFF format should be used to generate diff changes on large files with small number of changes:

- existing lines must start with their original line number: [<line number>] <line>
- deleted lines MUST start with - followed by the line number: - [<line number>] <deleted line>
@@ -23,7 +23,7 @@ The DIFF format should be used to generate diff changes on large files:
- only emit a couple unmodified lines before and after the changes
- keep the diffs AS SMALL AS POSSIBLE
- when reading files, ask for line numbers
- minimize the number of unmodified lines
- minimize the number of unmodified lines. DO NOT EMIT MORE THEN 2 UNMODIFIED LINES BEFORE AND AFTER THE CHANGES. Otherwise use the FILE file format.

- do NOT generate diff for files that have no changes
- do NOT emit diff if lines are the same
@@ -38,34 +38,38 @@ The DIFF format should be used to generate diff changes on large files:
FOLLOW THE SYNTAX PRECISLY. THIS IS IMPORTANT.
DIFF ./file.ts:
\`\`\`diff
[original line number] <2 lines before changes (not the whole file)>
[original line number] line before changes
- [original line number] <deleted line>
+ <added line>
[original line number] <2 lines after changes (not the whole file)>
[original line number] line after changes
\`\`\`

DIFF ./file2.ts:
\`\`\`diff
[original line number] <2 lines before changes (not the whole file)>
[original line number] line before changes
- [original line number] <deleted line>
- [original line number] <delete line 2>
+ <added line>
+ <added line 2>
[original line number] <2 lines after changes (not the whole file)>
[original line number] line after changes
\`\`\`

DIFF ./file3.ts:
\`\`\`diff
[original line number] <2 lines before changes (not the whole file)>
[original line number] line before changes
+ <added line>
[original line number] <2 lines after changes (not the whole file)>
[original line number] line after changes
\`\`\`

DIFF ./file4.ts:
\`\`\`diff
[original line number] <2 lines before changes (not the whole file)>
[original line number] line before changes
- [original line number] <deleted line>
[original line number] <2 lines after changes (not the whole file)>
[original line number] line after changes
\`\`\`

## Choosing what file format to use

- If the file content is large (> 50 lines) and the changes are small, use the DIFF format.
- In all other cases, use the FILE file format.
`
8 changes: 5 additions & 3 deletions packages/core/src/genaisrc/system.files.genai.js
@@ -4,9 +4,9 @@ system({
})

const folder = env.vars["outputFolder"] || "."
$`## Files
$`## FULL file format

When generating or updating files you will use the following syntax:`
When generating or updating files you may use the FULL file format:`

def(`File ${folder}/file1.ts`, `What goes in\n${folder}/file1.ts.`, {
language: "typescript",
@@ -23,7 +23,9 @@ def(`File /path_to_file/file2.md`, `What goes in\n/path_to_file/file2.md.`, {

$`- Make sure to use precisely \`\`\` to guard file code sections.
- Always sure to use precisely \`\`\`\`\` to guard file markdown sections.
- Use full path of filename in code section header.`
- Use full path of filename in code section header.
- do NOT use ## headers with filename
`
if (folder !== ".")
$`When generating new files, place files in folder "${folder}".`
$`- If a file does not have changes, do not regenerate.