Releases: Mondego/dafny-synthesis
Artifacts@FSE24-Reviewed
This release presents the artifact of our work “Towards AI-Assisted Synthesis of Verified Dafny Methods.” This release has been reviewed in FSE24 Artifacts Evaluation.
In this work, we conduct the first empirical study of LLMs synthesizing verifiable Dafny methods. Using 178 programming problems from the MBPP dataset, we prompt two contemporary models (GPT-4 and PaLM-2) to synthesize methods in Dafny. We demonstrate that a prompt following the principles of Chain of Thought (CoT) with semantically similar few-shot examples explaining how to decompose a problem step-by-step, can synthesize verified and correct Dafny methods with meaningful specifications for 58% of problems in our test dataset.
The primary purpose of this artifact is to provide a benchmark dataset MBPP-DFY-153, a collection of 153 programming problems with specifications, solutions, and tests in/for Dafny, based on the MBPP (Mostly Basic Python Programming) dataset curated by Google Research. By executing our scripts, researchers should be able to try different prompts and synthesize Dafny methods from natural language problem descriptions. Scripts are also available to run verification and tests for all verified Dafny methods.
This release has been prepared to claim three badges: “Available”, “Reusable” and "Functional". For a thorough evaluation of our artifact, we suggest that reviewers have familiarity with Dafny, and its installation, Bash scripts, and UNIX command-line operations. Artifacts related to our paper are publicly accessible in our GitHub repository at: https://github.com/Mondego/dafny-synthesis/