In recent times, the use of Large Language Models (LLMs) for code generation has gained substantial traction. Tools such as ChatGPT, GitHub Copilot, Bard, and Code Llama (Rozière et al.) aim to streamline developer workflows and expedite development cycles. Despite their promise, the code produced by these tools often suffers from bugs, hampering their overall utility. Existing methodologies primarily rely on resource-intensive runtime analysis to address these issues, while research exploring static analysis remains scarce and covers only a limited range of programming languages.
Our study aims to enrich the baseline code generation model by incorporating insights from static error analysis, potentially refining code generation quality. To achieve this objective, we introduce a pipeline that feeds feedback gleaned from static analysis back into the baseline model. Furthermore, we enhance the baseline model by fine-tuning it on samples previously rejected due to static errors. Our empirical observations underscore the efficacy of both strategies in reducing the occurrence of static errors.
This repository contains the code base for the project *Leveraging static analysis for evaluating code-generation models*, developed during the CSCI 544 Applied Natural Language Processing course, Fall 2023, at the University of Southern California (USC).
The pipeline employs automated feedback via linters (static code analyzers) to enhance error detection and improve the underlying code generation models. The multi-stage feedback pipeline is designed for effective code generation refinement.
Pipeline Overview:
- Context Generation: The pre-processing stage generates the context as part of a prompt, incorporating text from the dataset.
- Code Generation: The model utilizes the provided context along with text from the dataset to generate code.
- Linters Integration: Linters are executed on the generated code to identify errors.
- Feedback Loop: Detected errors are then fed back to the model, enhancing subsequent code generation.
This systematic approach allows errors to be identified and minimized within the code generation process, offering insight into areas where the model falls short. Importantly, it facilitates targeted corrections by providing precise information about error types and their locations.
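As a rough illustration of the feedback loop, the sketch below runs Flake8 on a generated Python snippet and folds any reported errors back into the next prompt. The helper names (`generate_code`, `feedback_iteration`) and the prompt wording are hypothetical, not the repository's actual API.

```python
# Illustrative sketch of one feedback iteration; the repository's notebooks
# implement the actual loop.
import subprocess
import tempfile


def run_flake8(code: str) -> str:
    """Write generated Python code to a temp file and return Flake8's findings."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["flake8", path], capture_output=True, text=True)
    return result.stdout  # one "file:line:col: CODE message" per finding


def feedback_iteration(generate_code, prompt: str):
    """Generate code, lint it, and build a follow-up prompt from any errors."""
    code = generate_code(prompt)
    errors = run_flake8(code)
    if not errors:
        return code, None  # clean code: no further feedback needed
    followup = (
        f"{prompt}\n\nThe previous solution produced these static-analysis "
        f"errors:\n{errors}\nRegenerate the code and fix them."
    )
    return code, followup
```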
The pipeline is partly automated for evaluation and report generation for proof-of-concept purposes.
Fine-tuning:
Fine-tuning in this context seeks to raise the baseline model's initial accuracy without requiring subsequent feedback iterations. The process refines the baseline code generation model using Direct Preference Optimization (DPO), with prompts constructed following the same procedure as the first stage of our feedback pipeline.
In this phase, we use quantized models to streamline loading and to make training feasible on the available GPUs.
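As a rough sketch of this setup (the exact configuration lives in the fine_tuning notebooks and may differ), the baseline model can be loaded in 4-bit precision with `transformers` and `bitsandbytes` before being passed to a preference-optimization trainer such as `trl`'s `DPOTrainer`:

```python
# Sketch: load CodeLlama-7b-Instruct-hf in 4-bit precision so it fits on a
# single P100/T4 GPU. The hyperparameters here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "codellama/CodeLlama-7b-Instruct-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute on the GPU
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # place layers on the available GPU
)

# The quantized model can then be wrapped with a LoRA adapter and handed to a
# preference-optimization trainer (e.g. trl's DPOTrainer) together with
# (prompt, chosen, rejected) triples built from accepted and rejected samples.
```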
Important
We used a P100 GPU on Kaggle and a T4 GPU on Google Colaboratory for our experiments.
We utilized the XLCoST dataset for the code completion task. This parallel dataset comprises solutions for data structures and algorithms problems in six programming languages: C++, Java, Python, PHP, C, and C#. Our experiment primarily focuses on program-level samples in C++ and Python. Our baseline model, CodeLlama-7b-Instruct-hf, was trained and evaluated using this dataset.
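For illustration only, a code-completion prompt can be assembled from an XLCoST problem description along the following lines; the function and template below are hypothetical placeholders for the actual prompt construction in the preprocessing notebooks.

```python
# Hypothetical prompt construction for the code-completion task, using the
# [INST] ... [/INST] format expected by CodeLlama-Instruct models.
def build_prompt(problem_text: str, language: str = "Python") -> str:
    return (
        "[INST] You are given the following problem description:\n"
        f"{problem_text}\n"
        f"Write a complete {language} program that solves it. [/INST]"
    )
```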
| Directory | Description |
|---|---|
| data | Sampled raw and processed XLCoST data for training and evaluation of the CodeLlama model |
| feedback_pipeline | Notebooks for running static analysis on code generated after multiple feedback loops |
| fine_tuning | Notebooks for fine-tuning CodeLlama models to enhance code generation using enriched prompts |
| linter_setup_scripts | Bash scripts for installing, setting up, and supporting linters |
| preprocessing | Code snippets for pre-processing and parsing in notebooks |
| reports | Project-related documentation and reports |
| results | Results produced at different stages of the pipelines |
| static_analysis_pipeline | Scripts encompassing the components of the static analysis pipeline for evaluating source scripts before and after feedback loops |
- Navigate to the Project Directory:
  cd static_analysis_codegen_llms
- Create a Virtual Environment:
  python -m venv codegenllm
- Activate the Virtual Environment:
  - On Windows:
    codegenllm\Scripts\activate
  - On macOS and Linux:
    source codegenllm/bin/activate
- Install Project Dependencies:
  pip install -r requirements.txt
Running this command installs the libraries required for code generation as well as the linters used for Python code evaluation.
If the linters are not installed, follow the instructions below.
Flake8 for Python:
  cd linter_setup_scripts/flake8_utils
  chmod +x install_flake8.sh
  bash install_flake8.sh

Cppcheck for C++:
  cd linter_setup_scripts/cppcheck_utils
  chmod +x install_cppcheck.sh
  bash install_cppcheck.sh
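Once installed, the linters can also be invoked programmatically. The sketch below runs Cppcheck on a generated C++ file and tallies findings per check id; the scripts in static_analysis_pipeline do something similar, though their exact parsing may differ, and Flake8 output can be tallied the same way.

```python
# Illustrative sketch: run Cppcheck on a generated C++ file and count
# findings per check id. The template string keeps the output easy to parse.
import subprocess
from collections import Counter


def cppcheck_findings(path: str) -> Counter:
    result = subprocess.run(
        ["cppcheck", "--enable=all",
         "--template={file}:{line}:{severity}:{id}:{message}", path],
        capture_output=True, text=True,
    )
    # Cppcheck writes findings to stderr, one per line in the template format.
    ids = [line.split(":")[3] for line in result.stderr.splitlines()
           if line.count(":") >= 4]
    return Counter(ids)
```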
- Sai Anuroop Kesanapalli | MS in Computer Science | USC
- Abhishek Anand | MS in Computer Science | USC
- Kayvan Shah | MS in Applied Data Science | USC
- Indrani Panchangam | MS in Computer Science | USC
- Vishesh Mittal | MS in Computer Science | USC
This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.