In recent times, the use of Large Language Models (LLMs) for code generation has gained substantial traction. Tools such as ChatGPT, GitHub Copilot, Bard, and Code Llama (Rozière et al.) aim to streamline developer workflows and expedite development cycles. Despite their promise, the code produced by these tools often suffers from bugs, hampering their overall utility. Existing methodologies primarily rely on resource-intensive runtime analysis to address these issues, while research exploring static analysis remains scarce and covers only a limited range of programming languages.
Our study aims to enrich the baseline code generation model by incorporating insights from static error analysis, potentially refining code generation quality. To achieve this objective, we introduce a pipeline that feeds feedback gleaned from static analysis back into the baseline model. Furthermore, we enhance the baseline model by fine-tuning it on samples previously rejected due to static errors. Our empirical observations underscore the efficacy of both strategies in reducing the occurrence of static errors.
This repository contains the code base for the project *Leveraging static analysis for evaluating code-generation models*, developed during the CSCI 544 Applied Natural Language Processing course, Fall 2023, at the University of Southern California (USC).
The pipeline employs automated feedback via linters (static code analyzers) to enhance error detection and improve the underlying code generation models. The multi-stage feedback pipeline is designed for effective code generation refinement.
Pipeline Overview:
- Context Generation: The pre-processing stage generates the context as part of a prompt, incorporating text from the dataset.
- Code Generation: The model utilizes the provided context along with text from the dataset to generate code.
- Linters Integration: Linters are executed on the generated code to identify errors.
- Feedback Loop: Detected errors are then fed back to the model, enhancing subsequent code generation.
This systematic approach allows errors to be identified and minimized within the code generation process, offering insight into areas where the model falls short. Importantly, it facilitates targeted corrections by providing precise information about error types and their locations.
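As a rough illustration of the feedback loop, the sketch below runs Flake8 on a generated Python snippet and folds any reported errors back into the next prompt. The helper names (`generate_code`, `feedback_iteration`) and the prompt wording are hypothetical, not the repository's actual API.

```python
# Illustrative sketch of one feedback iteration; the repository's notebooks
# implement the actual loop.
import subprocess
import tempfile


def run_flake8(code: str) -> str:
    """Write generated Python code to a temp file and return Flake8's findings."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["flake8", path], capture_output=True, text=True)
    return result.stdout  # one "file:line:col: CODE message" per finding


def feedback_iteration(generate_code, prompt: str):
    """Generate code, lint it, and build a follow-up prompt from any errors."""
    code = generate_code(prompt)
    errors = run_flake8(code)
    if not errors:
        return code, None  # clean code: no further feedback needed
    followup = (
        f"{prompt}\n\nThe previous solution produced these static-analysis "
        f"errors:\n{errors}\nRegenerate the code and fix them."
    )
    return code, followup
```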
The pipeline is partly automated for evaluation and report generation for proof-of-concept purposes.
Fine-tuning:
Fine-tuning in this context seeks to raise the baseline model's initial accuracy without requiring subsequent feedback iterations. The process refines the baseline code generation model using Direct Preference Optimization (DPO), with prompts constructed following the same procedure as the first stage of our feedback pipeline.
In this phase, we use quantized models to streamline loading and to make training feasible on the available GPUs.
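As a rough sketch of this setup (the exact configuration lives in the fine_tuning notebooks and may differ), the baseline model can be loaded in 4-bit precision with `transformers` and `bitsandbytes` before being passed to a preference-optimization trainer such as `trl`'s `DPOTrainer`:

```python
# Sketch: load CodeLlama-7b-Instruct-hf in 4-bit precision so it fits on a
# single P100/T4 GPU. The hyperparameters here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "codellama/CodeLlama-7b-Instruct-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute on the GPU
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # place layers on the available GPU
)

# The quantized model can then be wrapped with a LoRA adapter and handed to a
# preference-optimization trainer (e.g. trl's DPOTrainer) together with
# (prompt, chosen, rejected) triples built from accepted and rejected samples.
```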
Important
We used a P100 GPU on Kaggle and a T4 GPU on Google Colaboratory for our experiments.
We utilized the XLCoST dataset for the code completion task. This parallel dataset comprises solutions for data structures and algorithms problems in six programming languages: C++, Java, Python, PHP, C, and C#. Our experiment primarily focuses on program-level samples in C++ and Python. Our baseline model, CodeLlama-7b-Instruct-hf, was trained and evaluated using this dataset.
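For illustration only, a code-completion prompt can be assembled from an XLCoST problem description along the following lines; the function and template below are hypothetical placeholders for the actual prompt construction in the preprocessing notebooks.

```python
# Hypothetical prompt construction for the code-completion task, using the
# [INST] ... [/INST] format expected by CodeLlama-Instruct models.
def build_prompt(problem_text: str, language: str = "Python") -> str:
    return (
        "[INST] You are given the following problem description:\n"
        f"{problem_text}\n"
        f"Write a complete {language} program that solves it. [/INST]"
    )
```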
| Directory | Description |
|---|---|
| data | Sampled raw and processed XLCoST data for training and evaluation of the CodeLlama model |
| feedback_pipeline | Notebooks for running static analysis on code generated after multiple feedback loops |
| fine_tuning | Notebooks for fine-tuning CodeLlama models to enhance code generation using enriched prompts |
| linter_setup_scripts | Bash scripts for installing, setting up, and supporting linters |
| preprocessing | Code snippets for pre-processing and parsing in notebooks |
| reports | Project-related documentation and reports |
| results | Results produced at different stages of the pipelines |
| static_analysis_pipeline | Scripts encompassing the components of the static analysis pipeline for evaluating source scripts before and after feedback loops |
- Navigate to the Project Directory:
  cd static_analysis_codegen_llms
- Create a Virtual Environment:
  python -m venv codegenllm
- Activate the Virtual Environment:
  - On Windows:
    codegenllm\Scripts\activate
  - On macOS and Linux:
    source codegenllm/bin/activate
- Install Project Dependencies:
  pip install -r requirements.txt
Running this command installs the libraries required for code generation as well as the linters used for Python code evaluation.
If the linters are not installed, follow the instructions below.
Flake8 for Python:
  cd linter_setup_scripts/flake8_utils
  chmod +x install_flake8.sh
  bash install_flake8.sh

Cppcheck for C++:
  cd linter_setup_scripts/cppcheck_utils
  chmod +x install_cppcheck.sh
  bash install_cppcheck.sh
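Once installed, the linters can also be invoked programmatically. The sketch below runs Cppcheck on a generated C++ file and tallies findings per check id; the scripts in static_analysis_pipeline do something similar, though their exact parsing may differ, and Flake8 output can be tallied the same way.

```python
# Illustrative sketch: run Cppcheck on a generated C++ file and count
# findings per check id. The template string keeps the output easy to parse.
import subprocess
from collections import Counter


def cppcheck_findings(path: str) -> Counter:
    result = subprocess.run(
        ["cppcheck", "--enable=all",
         "--template={file}:{line}:{severity}:{id}:{message}", path],
        capture_output=True, text=True,
    )
    # Cppcheck writes findings to stderr, one per line in the template format.
    ids = [line.split(":")[3] for line in result.stderr.splitlines()
           if line.count(":") >= 4]
    return Counter(ids)
```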
- Sai Anuroop Kesanapalli | MS in Computer Science | USC
- Abhishek Anand | MS in Computer Science | USC
- Kayvan Shah | MS in Applied Data Science | USC
- Indrani Panchangam | MS in Computer Science | USC
- Vishesh Mittal | MS in Computer Science | USC
This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.