RepoExec is a novel benchmark for evaluating repository-level code generation, with a focus on executability and correctness. It addresses gaps in existing systems by emphasizing real-world applicability and by assessing both code functionality and alignment with developer intent, paving the way for more reliable and applicable CodeLLMs in real-world scenarios.
RepoExec is available on Hugging Face Datasets. The instruction-tuning data used in our work is available as RepoExec-Instruct.
from datasets import load_dataset
# RepoExec contains 3 subsets, one per level of context detail:
# full_context || medium_context || small_context
dataset = load_dataset("Fsoft-AIC/RepoExec")
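To work with a single context level, the loaded object can be indexed by subset name. The sketch below assumes that `load_dataset` returns a dictionary-like object keyed by the three names listed in the comment above; the helper names `subset_name` and `load_repoexec_subset` are hypothetical and not part of RepoExec.

```python
CONTEXT_LEVELS = ("full", "medium", "small")


def subset_name(level: str) -> str:
    """Map a context level to the subset name used in the comment above."""
    if level not in CONTEXT_LEVELS:
        raise ValueError(f"unknown context level: {level!r}")
    return f"{level}_context"


def load_repoexec_subset(level: str):
    """Load RepoExec and return one subset (hypothetical helper).

    Assumption: the three subsets are exposed as keys of the object
    returned by load_dataset. Requires network access and the
    `datasets` package, so the import is kept inside the function.
    """
    from datasets import load_dataset

    dataset = load_dataset("Fsoft-AIC/RepoExec")
    return dataset[subset_name(level)]
```

For example, `load_repoexec_subset("small")` would return the small_context subset if the assumption above holds.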
Examples:
# full context
import base64
import random
import unicodedata
import zlib
from typing import Union
from uuid import uuid4
from ._regex import *
from .errors import InvalidInputError
from .validation import is_snake_case, is_full_string, is_camel_case, is_integer, is_string

CAMEL_CASE_REPLACE_RE = re.compile(r'([a-z]|[A-Z]+)(?=[A-Z])')

class InvalidInputError(TypeError):
    """
    Custom error raised when received object is not a string as expected.
    """
    def __init__(self, input_data: Any):
        """
        :param input_data: Any received object
        """
        type_name = type(input_data).__name__
        msg = 'Expected "str", received "{}"'.format(type_name)
        super().__init__(msg)

def is_string(obj: Any) -> bool:
    """
    Checks if an object is a string.
    *Example:*
    >>> is_string('foo') # returns true
    >>> is_string(b'foo') # returns false
    :param obj: Object to test.
    :return: True if string, false otherwise.
    """
    return isinstance(obj, str)

def is_camel_case(input_string: Any) -> bool:
    """
    Checks if a string is formatted as camel case.
    A string is considered camel case when:
    - it's composed only by letters ([a-zA-Z]) and optionally numbers ([0-9])
    - it contains both lowercase and uppercase letters
    - it does not start with a number
    *Examples:*
    >>> is_camel_case('MyString') # returns true
    >>> is_camel_case('mystring') # returns false
    :param input_string: String to test.
    :type input_string: str
    :return: True for a camel case string, false otherwise.
    """
    return is_full_string(input_string) and CAMEL_CASE_TEST_RE.match(input_string) is not None

def camel_case_to_snake(input_string, separator='_'):
    """
    Convert a camel case string into a snake case one.
    (The original string is returned if is not a valid camel case string)
    *Example:*
    >>> camel_case_to_snake('ThisIsACamelStringTest') # returns 'this_is_a_camel_case_string_test'
    :param input_string: String to convert.
    :type input_string: str
    :param separator: Sign to use as separator.
    :type separator: str
    :return: Converted string.
    """
# medium context
import base64
import random
import unicodedata
import zlib
from typing import Union
from uuid import uuid4
from ._regex import *
from .errors import InvalidInputError
from .validation import is_snake_case, is_full_string, is_camel_case, is_integer, is_string

CAMEL_CASE_REPLACE_RE = re.compile(r'([a-z]|[A-Z]+)(?=[A-Z])')

class InvalidInputError(TypeError):
    """
    Custom error raised when received object is not a string as expected.
    """
    def __init__(self, input_data: Any):
        """
        :param input_data: Any received object
        """

def is_string(obj: Any) -> bool:
    """
    Checks if an object is a string.
    *Example:*
    >>> is_string('foo') # returns true
    >>> is_string(b'foo') # returns false
    :param obj: Object to test.
    :return: True if string, false otherwise.
    """

def is_camel_case(input_string: Any) -> bool:
    """
    Checks if a string is formatted as camel case.
    A string is considered camel case when:
    - it's composed only by letters ([a-zA-Z]) and optionally numbers ([0-9])
    - it contains both lowercase and uppercase letters
    - it does not start with a number
    *Examples:*
    >>> is_camel_case('MyString') # returns true
    >>> is_camel_case('mystring') # returns false
    :param input_string: String to test.
    :type input_string: str
    :return: True for a camel case string, false otherwise.
    """

def camel_case_to_snake(input_string, separator='_'):
    """
    Convert a camel case string into a snake case one.
    (The original string is returned if is not a valid camel case string)
    *Example:*
    >>> camel_case_to_snake('ThisIsACamelStringTest') # returns 'this_is_a_camel_case_string_test'
    :param input_string: String to convert.
    :type input_string: str
    :param separator: Sign to use as separator.
    :type separator: str
    :return: Converted string.
    """
# small context
import base64
import random
import unicodedata
import zlib
from typing import Union
from uuid import uuid4
from ._regex import *
from .errors import InvalidInputError
from .validation import is_snake_case, is_full_string, is_camel_case, is_integer, is_string

CAMEL_CASE_REPLACE_RE = re.compile(r'([a-z]|[A-Z]+)(?=[A-Z])')

class InvalidInputError(TypeError):
    def __init__(self, input_data: Any):

def is_string(obj: Any) -> bool:

def is_camel_case(input_string: Any) -> bool:

def camel_case_to_snake(input_string, separator='_'):
    """
    Convert a camel case string into a snake case one.
    (The original string is returned if is not a valid camel case string)
    *Example:*
    >>> camel_case_to_snake('ThisIsACamelStringTest') # returns 'this_is_a_camel_case_string_test'
    :param input_string: String to convert.
    :type input_string: str
    :param separator: Sign to use as separator.
    :type separator: str
    :return: Converted string.
    """
git clone https://github.com/FSoft-AI4Code/RepoExec.git
cd RepoExec
unzip test-apps.zip
pip install -r requirement
cd RepoExec/bigcode-eval-repoexec
pip install -e .
cd RepoExec/execution-code-eval
(sudo) docker build -t codeeval-runner -f Dockerfile --platform linux/amd64 .
Example scripts for running the evaluation are provided in scripts.
cd RepoExec/bigcode-eval-repoexec
pip install -e .
Example scripts are in phi-2-generation
There are 2 kinds of prompts, BasePrompt and InstructPrompt:
- To use BasePrompt, set the --tasks argument to repoexec-{full|medium|small}-context.
- To use InstructPrompt, set the --tasks argument to instruct-repoexec-{full|medium|small}-context and the --prompt argument to the template specific to each model (e.g. --prompt codellama for the CodeLlama series).
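The naming rule for the two prompt kinds can be captured in a small helper. This is only a sketch: the function `build_task_args` is hypothetical, and it produces just the --tasks and --prompt arguments described above, not a full generation command.

```python
from typing import List, Optional

CONTEXT_LEVELS = ("full", "medium", "small")


def build_task_args(level: str, instruct: bool = False,
                    prompt: Optional[str] = None) -> List[str]:
    """Build the --tasks (and optional --prompt) arguments (hypothetical helper)."""
    if level not in CONTEXT_LEVELS:
        raise ValueError(f"unknown context level: {level!r}")
    # InstructPrompt tasks carry an "instruct-" prefix; BasePrompt tasks do not.
    prefix = "instruct-" if instruct else ""
    args = ["--tasks", f"{prefix}repoexec-{level}-context"]
    # The model-specific template only applies to InstructPrompt.
    if instruct and prompt is not None:
        args += ["--prompt", prompt]
    return args


print(build_task_args("full"))
print(build_task_args("medium", instruct=True, prompt="codellama"))
```

The returned list can be appended to whichever generation command the example scripts use.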
After running the generation script, the results are saved to a generations.json file as a nested list containing the predictions for each problem in the dataset. See the example in phi-2 prediction.
Example:
[[pred_11, pred_12, pred_13], [pred_21, pred_22, pred_23], ...]
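Before post-processing, it can help to confirm that a generations file has this nested shape. The sketch below assumes only what is stated above (an outer list with one inner list of predictions per problem); the function names are hypothetical.

```python
import json
from typing import Any, List


def validate_generations(data: Any) -> List[List[Any]]:
    """Check that data is a list of lists, one inner list per problem."""
    if not isinstance(data, list):
        raise ValueError("expected a top-level list")
    for index, predictions in enumerate(data):
        if not isinstance(predictions, list):
            raise ValueError(f"entry {index} is not a list of predictions")
    return data


def load_generations(path: str) -> List[List[Any]]:
    """Read a generations.json file and validate its shape."""
    with open(path, encoding="utf-8") as handle:
        return validate_generations(json.load(handle))


# Small in-memory example with 2 problems and 3 predictions each.
sample = json.loads('[["a1", "a2", "a3"], ["b1", "b2", "b3"]]')
generations = validate_generations(sample)
print(len(generations), [len(p) for p in generations])  # 2 [3, 3]
```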
Note: if you use a closed-source model (e.g. ChatGPT), please use your own custom script. The generation pipeline currently supports only open-source models.
Process the predictions to extract the target function and save it to a JSON file.
python3 process_result.py \
--subset full_context \
--prediction_dir ../results/examples/predictions/repoexec-full-context/BasePrompt-phi-2
Execute the generated function of the model to obtain the execution output.
python3 execute.py --subset full_context \
--prediction_dir ../results/examples/predictions/repoexec-full-context/BasePrompt-phi-2 \
--execution_dir ../results/examples/execution_rs/repoexec-full-context/BasePrompt-phi-2
python3 passk.py --execution_dir ../results/examples/execution_rs/repoexec-full-context/BasePrompt-phi-2
python3 get_dir.py --execution_dir ../results/examples/execution_rs/repoexec-full-context/BasePrompt-phi-2
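For readers unfamiliar with the metric behind the passk.py step, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples per problem and c the number that pass all tests. The sketch below illustrates that general formula; whether passk.py uses exactly this estimator is an assumption, and the function name is hypothetical.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem with n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: List[int], n: int, k: int) -> float:
    """Average the per-problem estimates over a benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Three problems, 3 samples each, with 0, 1 and 3 correct samples.
print(mean_pass_at_k([0, 1, 3], n=3, k=1))
```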
Please see this repo for tool usage.
More details can be found in our paper.
If you're using RepoExec, please cite using this BibTeX:
@article{nam2024repoexec,
title={RepoExec: Evaluate Code Generation with a Repository-Level Executable Benchmark},
author={Hai, Nam Le and Manh, Dung Nguyen and Bui, Nghi DQ},
journal={arXiv preprint arXiv:2406.11927v1},
year={2024}
}
This codebase is adapted from:
If you have any questions, comments or suggestions, please do not hesitate to contact us.
- Website: fpt-aicenter
- Email: [email protected]