Releases · evalplus/repoqa
RepoQA v0.1.2
Notable updates
- Fixed wget dependency
- Propagated trust_remote_code for tokenizers (see the sketch below)
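As a rough illustration of what this propagation means, here is a minimal sketch, assuming the HF-based backends load tokenizers through Hugging Face transformers; the helper below is illustrative, not RepoQA's actual code:

```python
from transformers import AutoTokenizer

def load_tokenizer(model_id: str, trust_remote_code: bool = False):
    # Hypothetical helper: the --trust-remote-code CLI flag should reach the
    # tokenizer as well as the model, otherwise checkpoints that ship custom
    # tokenizer code fail to load.
    return AutoTokenizer.from_pretrained(
        model_id,
        trust_remote_code=trust_remote_code,
    )

tokenizer = load_tokenizer("Qwen/CodeQwen1.5-7B-Chat", trust_remote_code=True)
```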
Resources
- PyPI: https://pypi.org/project/repoqa/0.1.2/
- Homepage: https://evalplus.github.io/repoqa.html
- Dataset release: https://github.com/evalplus/repoqa_release
RepoQA v0.1.1
Notable updates
- Trimming output before post-processing notably improved results in certain cases @ganler
- Fixed HF backend @zyzzzz-123 @ganler
- HF backend supports attn-implementation to enable flash-attn 2 (see the sketch after this list) @ganler
- Optimized the computation of trained context size @JialeTomTian #38
- End-of-string optimization substantially improved inference speed @ganler
- Optimized post-processing accuracy with a better regular expression @ganler
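For context on the attn-implementation option, here is a minimal sketch of how such a flag typically reaches the model loader, assuming the HF backend builds models with transformers' AutoModelForCausalLM; everything besides the transformers API itself is illustrative, not RepoQA's actual internals:

```python
import torch
from transformers import AutoModelForCausalLM

def load_hf_model(model_id: str,
                  attn_implementation: str = "eager",
                  trust_remote_code: bool = False):
    # Hypothetical loader: passing attn_implementation="flash_attention_2"
    # asks transformers to use FlashAttention-2 kernels (the flash-attn
    # package must be installed and the GPU/dtype must support it).
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        attn_implementation=attn_implementation,
        trust_remote_code=trust_remote_code,
        device_map="auto",
    )

model = load_hf_model("Qwen/CodeQwen1.5-7B-Chat",
                      attn_implementation="flash_attention_2",
                      trust_remote_code=True)
```

This mirrors the `--attn-implementation "flash_attention_2"` CLI example shown under Quick examples below.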
Only finished features and fixes are listed as notable updates; work-in-progress changes will be listed in subsequent releases once they are fully done.
Full changelog: v0.1.0...v0.1.1
Quick examples
pip install repoqa
repoqa.search_needle_function --model "gpt4-turbo" --backend openai
repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code --attn-implementation "flash_attention_2"
repoqa.search_needle_function --model "gemini-1.5-pro-latest" --backend google
Resources
- PyPI: https://pypi.org/project/repoqa/0.1.1/
- Homepage: https://evalplus.github.io/repoqa.html
- Dataset release: https://github.com/evalplus/repoqa_release
RepoQA v0.1.0
RepoQA for Long-Context Code Understanding
Introduction
RepoQA is a benchmark that aims to exercise LLMs' long-context code understanding ability.
- Multi-Lingual: RepoQA now supports repositories from 5 programming languages:
- Python
- C++
- TypeScript
- Rust
- Java
- Application-driven: RepoQA aims to evaluate LLMs on long-context tasks that reflect real-life use. Before RepoQA, long-context evaluations mainly focused on synthetic tasks that probe the weak spots of an LLM's long context, such as "Needle in the Code" by CodeQwen and "Needle in a Haystack".
- The first RepoQA task we propose is 🔍 Searching Needle Function:
- 500 sub-tasks = 5 PLs x 10 repos x 10 needles
- Asks the model to search for the function (which we call the needle function) that matches a precise natural language description (illustrated below)
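To make the task concrete, here is a minimal sketch of what one sub-task conceptually consists of; the field names and values are hypothetical and do not reflect the actual dataset schema:

```python
# Hypothetical sub-task layout (illustrative only): the model is given a long
# slice of repository code plus a natural language description, and must
# return the matching ("needle") function.
subtask = {
    "language": "python",                    # one of the 5 supported languages
    "repo": "example-org/example-repo",      # one of 10 repos per language
    "description": (
        "Returns the longest common prefix shared by all strings in the "
        "list, or an empty string if the list is empty."
    ),
    "code_context": "<many thousands of tokens of repository code>",
}

# The model's answer is compared against the ground-truth needle function.
```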
RepoQA is easy to use
- Supports the following backends:
- OpenAI
- Anthropic
- vLLM
- HuggingFace transformers
- Google Generative AI API (Gemini)
- 🚀 Evaluation can be done in one command
- 🏆 A leaderboard: https://evalplus.github.io/repoqa.html
Quick examples
pip install repoqa
repoqa.search_needle_function --model "gpt4-turbo" --backend openai
repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code
repoqa.search_needle_function --model "gemini-1.5-pro-latest" --backend google
Resources
- PyPI: https://pypi.org/project/repoqa/0.1.0/
- Homepage: https://evalplus.github.io/repoqa.html
- Dataset release: https://github.com/evalplus/repoqa_release
RepoQA v0.1.0 Release Candidate 1
v0.1.0rc1 refactor: clean files for release
RepoQA Search-Needle-Function Dataset 2024-04-20
dev-dataset refactor: optimize dataset name
Evaluated Results
See attachment; some results might be incomplete.
Release of dependency and base dataset
We use this release to upload the dependency files for the different languages, produced by https://github.com/evalplus/repoqa/tree/main/scripts/curate/dep_analysis