Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* add opencompass cmd parser * fix dry-run * update * refactor for backend & add backend module * add example for oc * update * update readme and example * temp * update opencompass parser * add opencompas cli and chat_medium collection * update cli and eval_api * add OpenCompassArguments for cli * update eval_api for models * update version * add check end for ms-opencompass * add backend args merge and datasets filter * add models and api meta template * assert models * add tmpfile for task config * fix datasets in self.args.config * update * pop dataset_name key in datasets * set meta_template to None for mmlu, ceval, ... * fix models path * update test task * update eval task * update test tasks * add debug info * add example for swift eval * add download data * update * update import for eval_datasets * add json config for toolbench eval * add entry task in run.py; update example_eval_swift_openai_api * add yaml and json config * update example * update gsm8k * update example * update summarizer * support opencompass backend in Summaryzer * update summarizer * update exmaple * update example; set dataset_dir in config.py * update version * update args name * update example * set mmlu to 0-shot for swift * add api key for openai api * update eval datasets * add limit * add limit example * update datasets * update readme for oc backend * update example * update readme for oc backend * add readme for en * fix eval_config assertion * fix eval_backend assertion * fix eval_backend assertion * fix eval_backend assertion * add ut for swift-eval * add test run all * update tests * add logger for ut * update * update pypi source * add debug * update * fix swift deploy subprocess * update * add check service * update * update * fix check swift server * update terminate process * update * update example * update example * update example * fix eval_backend checking * update version * update UTs and examples * refactor setup * add eval_backend and eval_config in config.py * 
fix pr issues
1 parent d15017a, commit 1056a56
Showing 19 changed files with 218 additions and 39 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Copyright (c) Alibaba, Inc. and its affiliates. | ||
|
||
-__version__ = '0.4.0' | ||
-__release_datetime__ = '2024-06-27 08:00:00' | ||
+__version__ = '0.4.3' | ||
+__release_datetime__ = '2024-07-28 08:00:00' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
requirements/framework.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,5 +22,5 @@ seaborn | |
simple-ddl-parser | ||
streamlit | ||
tqdm | ||
-transformers | ||
+transformers>=4.33,<4.43 | ||
transformers_stream_generator |
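
The new pin admits every `transformers` release from 4.33 up to, but not including, 4.43. A minimal sketch of that check follows, assuming plain `major.minor[.patch]` version strings; `parse_major_minor` and `in_supported_range` are hypothetical helpers written for illustration, and real dependency resolution is done by pip under the PEP 440 rules rather than by code like this.

```python
def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) from a plain version string such as '4.42.4'."""
    major, minor = version.split('.')[:2]
    return int(major), int(minor)


def in_supported_range(version: str, lower=(4, 33), upper=(4, 43)) -> bool:
    """True when lower <= version < upper, comparing major and minor only."""
    return lower <= parse_major_minor(version) < upper


for candidate in ('4.32.1', '4.33.0', '4.42.4', '4.43.0'):
    print(candidate, in_supported_range(candidate))
```

Comparing only the major and minor components is enough here because both bounds of the pin are given at minor-version precision.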
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
ms-opencompass |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
ms-vlmeval |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Copyright (c) Alibaba, Inc. and its affiliates. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
# Copyright (c) Alibaba, Inc. and its affiliates. | ||
|
||
import os | ||
import json | ||
import time | ||
import requests | ||
import subprocess | ||
import unittest | ||
|
||
from llmuses.backend.opencompass import OpenCompassBackendManager | ||
from llmuses.run import run_task | ||
from llmuses.summarizer import Summarizer | ||
from llmuses.utils import test_level_list, is_module_installed | ||
|
||
from llmuses.utils.logger import get_logger | ||
|
||
logger = get_logger(__name__) | ||
|
||
DEFAULT_CHAT_MODEL_URL = 'http://127.0.0.1:8000/v1/chat/completions' | ||
DEFAULT_BASE_MODEL_URL = 'http://127.0.0.1:8001/v1/completions' | ||
|
||
|
||
class TestRunSwiftEval(unittest.TestCase): | ||
|
||
def setUp(self) -> None: | ||
logger.info(f'Init env for swift-eval UTs ...\n') | ||
|
||
self.model_name = 'llama3-8b-instruct' | ||
assert is_module_installed('llmuses'), 'Please install `llmuses` from pypi or source code.' | ||
|
||
logger.warning('Note: installing ms-opencompass ...') | ||
subprocess.run('pip3 install ms-opencompass -U', shell=True, check=True) | ||
|
||
logger.warning('Note: installing ms-swift ...') | ||
subprocess.run('pip3 install ms-swift -U', shell=True, check=True) | ||
|
||
logger.info('Using the native swift deploy service.') | ||
|
||
logger.info('\nStarting swift deploy ...') | ||
self.process_swift_deploy = subprocess.Popen(f'swift deploy --model_type {self.model_name}', | ||
text=True, shell=True, | ||
stdout=subprocess.PIPE, stderr=subprocess.PIPE) | ||
|
||
self.all_datasets = OpenCompassBackendManager.list_datasets() | ||
assert len(self.all_datasets) > 0, f'Failed to list datasets from OpenCompass backend: {self.all_datasets}' | ||
|
||
def tearDown(self) -> None: | ||
# Stop the swift deploy model service | ||
logger.warning(f'\nStopping swift deploy ...') | ||
self.process_swift_deploy.terminate() | ||
self.process_swift_deploy.wait() | ||
logger.info(f'Process swift-deploy terminated successfully.') | ||
|
||
@staticmethod | ||
def find_and_kill_pid(pids: list): | ||
if len(pids) > 0: | ||
for pid in pids: | ||
subprocess.run(["kill", str(pid)]) | ||
logger.warning(f"Killed process {pid}.") | ||
else: | ||
logger.info(f"No pids found.") | ||
|
||
@staticmethod | ||
def find_and_kill_service(service_name): | ||
try: | ||
# find pid | ||
result = subprocess.run( | ||
["ps", "-ef"], stdout=subprocess.PIPE, text=True | ||
) | ||
|
||
lines = result.stdout.splitlines() | ||
pids = [] | ||
for line in lines: | ||
if service_name in line and "grep" not in line: | ||
parts = line.split() | ||
pid = parts[1] | ||
pids.append(pid) | ||
|
||
if not pids: | ||
logger.info(f"No process found for {service_name}.") | ||
else: | ||
for pid in pids: | ||
subprocess.run(["kill", pid]) | ||
logger.warning(f"Killed process {pid} for service {service_name}.") | ||
except Exception as e: | ||
logger.error(f"An error occurred: {e}") | ||
|
||
@staticmethod | ||
def check_service_status(url: str, data: dict, retries: int = 20, delay: int = 10): | ||
for i in range(retries): | ||
try: | ||
logger.info(f"Attempt {i + 1}: Checking service at {url} ...") | ||
response = requests.post(url, | ||
data=json.dumps(data), | ||
headers={'Content-Type': 'application/json'}, | ||
timeout=30) | ||
if response.status_code == 200: | ||
logger.info(f"Service at {url} is available !\n\n") | ||
return True | ||
else: | ||
logger.info(f"Service at {url} returned status code {response.status_code}.") | ||
except requests.exceptions.RequestException as e: | ||
logger.info(f"Attempt {i + 1}: An error occurred: {e}") | ||
|
||
time.sleep(delay) | ||
|
||
logger.info(f"Service at {url} is not available after {retries} retries.") | ||
return False | ||
|
||
@unittest.skipUnless(1 in test_level_list(), 'skip test in current test level') | ||
def test_run_task(self): | ||
# Prepare the config | ||
task_cfg = dict( | ||
eval_backend='OpenCompass', | ||
eval_config={'datasets': ['mmlu', 'ceval', 'ARC_c', 'gsm8k'], | ||
'models': [ | ||
{'path': 'llama3-8b-instruct', | ||
'openai_api_base': DEFAULT_CHAT_MODEL_URL, | ||
'batch_size': 8}, | ||
], | ||
'work_dir': 'outputs/llama3_eval_result', | ||
'reuse': None, # str: `latest` or a timestamp such as `20230516_144254`; defaults to None | ||
'limit': '[2:5]', # str, int, or float, e.g. `[2:5]`, 5, 5.0; defaults to None, which runs all examples | ||
}, | ||
) | ||
|
||
# Check the service status | ||
data = {'model': self.model_name, 'messages': [{'role': 'user', 'content': 'who are you?'}]} | ||
assert self.check_service_status(DEFAULT_CHAT_MODEL_URL, data=data), f'Failed to check service status: {DEFAULT_CHAT_MODEL_URL}' | ||
|
||
# Submit the task | ||
logger.info(f'Start to run UT with cfg: {task_cfg}') | ||
run_task(task_cfg=task_cfg) | ||
|
||
# Get the final report with summarizer | ||
report_list = Summarizer.get_report_from_cfg(task_cfg) | ||
logger.info(f'>>The report list:\n{report_list}') | ||
|
||
assert len(report_list) > 0, f'Failed to get report list: {report_list}' | ||
|
||
|
||
if __name__ == '__main__': | ||
unittest.main() |
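
The test above starts `swift deploy` through a shell, polls the endpoint until it answers, and later needs a separate `ps`-based helper to kill leftover processes. The sketch below shows one way the same lifecycle could be arranged so that a single call stops the shell and everything it spawned. It is an illustration under stated assumptions, not the commit's implementation: `start_service`, `wait_until_ready`, and `stop_service` are hypothetical names, the approach is POSIX-only, and output is sent to `DEVNULL` on the assumption that nobody reads it (an unread `PIPE` can block the child once the pipe buffer fills).

```python
import os
import signal
import subprocess
import time
from typing import Callable


def start_service(cmd: str) -> subprocess.Popen:
    """Launch `cmd` through the shell in its own session (POSIX only)."""
    return subprocess.Popen(
        cmd,
        shell=True,
        start_new_session=True,     # child leads a new process group (pgid == pid)
        stdout=subprocess.DEVNULL,  # assumption: output is not needed by the caller
        stderr=subprocess.DEVNULL,
    )


def wait_until_ready(probe: Callable[[], bool], retries: int = 20, delay: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` up to `retries` times, sleeping `delay` seconds between failures."""
    for attempt in range(1, retries + 1):
        try:
            if probe():
                return True
        except Exception as exc:  # a failing probe counts as "not ready yet"
            print(f'Attempt {attempt}: probe failed: {exc}')
        if attempt < retries:
            sleep(delay)
    return False


def stop_service(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Send SIGTERM to the whole process group, escalating to SIGKILL on timeout."""
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group has already exited
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()
```

In the test, `probe` could wrap the existing `requests.post(...)` call and return `response.status_code == 200`, with `setUp` calling `start_service` and `tearDown` calling `stop_service`. Unlike `check_service_status`, this sketch does not sleep after the final failed attempt.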
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Copyright (c) Alibaba, Inc. and its affiliates. | ||
|
||
import subprocess | ||
|
||
if __name__ == '__main__': | ||
cmd = f'TEST_LEVEL_LIST=0,1 python3 -m unittest discover .' | ||
run_res = subprocess.run(cmd, text=True, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE) | ||
|
||
if run_res.returncode == 0: | ||
print(f'>>test_run_all stdout: {run_res.stdout}') | ||
else: | ||
print(f'>>test_run_all stderr: {run_res.stderr}') |
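
The runner above prints stdout only on success and stderr only on failure, and it discards the exit code. Since `unittest` writes its report to stderr by default, a passing run may show little useful output. A hedged alternative is sketched below: it prints both streams and returns the child's exit code so a CI job can fail on it. `run_and_report` is a hypothetical helper, and passing `TEST_LEVEL_LIST` through `env` instead of a shell prefix is a choice made for this sketch.

```python
import os
import subprocess
import sys


def run_and_report(cmd: list, extra_env: dict) -> int:
    """Run `cmd` with extra environment variables, print both streams, return the exit code."""
    env = {**os.environ, **extra_env}
    res = subprocess.run(cmd, text=True, capture_output=True, env=env)
    print(f'>>stdout:\n{res.stdout}')
    print(f'>>stderr:\n{res.stderr}')
    return res.returncode


# Possible use as a script entry point (not executed here):
#   sys.exit(run_and_report([sys.executable, '-m', 'unittest', 'discover', '.'],
#                           {'TEST_LEVEL_LIST': '0,1'}))
```

Passing the command as a list avoids `shell=True`, and returning the code leaves the decision to exit with the caller.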
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Copyright (c) Alibaba, Inc. and its affiliates. |