Releases · modelscope/evalscope
v0.8.1 release
What's Changed
- Unify `opencompass` and `vlmeval` output dirs by @Yunnglin in #242
- Perf: add more metrics by @Yunnglin in #245
- Perf: add `trust remote` parameter by @Yunnglin in #246 (see the sketch below)
- Compat ms-swift<3.0 by @Yunnglin in #249
- Fix humaneval for native eval by @Yunnglin in #248
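The `trust remote` parameter in #246 presumably maps to Hugging Face's standard `trust_remote_code` switch. Below is a minimal sketch of what that switch controls when loading a model whose repository ships custom Python code; this is a generic Transformers illustration, not evalscope's exact wiring, and the model name is only an example:

```python
# Generic illustration (not evalscope's internal code): `trust_remote_code`
# is the standard Hugging Face switch that allows a checkpoint's own modeling
# code to run; models without custom code load fine with it left off.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"  # example of a repo that ships custom code

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```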
Full Changelog: v0.8.0...v0.8.1
v0.8.0 release
v0.7.2 release
v0.7.1 release
v0.7.0 release
Release Notes
- Refactor the `perf` module to be more robust and easier to use. #178
- Add speed benchmarking to the `perf` module (see the sketch below). #178
- Add the multi-modal benchmark `flickr8k` to the `perf` module for speed benchmarking. #211
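A hedged sketch of launching a speed benchmark against an OpenAI-compatible endpoint. The import path `evalscope.perf.main.run_perf_benchmark` and the config keys follow later evalscope documentation and are assumptions here; the exact v0.7.0 interface may differ:

```python
# Hedged sketch: speed-benchmark an OpenAI-compatible endpoint with the perf
# module. Import path and keys follow later evalscope docs; treat them as
# assumptions for v0.7.0.
from evalscope.perf.main import run_perf_benchmark

task_cfg = {
    "url": "http://127.0.0.1:8000/v1/chat/completions",  # served model endpoint
    "api": "openai",     # OpenAI-compatible request format
    "model": "qwen2.5",  # model name as exposed by the server
    "dataset": "openqa", # prompt source for the benchmark
    "parallel": 4,       # concurrent requests (see fix #215)
    "number": 20,        # total requests to send
}

run_perf_benchmark(task_cfg)
```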
Bug Fixes
- Add a timeout when downloading punkt.zip #206
- Fix parallel execution for speed benchmarking in the `perf` module. #215
Documentation Updates
v0.6.1 release
Release Notes
- Add CMMLU benchmark #198 (see the sketch after this list)
- Add publish workflow #186
- Adapt RAGAS v0.2.5 and update README #205
- Adapt MTEB v1.19 #196
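A hedged sketch of running the new CMMLU benchmark. `run_task` and the task-config keys follow evalscope's documented usage; treat the exact v0.6.1 schema as an assumption, and the checkpoint name is only an example:

```python
# Hedged sketch: smoke-test the newly added CMMLU benchmark (#198).
# `run_task` and these config keys follow evalscope's documented task-config
# style; the exact v0.6.1 schema is an assumption.
from evalscope.run import run_task

task_cfg = {
    "model": "Qwen/Qwen2-0.5B-Instruct",  # example checkpoint
    "datasets": ["cmmlu"],                # benchmark added in #198
    "limit": 10,                          # a few samples for a quick check
}

run_task(task_cfg)
```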
Bug Fixes
- Pin the datasets version to fix compatibility: datasets>=3.0.0, <=3.0.1 #184
- Pin pyarrow to <=17.0.0 to avoid an installation issue on macOS #187
- Add a timeout when downloading punkt.zip #206
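If an existing environment hits these issues, both pins from the notes above can be applied in one step, e.g. `pip install "datasets>=3.0.0,<=3.0.1" "pyarrow<=17.0.0"`.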
Documentation Updates
Release v0.6.0
Release Notes
- Support multi-modal RAG evaluation #149
  - Add CLIP_Benchmark
  - Add end-to-end multi-modal RAG evaluation in Ragas
- Compatibility with Ragas v0.2.3 #165 #171
- Support truncating input for CLIP models #163 #164 (see the sketch after this list)
- Support saving knowledge graphs when generating datasets in Ragas #175
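On the CLIP truncation point: CLIP's text tower accepts at most 77 tokens, so longer retrieval queries or captions must be cut down before encoding. A generic Hugging Face illustration of that behavior (not evalscope's internal code path):

```python
# Generic illustration (not evalscope's code): CLIP text inputs are capped at
# 77 tokens, so enabling truncation is what makes long captions usable.
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

long_caption = "a very long caption about a photo " * 30  # well past 77 tokens
inputs = processor(
    text=[long_caption],
    return_tensors="pt",
    padding=True,
    truncation=True,  # drop tokens beyond the model's maximum length
)
text_features = model.get_text_features(**inputs)
print(text_features.shape)  # (1, 512) for this checkpoint
```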
Bug Fixes
- Fix issue of abnormal metrics during CMTEB evaluation #157
- Fix issue of GenerationConfig being None #173
- Update datasets version constraints #184
- Add publish workflow #186
Documentation Updates
Release v0.5.5
Release Notes
- Added Dataset Support:
  - Support for `LongBench-write`: quality evaluation of long text generation #136
  - Automatic downloading of `punkt_tab.zip` from `nltk` #140
- Support for RAG evaluation #127:
  - Support for embeddings/reranker evaluation: integration of `MTEB` (Massive Text Embedding Benchmark) and `CMTEB` (Chinese Massive Text Embedding Benchmark), supporting tasks such as retrieval and reranking (see the sketch after this list)
  - Support for end-to-end RAG evaluation: integration of the `ragas` framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
- Documentation Updates
- Updated dependencies: `nltk>=3.9` and `rouge-score>=0.1.0` #145, #143
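The embeddings/reranker integration builds on the upstream `mteb` package. Below is a hedged sketch of the kind of run it covers, using `mteb` directly with a sentence-transformers model; evalscope drives this through its own RAG-evaluation config, so this illustrates the integrated capability rather than evalscope's API:

```python
# Hedged sketch using the upstream MTEB package directly; evalscope wraps
# this kind of run behind its own config, so treat this as an illustration
# of the integrated capability rather than evalscope's API.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# A retrieval task of the kind MTEB/CMTEB cover (task choice is an example).
evaluation = MTEB(tasks=["SciFact"])
evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```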
Release v0.5.2
Highlight features
- Support multi-modal model evaluation (VLM Eval)
- Transform the synchronous API to asynchronous for OpenAI's API format, speeding up the evaluation process by up to 10x
- Support installation with extras: `pip install evalscope[opencompass]` or `pip install evalscope[vlmeval]`
Breaking Changes
None
What's Changed
- Support multi-modal model evaluation (VLM Eval)
- Transform the synchronous API to asynchronous for OpenAI's API format, speeding up the evaluation process by up to 10x (see the sketch after this list)
- Support installation with extras: `pip install evalscope[opencompass]` or `pip install evalscope[vlmeval]`
- Update README
- Add UT cases for VLM eval
- Update examples for `OpenCompass` and `VLMEval` eval backends
- Update version restrictions for the ms-opencompass and ms-vlmeval dependencies
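The sync-to-async change is where the speedup comes from: requests to an OpenAI-format endpoint are issued concurrently instead of one at a time. A generic illustration with the `openai` async client (not evalscope's internal implementation; the endpoint and model name are placeholders):

```python
# Generic illustration of the sync -> async pattern (not evalscope's code):
# asyncio.gather keeps many requests in flight against an OpenAI-format API,
# instead of waiting for each response before sending the next prompt.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="my-served-model",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Summarize item {i} in one sentence." for i in range(8)]
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    print(len(answers), "responses received")

asyncio.run(main())
```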
Release v0.4.3
- Support async client inference for OpenAI API format evaluation
- Support multi-modal evaluation with VLMEvalKit as an eval backend
- Refactor setup; support `pip install llmuses[opencompass]`, `pip install llmuses[vlmeval]`, and `pip install llmuses[all]`
- Fix some bugs