17 Dec 12:06

Yunnglin

ea0ac5e

v0.8.1 release (Latest)

What's Changed

Unify the opencompass and vlmeval output directories by @Yunnglin in #242
Perf: add more metrics by @Yunnglin in #245
Perf: add trust remote parameter by @Yunnglin in #246
Compatibility with ms-swift<3.0 by @Yunnglin in #249
Fix HumanEval for native evaluation by @Yunnglin in #248
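#245 adds more metrics to the perf (model stress-testing) module, but these notes do not list which ones. As a rough illustration only, here is a minimal sketch of how throughput and latency percentiles are commonly derived from per-request records. The `RequestRecord` type, the metric names, and the nearest-rank percentile method are assumptions, not evalscope's actual report format.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One finished benchmark request (hypothetical schema)."""
    latency_s: float    # end-to-end latency in seconds
    output_tokens: int  # number of generated tokens


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("percentile of an empty list")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(records, wall_time_s):
    """Aggregate per-request records collected over wall_time_s seconds."""
    latencies = sorted(r.latency_s for r in records)
    total_tokens = sum(r.output_tokens for r in records)
    return {
        "requests_per_s": len(records) / wall_time_s,
        "output_tokens_per_s": total_tokens / wall_time_s,
        "avg_latency_s": statistics.mean(latencies),
        "p50_latency_s": percentile(latencies, 50),
        "p90_latency_s": percentile(latencies, 90),
        "p99_latency_s": percentile(latencies, 99),
    }


records = [RequestRecord(latency_s=t, output_tokens=10) for t in (1.0, 2.0, 3.0, 4.0)]
print(summarize(records, wall_time_s=2.0))
```

Nearest-rank percentiles always return an observed latency. Interpolating methods, such as `statistics.quantiles`, can give slightly different values on small samples.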

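Chinese Version

Unify the opencompass and vlmeval output directories, by @Yunnglin, see #242
Model stress testing: add more metrics, by @Yunnglin, see #245
Model stress testing: add the trust remote parameter, by @Yunnglin, see #246
Compatibility with ms-swift<3.0, by @Yunnglin, see #249
Fix the humaneval issue in native evaluation, by @Yunnglin, see #248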

Full Changelog: v0.8.0...v0.8.1

Contributors

Yunnglin

14 Dec 17:30

wangxingjun778

v0.8.0

89a5143

v0.8.0 release

Release Notes

Optimize Native evaluation and remove the template_type parameter #231
The evalscope perf command now supports the --outputs-dir option. #232
Support ragas 0.2.7 #234

Bug Fixes

Fix longwriter docs #239
Fix lint for longwriter #240
Fix lint #237
Unify perf output #238

Documentation Updates

Fix longwriter docs #239
Optimize Native eval and remove template_type #231

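Chinese Notes

Features

Remove the template_type parameter from Native-mode evaluation #231
The perf module supports --output-dir #232
Support the latest ragas version, 0.2.7 #234

Bug Fixes

Fix the longwriter code examples and streamline the workflow #239
Fix lint, including lint for longwriter #240 #237

Documentation Updates

Update the longwriter documentation #239
Update the documentation for the Native evaluation mode #231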

04 Dec 04:24

Yunnglin

v0.7.2

e8b2d4b

v0.7.2 release

Release Notes

Remove pyarrow version requirement #225
Improve warning messages #223

Chinese Notes

Remove the pyarrow version requirement #225
Improve warning messages #223

28 Nov 18:30

wangxingjun778

v0.7.1

54eef61

v0.7.1 release

Release Notes

Add PMMEval benchmark #222

Chinese Notes

Features

Add the PMMEval benchmark #222

28 Nov 07:14

wangxingjun778

v0.7.0

2948eb7

v0.7.0 release

Release Notes

Refactor the perf module to make it more robust and easier to use. #178
Add speed benchmarking to the perf module. #178
Add the multi-modal flickr8k dataset to the perf module for speed benchmarking. #211

Bug Fixes

Add a timeout for downloading punkt.zip #206
Fix parallelism in speed benchmarking in the perf module. #215
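#206 adds a timeout when downloading punkt.zip, but the notes do not show how evalscope does it. The general idea is that a download which never finishes should fail after a bounded wait. Here is a minimal standard-library sketch of that pattern. `download_with_timeout` is a hypothetical helper and is not evalscope's or nltk's code.

```python
import shutil
import socket
import urllib.error
import urllib.request


def download_with_timeout(url: str, dest_path: str, timeout_s: float = 30.0) -> bool:
    """Copy url to dest_path. Returns False instead of blocking indefinitely
    when the server does not respond within timeout_s seconds."""
    try:
        # timeout applies to the connection attempt and to each blocking read
        with urllib.request.urlopen(url, timeout=timeout_s) as resp, \
                open(dest_path, "wb") as out:
            shutil.copyfileobj(resp, out)
        return True
    except (urllib.error.URLError, socket.timeout) as exc:
        # URLError wraps connection failures; socket.timeout covers stalled reads
        print(f"download of {url} failed: {exc}")
        return False
```

A caller would invoke `download_with_timeout(url, "punkt.zip", timeout_s=60)` with whichever mirror it uses. If a read stalls midway, a partial file may remain, so real code would typically write to a temporary name first.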

Documentation Updates

Update VLM-Eval doc #209
Update perf module doc #178 #211

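Chinese Notes

Features

Refactor the perf module to make it more robust and easier to use. #178
Add speed benchmarking to the perf module. #178
Add the multi-modal benchmark flickr8k to the perf module for speed benchmarking. #211

Bug Fixes

Fix the timeout issue when downloading punkt.zip. #206
Fix the parallelism issue in speed benchmarking in the perf module. #215

Documentation Updates

Update the VLM-Eval documentation. #209
Update the perf module documentation. #178 #211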

22 Nov 06:34

wangxingjun778

v0.6.1

5e9c65c

v0.6.1 release

Release Notes

Add CMMLU benchmark #198
Add publish workflow #186
Adapt RAGAS v0.2.5 and update readme #205
Adapt MTEB v1.19 #196

Bug Fixes

Pin the datasets version to datasets>=3.0.0,<=3.0.1 #184
Pin pyarrow to <=17.0.0 to avoid installation issues on OSX. #187
Add a timeout for downloading punkt.zip #206
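To check whether an existing environment falls inside the bounds above before upgrading, you can use a small standard-library sketch. It uses `importlib.metadata` and a deliberately simplified version comparison that ignores pre/post/dev tags, so edge cases can differ from pip's PEP 440 rules. Note also that v0.7.2 later removed the pyarrow requirement (#225). The helper names are hypothetical.

```python
import re
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Leading numeric release segment: '3.0.1' -> (3, 0, 1), '17.0.0.post1' -> (17, 0, 0).
    Pre/post/dev tags are ignored, so this is a simplification of PEP 440."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _padded(a: tuple, b: tuple):
    """Pad with zeros so (3, 0) compares equal to (3, 0, 0)."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version, min_incl=None, max_incl=None) -> bool:
    """True if min_incl <= version <= max_incl (either bound may be None)."""
    v = parse_release(version)
    if min_incl is not None:
        a, b = _padded(v, parse_release(min_incl))
        if a < b:
            return False
    if max_incl is not None:
        a, b = _padded(v, parse_release(max_incl))
        if a > b:
            return False
    return True


# Bounds taken from the v0.6.1 notes: datasets>=3.0.0,<=3.0.1 and pyarrow<=17.0.0
PINS = {"datasets": ("3.0.0", "3.0.1"), "pyarrow": (None, "17.0.0")}


def check_installed(pins=PINS) -> dict:
    """Report, per package, whether the installed version is inside its bounds."""
    report = {}
    for name, (low, high) in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if satisfies(installed, low, high) else f"{installed} is outside the pin"
    return report


print(check_installed())
```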

Documentation Updates

Update the OpenCompass docs listing all supported datasets #199
Update RAGAS v0.2.5 docs #205

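Chinese Notes

Features

Support the CMMLU benchmark #198
Support the publish workflow #186
Adapt to RAGAS v0.2.5 and update the documentation #205
Adapt to MTEB v1.19 #196

Bug Fixes

Pin the datasets version to fix compatibility issues: datasets>=3.0.0,<=3.0.1 #184
Pin the pyarrow version to <=17.0.0 to fix installation issues on OSX #187
Increase the timeout for downloading punkt.zip #206

Documentation Updates

Update the documentation listing the datasets supported when OpenCompass is the backend #199
Update the RAGAS v0.2.5 documentation #205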

08 Nov 05:51

wangxingjun778

v0.6.0

d289ece

Release v0.6.0

Release Notes

Support multi-modal RAG evaluation #149
- Add CLIP_Benchmark
- Add end-to-end multi-modal RAG evaluation in Ragas
Compatibility with Ragas v0.2.3 #165 #171
Support truncating input for CLIP models #163 #164
Support saving knowledge graphs when generating datasets in Ragas #175
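The notes do not say how #163/#164 truncate CLIP inputs. One common approach, similar to what OpenAI's `clip.tokenize(..., truncate=True)` does, is to cut the token sequence to the 77-token context and then overwrite the last position with the end-of-text token. This matters because CLIP reads the text embedding at that token. Here is a minimal sketch on plain token-id lists, assuming the caller has already run a BPE tokenizer. `truncate_clip_tokens` is hypothetical, and the body ids in the example are arbitrary.

```python
CONTEXT_LENGTH = 77  # text context length of the original CLIP models


def truncate_clip_tokens(token_ids, sot_id, eot_id, context_length=CONTEXT_LENGTH):
    """Wrap BPE token ids in start/end markers, cut to context_length while
    keeping the end-of-text marker last, then zero-pad to a fixed length."""
    tokens = [sot_id, *token_ids, eot_id]
    if len(tokens) > context_length:
        tokens = tokens[:context_length]
        tokens[-1] = eot_id  # the end marker must survive truncation
    return tokens + [0] * (context_length - len(tokens))


# 49406 / 49407 are the start/end-of-text ids of OpenAI's CLIP BPE vocabulary.
long_input = truncate_clip_tokens(list(range(1, 200)), sot_id=49406, eot_id=49407)
short_input = truncate_clip_tokens([320, 1929], sot_id=49406, eot_id=49407)
print(len(long_input), long_input[0], long_input[-1])   # 77 49406 49407
print(short_input[:5])                                   # [49406, 320, 1929, 49407, 0]
```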

Bug Fixes

Fix abnormal metrics during CMTEB evaluation #157
Fix the error when GenerationConfig is None #173
Update datasets version constraints #184
Add publish workflow #186

Documentation Updates

Update VLMEvalKit documentation #166
Update multi-modal RAG blog #172

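Chinese Notes

Features

Add support for multi-modal RAG evaluation #149
- Support CLIP_Benchmark
- Support end-to-end multi-modal RAG evaluation with Ragas
Compatibility with Ragas v0.2.3 #165 #171
Support input truncation for CLIP models #163 #164
Support saving knowledge graphs when generating datasets with Ragas #175

Bug Fixes

Fix abnormal metrics during CMTEB evaluation #157
Fix the error when GenerationConfig is None #173
Update the datasets version constraints #184
Add the publish workflow #186

Documentation Updates

Update the VLMEvalKit documentation #166
Update the multi-modal RAG blog #172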

15 Oct 02:57

Yunnglin

v0.5.5

40675d6

Release v0.5.5

Release Notes

Added Dataset Support:
- Enhanced multimodal evaluation capabilities, now supporting MMBench-Video, Video-MME, and MVBench video evaluations #146
- Added cmb dataset #117
Support for LongBench-write quality evaluation of long text generation #136
Automatic downloading of punkt_tab.zip from nltk #140
Support for RAG evaluation #127:
- Support for embeddings/reranker evaluation: Integration of MTEB (Massive Text Embedding Benchmark) and CMTEB (Chinese Massive Text Embedding Benchmark), supporting tasks such as retrieval and reranking
- Support for end-to-end RAG evaluation: Integration of the ragas framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
Documentation Updates:
- Added "Blog" section #126, #135
- Added support for dataset page #121
- Updated function usage instructions #125, #134, #138, #137, #127
Updated dependencies: nltk>=3.9 and rouge-score>=0.1.0 #145, #143
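For the MTEB/CMTEB retrieval and reranking tasks mentioned above, ranking quality is usually reported with metrics such as recall@k and nDCG@k. Here is a minimal sketch of both formulas for a single query, assuming graded relevance judgments and linear gain. MTEB computes its scores with its own evaluation code, so treat this as an explanation of the metrics rather than a reimplementation of MTEB.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevance, k):
    """nDCG@k with linear gain; relevance maps doc id -> graded relevance (0 = irrelevant)."""
    dcg = sum(relevance.get(doc, 0) / math.log2(rank + 2)
              for rank, doc in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["d3", "d1", "d2"]          # retriever output for one query
qrels = {"d1": 1, "d2": 1}            # ground-truth relevant documents
print(recall_at_k(ranking, set(qrels), k=2))   # 0.5
print(ndcg_at_k(ranking, qrels, k=2))
```

Benchmarks average these per-query scores over all queries. Some toolkits use the exponential gain `2**rel - 1` instead of `rel`, which only changes results when relevance is graded rather than binary.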

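Chinese Notes

New dataset support:
- Improve multi-modal evaluation, supporting MMBench-Video, Video-MME, and MVBench video evaluation #146
- Add the cmb dataset #117
Support quality evaluation of long text generation with LongBench-write #136
Support automatic downloading of punkt_tab.zip from nltk #140
Support RAG evaluation: #127
- Support embeddings/reranker evaluation: integrates MTEB (Massive Text Embedding Benchmark) and CMTEB (Chinese Massive Text Embedding Benchmark), supporting evaluation of tasks such as retrieval and reranking
- Support end-to-end RAG evaluation: integrates the ragas framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
Documentation updates
- Add a "Blog" section #126, #135
- Add a supported datasets page #121
- Update feature usage instructions #125, #134, #138, #137, #127
Update dependencies: nltk>=3.9 and rouge-score>=0.1.0 #145, #143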

09 Aug 13:20

wangxingjun778

v0.5.2

138c994

Release v0.5.2

Highlight features

Support multi-modal model evaluation (VLM Eval)
Switch from synchronous to asynchronous API calls for the OpenAI API format, speeding up evaluation by up to 10x.
Support installation with extras: pip install evalscope[opencompass] or pip install evalscope[vlmeval]
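The speed-up from asynchronous calls comes from keeping many requests in flight at once instead of waiting for each response in turn. Here is a minimal `asyncio` sketch of that pattern with a concurrency cap. `fake_chat_completion` is a hypothetical stand-in that sleeps instead of calling a real OpenAI-compatible endpoint, so the timings below reflect the simulated delay, not evalscope's measured 10x.

```python
import asyncio
import time


async def fake_chat_completion(prompt: str, delay_s: float = 0.1) -> str:
    """Stand-in for one OpenAI-compatible API call (hypothetical; no network)."""
    await asyncio.sleep(delay_s)
    return f"answer to {prompt!r}"


async def run_concurrently(prompts, max_concurrency: int = 8):
    """Send all prompts, allowing at most max_concurrency requests in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with sem:  # cap in-flight requests so the server is not flooded
            return await fake_chat_completion(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(one(p) for p in prompts))


if __name__ == "__main__":
    prompts = [f"q{i}" for i in range(16)]
    start = time.perf_counter()
    answers = asyncio.run(run_concurrently(prompts, max_concurrency=8))
    elapsed = time.perf_counter() - start
    # Sequential calls would take about 16 * 0.1 s = 1.6 s; with 8 in flight, roughly 0.2 s.
    print(len(answers), f"{elapsed:.2f}s")
```

The semaphore is the main design choice here. Without it, `gather` would fire every request at once, which can trip rate limits on real API servers.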

Breaking Changes

None

What's Changed

Support multi-modal model evaluation (VLM Eval)
Switch from synchronous to asynchronous API calls for the OpenAI API format, speeding up evaluation by up to 10x.
Support installation with extras: pip install evalscope[opencompass] or pip install evalscope[vlmeval]
Update README
Add UT cases for VLM eval
Update examples for OpenCompass and VLMEval eval backends
Update version restrictions for ms-opencompass and ms-vlmeval dependencies.
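Because the OpenCompass and VLMEval backends are optional extras, code that uses them typically checks whether the backend is importable before using it. Here is a minimal sketch of that guard built on `importlib.util.find_spec`. The extra-to-module mapping (`opencompass`, `vlmeval`) is an assumption about what the ms-opencompass and ms-vlmeval packages install. The helper names are hypothetical and do not describe evalscope's own check.

```python
import importlib.util

# Hypothetical mapping from evalscope extras to the module each one is assumed to provide.
OPTIONAL_BACKENDS = {"opencompass": "opencompass", "vlmeval": "vlmeval"}


def available_backends(backends=OPTIONAL_BACKENDS) -> dict:
    """Return {extra_name: bool} depending on whether its module can be imported."""
    return {extra: importlib.util.find_spec(module) is not None
            for extra, module in backends.items()}


def require_backend(extra, backends=OPTIONAL_BACKENDS, package="evalscope"):
    """Raise a helpful ImportError pointing at the matching pip extra."""
    module = backends[extra]
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"Backend {extra!r} is not installed; try: pip install {package}[{extra}]"
        )


print(available_backends())
```

`find_spec` only locates the module without importing it, so the check stays cheap even for heavy backends.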

29 Jul 05:46

wangxingjun778

v0.4.3

1056a56

Release v0.4.3

Support async client inference for evaluation in the OpenAI API format
Support multi-modal evaluation with VLMEvalKit as an eval backend
Refactor setup to support pip install llmuses[opencompass], pip install llmuses[vlmeval], and pip install llmuses[all]
Fix some bugs

Releases: modelscope/evalscope
