Release v0.5.5

@Yunnglin Yunnglin released this 15 Oct 02:57

Release Notes

  1. Added Dataset Support:

    • Enhanced multimodal evaluation capabilities, now supporting MMBench-Video, Video-MME, and MVBench video evaluations #146
    • Added cmb dataset #117
  2. Support for LongBench-write, which evaluates the quality of long text generation #136

  3. Automatic downloading of punkt_tab.zip from nltk #140

  4. Support for RAG evaluation #127:

    • Support for embeddings/reranker evaluation: Integration of MTEB (Massive Text Embedding Benchmark) and CMTEB (Chinese Massive Text Embedding Benchmark), supporting tasks such as retrieval and reranking
    • Support for end-to-end RAG evaluation: Integration of the ragas framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
  5. Documentation updates

  6. Updated dependencies: nltk>=3.9 and rouge-score>=0.1.0 #145, #143
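The minimum versions in item 6 can be checked against a local environment with a short script. This is only a sketch, not part of the release: the helper names (`parse_version`, `meets_minimum`, `check_dependencies`) are hypothetical, and the version comparison is deliberately simplified (it ignores pre-release suffixes such as `rc1`; the `packaging` library would be a more robust choice).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed in the release notes.
MINIMUMS = {"nltk": "3.9", "rouge-score": "0.1.0"}


def parse_version(text):
    """Turn '3.9.1' into (3, 9, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted versions after padding them to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUMS):
    """Return a {package: status} report for the given minimum versions."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report
```

Calling `check_dependencies()` returns one status string per package, which makes it easy to print or to fail a setup step when anything is `missing` or `too old`.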

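The automatic download in item 3 follows a common "look up, then fetch if missing" pattern. The sketch below illustrates that pattern under stated assumptions; it is not the project's actual implementation from #140. `ensure_resource` and `ensure_punkt_tab` are hypothetical names, while `nltk.data.find` and `nltk.download` are existing nltk functions.

```python
def ensure_resource(resource_path, package, find, download):
    """Download `package` only if `find(resource_path)` raises LookupError.

    Returns True when a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
    except LookupError:
        download(package)
        return True
    return False


def ensure_punkt_tab():
    """Make sure the punkt_tab tokenizer data is available locally."""
    import nltk  # imported lazily so the helper above has no hard dependency

    return ensure_resource(
        "tokenizers/punkt_tab",
        "punkt_tab",
        nltk.data.find,
        lambda name: nltk.download(name, quiet=True),
    )
```

Passing `find` and `download` in as arguments keeps the lookup logic testable without network access; in normal use, calling `ensure_punkt_tab()` once before tokenizing is enough.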
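Chinese Notes (English translation)

  1. New dataset support:

    • Improved multimodal evaluation, now supporting MMBench-Video, Video-MME, and MVBench video evaluations #146
    • Added the cmb dataset #117
  2. Support for LongBench-write quality evaluation of long text generation #136

  3. Support for automatically downloading punkt_tab.zip from nltk #140

  4. Support for RAG evaluation: #127

    • Support for embeddings/reranker evaluation: integrates MTEB (Massive Text Embedding Benchmark) and CMTEB (Chinese Massive Text Embedding Benchmark), supporting the evaluation of tasks such as retrieval and reranking
    • Support for end-to-end RAG evaluation: integrates the ragas framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
  5. Documentation updates

  6. Updated dependencies: nltk>=3.9 and rouge-score>=0.1.0 #145, #143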