diff --git a/docs/source/evaluation_concepts.md b/docs/source/evaluation_concepts.md
index dac487d..14971ec 100644
--- a/docs/source/evaluation_concepts.md
+++ b/docs/source/evaluation_concepts.md
@@ -11,7 +11,7 @@ In the `evaluation` directory, there are sample files for running evaluation on
 
 Evaluation relies on [Prometheus](https://github.com/kaistAI/Prometheus) as LLM judge. We internally serve it via [vLLM](https://github.com/vllm-project/vllm) but any other OpenAI API compatible service should work (e.g. llamafile via their `api_like_OAI.py` script).
 
-Input datasets _must_ be in HuggingFace format. The code below shows how to convert Prometheus benchmark datasets and optionally save them as wandb artifacts:
+Input datasets _must_ be saved as HuggingFace [datasets.Dataset](https://huggingface.co/docs/datasets/v2.19.0/en/package_reference/main_classes#datasets.Dataset). The code below shows how to convert Prometheus benchmark datasets and optionally save them as wandb artifacts:
 
 ```
 import wandb
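
The conversion snippet referenced in the hunk is truncated here. As a rough illustration only (not the project's actual script), a minimal sketch of converting a Prometheus benchmark file into a `datasets.Dataset` and optionally logging it as a wandb artifact might look like the following; the file name `feedback_collection.json`, the project name `evaluation`, and the artifact name are hypothetical placeholders:

```python
import wandb
from datasets import Dataset

# Load a Prometheus benchmark file (assumed here to be JSON) into a
# HuggingFace Dataset. The file name is a placeholder for illustration.
dataset = Dataset.from_json("feedback_collection.json")

# Persist the converted dataset to disk in HuggingFace format.
dataset.save_to_disk("prometheus_benchmark")

# Optionally log the converted dataset as a wandb artifact.
with wandb.init(project="evaluation") as run:
    artifact = wandb.Artifact("prometheus-benchmark", type="dataset")
    artifact.add_dir("prometheus_benchmark")
    run.log_artifact(artifact)
```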