Document: update README file (#57)

Co-authored-by: wwxxzz <[email protected]>
aigc-apps · Jun 7, 2024 · 562c600
1 parent a1a8d0a
Showing 1 changed file with 46 additions and 28 deletions.
diff --git a/README.md b/README.md
@@ -38,7 +38,7 @@ pai_rag run [--host HOST] [--port PORT] [--config CONFIG_FILE]
 
 You can now send API requests to the service from the command line, or open http://localhost:8000 directly.
 
-1.
+1. Chat
 
 - **RAG query request**
 
@@ -49,72 +49,90 @@ curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application
 - **Multi-turn chat request**
 
 ```bash
-curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What is one-click sleep aid?"}'
+curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What is PAI?"}'
 
 # Pass session_id, the unique identifier of a conversation session. Once session_id is passed, the conversation history is recorded, and subsequent LLM calls automatically include the stored history.
-curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What are its benefits?", "session_id": "5801d0d9-e030-409c-9072-c810b858f9fa"}'
+curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What are its advantages?", "session_id": "1702ffxxad3xxx6fxxx97daf7c"}'
 
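 # Pass chat_history, the conversation history between the user and the model. Each list element is one turn of the form {"user":"user input","bot":"model output"}, and turns are ordered chronologically.
-curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"Can children use it?", "chat_history": [{"user":"What is one-click sleep aid?", "bot":"One-click sleep aid is a sleep-promotion technology based on somatosensory vibration music therapy."}]}'
+curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What features does it have?", "chat_history": [{"user":"What is PAI?", "bot":"PAI is the artificial intelligence platform of Alibaba Cloud, which provides a one-stop machine learning solution. The platform supports a variety of machine learning tasks, including supervised learning, unsupervised learning, and reinforcement learning, and is suitable for scenarios such as marketing, finance, and social networks."}]}'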
 
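 # Pass both session_id and chat_history: chat_history is appended to the stored conversation history for that session_id.
-curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"Can children use it?", "chat_history": [{"user":"What is one-click sleep aid?", "bot":"One-click sleep aid is a sleep-promotion technology based on somatosensory vibration music therapy."}], "session_id": "5801d0d9-e030-409c-9072-c810b858f9fa"}'
+curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What are its advantages?", "chat_history": [{"user":"What is PAI?", "bot":"PAI is the artificial intelligence platform of Alibaba Cloud, which provides a one-stop machine learning solution. The platform supports a variety of machine learning tasks, including supervised learning, unsupervised learning, and reinforcement learning, and is suitable for scenarios such as marketing, finance, and social networks."}], "session_id": "1702ffxxad3xxx6fxxx97daf7c"}'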
 ```
 
-- **Simple Agent chat**
+- **Simple chat with the Agent and Function Tool calls**
 
 ```bash
-curl -X 'POST' http://127.0.0.1:8000/service/query/agent -H "Content-Type: application/json" -d '{"question":"Has there been any big news about internet companies recently?"}'
+curl -X 'POST' http://127.0.0.1:8000/service/query/agent -H "Content-Type: application/json" -d '{"question":"This year is 2024. What year was it 10 years ago?"}'
 ```
 
-2. Retrieval batch evaluation
+2. Evaluation
+
+Three evaluation modes are supported: end-to-end evaluation, retrieval evaluation, and response (generation) evaluation.
+
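+On the first call, an evaluation dataset is automatically generated under localdata/evaluation (qc_dataset.json, which contains the LLM-generated query, reference_contexts, reference_node_id, and reference_answer fields). The evaluation also involves a large number of LLM calls, so it takes a long time.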
+
+- **(1) End-to-end evaluation (All)**
 
 ```bash
-curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate/retrieval
+curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate
 ```
 
-On the first call, a Retrieval evaluation dataset is generated under localdata/data/evaluation (qc_dataset_easy_rag_demo_0.1.1.json, which contains question:context pairs).
-
 Sample response:
 
 ```json
 {
   "status": 200,
-  "eval_resultes": {
-    "hit_rate": { "0": 0.821917808219178 },
-    "mrr": { "0": 0.6506849315068494 }
+  "result": {
+    "batch_number": 6,
+    "hit_rate_mean": 1.0,
+    "mrr_mean": 0.91666667,
+    "faithfulness_mean": 0.8333334,
+    "correctness_mean": 4.5833333,
+    "similarity_mean": 0.88153079
   }
 }
 ```
 
-3. Response batch evaluation
+- **(2) Retrieval evaluation (Retrieval)**
 
 ```bash
-curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate/response
+curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate/retrieval
 ```
 
-On the first call, a Response evaluation dataset is generated under localdata/data/evaluation (qa_dataset_easy_rag_demo_0.1.1.json, which contains question:reference_answer pairs).
-
 Sample response:
 
 ```json
 {
   "status": 200,
-  "eval_resultes": {
-    "Faithfulness": 0.5,
-    "Answer Relevancy": 0.0,
-    "Guideline Adherence: The response should fully answer the query.:": 0.5,
-    "Guideline Adherence: The response should avoid being vague or ambiguous.:": 0.5,
-    "Guideline Adherence: The response should be specific and use statistics or numbers when possible.:": 0.3,
-    "Correctness": 0.3,
-    "Semantic Similarity": 0.2
+  "result": {
+    "batch_number": 6,
+    "hit_rate_mean": 1.0,
+    "mrr_mean": 0.91667
   }
 }
 ```
 
-Note: Response evaluation involves a large number of LLM calls, so the evaluation takes a long time.
+- **(3) Response evaluation (Response)**
 
-For each query, generating the answer takes about 10 s on average, and evaluating the 7 metrics takes about 20 s on average.
+```bash
+curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate/response
+```
+
+Sample response:
+
+```json
+{
+  "status": 200,
+  "result": {
+    "batch_number": 6,
+    "faithfulness_mean": 0.8333334,
+    "correctness_mean": 4.58333333,
+    "similarity_mean": 0.88153079
+  }
+}
+```
 
 ### Standalone scripts: run independently without starting the full service