Document: update reamde file (#57)
Co-authored-by: wwxxzz <[email protected]>
moria97 and wwxxzz authored Jun 7, 2024
1 parent a1a8d0a commit 562c600
Showing 1 changed file with 46 additions and 28 deletions.
74 changes: 46 additions & 28 deletions README.md
@@ -38,7 +38,7 @@ pai_rag run [--host HOST] [--port PORT] [--config CONFIG_FILE]

You can now send API requests to the server from the command line, or open http://localhost:8000 directly.

1.
1. Chat

- **RAG Query request**

@@ -49,72 +49,90 @@ curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application
- **Multi-turn conversation request**

```bash
curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What is one-click sleep aid?"}'
curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What is PAI?"}'

# Pass session_id: the unique identifier of a conversation session. When session_id is provided, the conversation history is recorded, and later LLM calls automatically include the stored history.
curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What are its benefits?", "session_id": "5801d0d9-e030-409c-9072-c810b858f9fa"}'
curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What are its advantages?", "session_id": "1702ffxxad3xxx6fxxx97daf7c"}'

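# Pass chat_history: the conversation history between the user and the model. Each list element is one turn of the form {"user":"user input","bot":"model output"}, and turns are listed in chronological order.
curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"Can children use it?", "chat_history": [{"user":"What is one-click sleep aid?", "bot":"One-click sleep aid is a sleep-promotion technology based on vibroacoustic music therapy"}]}'
curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What features does it have?", "chat_history": [{"user":"What is PAI?", "bot":"PAI is the AI platform of Alibaba Cloud, providing a one-stop machine learning solution. The platform supports a range of machine learning tasks, including supervised learning, unsupervised learning, and reinforcement learning, and applies to scenarios such as marketing, finance, and social networks."}]}'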

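# Pass both session_id and chat_history: chat_history is appended to the stored conversation history for the given session_id.
curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"Can children use it?", "chat_history": [{"user":"What is one-click sleep aid?", "bot":"One-click sleep aid is a sleep-promotion technology based on vibroacoustic music therapy"}], "session_id": "5801d0d9-e030-409c-9072-c810b858f9fa"}'
curl -X 'POST' http://127.0.0.1:8000/service/query -H "Content-Type: application/json" -d '{"question":"What are its advantages?", "chat_history": [{"user":"What is PAI?", "bot":"PAI is the AI platform of Alibaba Cloud, providing a one-stop machine learning solution. The platform supports a range of machine learning tasks, including supervised learning, unsupervised learning, and reinforcement learning, and applies to scenarios such as marketing, finance, and social networks."}], "session_id": "1702ffxxad3xxx6fxxx97daf7c"}'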
```
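The multi-turn variants above differ only in which optional fields accompany `question`. As a minimal sketch (not taken from the PAI-RAG codebase), the request body could be assembled in Python as follows. The helper names `build_query_payload` and `post_query` are hypothetical; the endpoint and field names come from the curl commands above, and the response is assumed to be JSON.

```python
import json
import urllib.request

# Endpoint taken from the curl examples; adjust host/port to your deployment.
QUERY_URL = "http://127.0.0.1:8000/service/query"


def build_query_payload(question, session_id=None, chat_history=None):
    """Build the JSON body for /service/query (hypothetical helper).

    chat_history is a list of {"user": ..., "bot": ...} turns in
    chronological order; both optional fields are omitted when unset.
    """
    payload = {"question": question}
    if session_id is not None:
        payload["session_id"] = session_id
    if chat_history:
        payload["chat_history"] = [
            {"user": turn["user"], "bot": turn["bot"]} for turn in chat_history
        ]
    return payload


def post_query(payload, url=QUERY_URL):
    """POST the payload as JSON and decode the reply (assumed to be JSON)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running service, `post_query(build_query_payload("What are its advantages?", session_id="1702ffxxad3xxx6fxxx97daf7c"))` would correspond to the session_id curl call above.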

- **Simple Agent conversation**
- **Simple conversation with the Agent and Function Tool calls**

```bash
curl -X 'POST' http://127.0.0.1:8000/service/query/agent -H "Content-Type: application/json" -d '{"question":"Has there been any big news about internet companies recently?"}'
curl -X 'POST' http://127.0.0.1:8000/service/query/agent -H "Content-Type: application/json" -d '{"question":"This year is 2024. What year was it 10 years ago?"}'
```

2. Retrieval batch evaluation
2. Evaluation

Three evaluation modes are supported: full-pipeline evaluation, retrieval evaluation, and generation evaluation.

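On the first call, an evaluation dataset is generated automatically under localdata/evaluation (qc_dataset.json, which contains the LLM-generated query, reference_contexts, reference_node_id, and reference_answer). The evaluation also involves a large number of LLM calls, so it takes a long time.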

- **(1) Full-pipeline evaluation (All)**

```bash
curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate/retrieval
curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate
```

On the first call, a Retrieval evaluation dataset is generated under localdata/data/evaluation (qc_dataset_easy_rag_demo_0.1.1.json, which contains question:context pairs).

Example response:

```json
{
"status": 200,
"eval_resultes": {
"hit_rate": { "0": 0.821917808219178 },
"mrr": { "0": 0.6506849315068494 }
"result": {
"batch_number": 6,
"hit_rate_mean": 1.0,
"mrr_mean": 0.91666667,
"faithfulness_mean": 0.8333334,
"correctness_mean": 4.5833333,
"similarity_mean": 0.88153079
}
}
```

3. Response batch evaluation
- **(2) Retrieval evaluation (Retrieval)**

```bash
curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate/response
curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate/retrieval
```

On the first call, a Response evaluation dataset is generated under localdata/data/evaluation (qa_dataset_easy_rag_demo_0.1.1.json, which contains question:reference_answer pairs).

Example response:

```json
{
"status": 200,
"eval_resultes": {
"Faithfulness": 0.5,
"Answer Relevancy": 0.0,
"Guideline Adherence: The response should fully answer the query.:": 0.5,
"Guideline Adherence: The response should avoid being vague or ambiguous.:": 0.5,
"Guideline Adherence: The response should be specific and use statistics or numbers when possible.:": 0.3,
"Correctness": 0.3,
"Semantic Similarity": 0.2
"result": {
"batch_number": 6,
"hit_rate_mean": 1.0,
"mrr_mean": 0.91667
}
}
```
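The field names `hit_rate_mean` and `mrr_mean` suggest the standard retrieval metrics hit rate and mean reciprocal rank (MRR). The sketch below shows their textbook definitions. How the service itself computes them is not shown in this README, so treat this as an illustration: the function names, the `top_k` parameter, and the sample ranks are assumptions made up for the example.

```python
def hit_rate(first_hit_ranks, top_k):
    """Fraction of queries whose first relevant result appears within top_k.

    first_hit_ranks holds, per query, the 1-based rank of the first relevant
    retrieved node, or None if no relevant node was retrieved.
    """
    if not first_hit_ranks:
        raise ValueError("need at least one query")
    hits = sum(1 for rank in first_hit_ranks if rank is not None and rank <= top_k)
    return hits / len(first_hit_ranks)


def mean_reciprocal_rank(first_hit_ranks):
    """Mean of 1/rank over all queries, counting a miss as 0."""
    if not first_hit_ranks:
        raise ValueError("need at least one query")
    total = sum(1.0 / rank for rank in first_hit_ranks if rank is not None)
    return total / len(first_hit_ranks)


# Illustrative data only: six queries, five hit at rank 1 and one at rank 2.
ranks = [1, 1, 1, 1, 1, 2]
print(hit_rate(ranks, top_k=3))               # 1.0
print(round(mean_reciprocal_rank(ranks), 5))  # 0.91667
```

Under these definitions, a hit rate of 1.0 with an MRR below 1.0 means every query retrieved a relevant node, but not always in first position.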

Note: Response Evaluation involves a large number of LLM calls, so the evaluation takes a long time.
- **(3) Generation evaluation (Response)**

For each query, generating the answer takes about 10s on average, and evaluating the 7 metrics takes about 20s on average.
```bash
curl -X 'POST' http://127.0.0.1:8000/service/batch_evaluate/response
```

Example response:

```json
{
"status": 200,
"result": {
"batch_number": 6,
"faithfulness_mean": 0.8333334,
"correctness_mean": 4.58333333,
"similarity_mean": 0.88153079
}
}
```
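The three evaluation modes map onto three endpoints that are called with an empty POST. A minimal Python sketch of a client for them follows. The mode keys (`"all"`, `"retrieval"`, `"response"`) and the helper names are hypothetical; the paths come from the curl commands above, and the response is assumed to be JSON, as in the example responses.

```python
import json
import urllib.request

# Base address from the curl examples; adjust to your deployment.
BASE_URL = "http://127.0.0.1:8000"

# Paths taken from the curl commands, keyed by a hypothetical mode name.
EVALUATION_PATHS = {
    "all": "/service/batch_evaluate",
    "retrieval": "/service/batch_evaluate/retrieval",
    "response": "/service/batch_evaluate/response",
}


def evaluation_url(mode, base_url=BASE_URL):
    """Return the endpoint URL for an evaluation mode (hypothetical helper)."""
    try:
        return base_url.rstrip("/") + EVALUATION_PATHS[mode]
    except KeyError:
        raise ValueError(f"unknown evaluation mode: {mode!r}") from None


def run_evaluation(mode, base_url=BASE_URL):
    """Send an empty POST to the endpoint and decode the JSON reply."""
    request = urllib.request.Request(evaluation_url(mode, base_url), method="POST")
    # No client-side timeout is set, because evaluation can run for a long time.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running service, `run_evaluation("retrieval")["result"]` would hold the retrieval metrics shown in the example response.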

### Standalone scripts: run independently, without starting the full service
