-
Notifications
You must be signed in to change notification settings - Fork 45
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Support auto evaluation for multi-modal (#258)
* Support MM: text and image eval * Support MultiModal Eval
- Loading branch information
Showing
16 changed files
with
653 additions
and
181 deletions.
There are no files selected for viewing
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
EasyRec是一个易于使用的推荐框架¶ | ||
EasyRec 实现了常见推荐任务中使用的最先进的机器学习模型:候选生成(匹配)、评分(排名)和多任务学习。 它通过简单的配置和超参数调整(HPO)提高了生成高性能模型的效率。 | ||
|
||
EasyRec视频介绍 | ||
为什么选择 EasyRec?¶ | ||
到处运行¶ | ||
MaxCompute / 数据科学 / DLC / 本地 | ||
TF1.12-1.15 / TF2.x / PAI-TF | ||
多样的输入数据¶ | ||
MaxCompute表 | ||
HDFS 文件 | ||
操作系统文件 | ||
卡夫卡流 | ||
本地 CSV | ||
|
||
配置简单¶ | ||
灵活的功能配置和简单的模型配置 | ||
高效、鲁棒的特征生成[淘宝使用] | ||
漂亮的网络界面正在开发中 | ||
|
||
它很聪明¶ | ||
EarlyStop / 最佳检查站保护程序 | ||
超参数搜索/AutoFeatureCross | ||
开发中:NAS、知识蒸馏、多模式 | ||
|
||
规模大、部署方便¶ | ||
支持大规模嵌入,增量保存 | ||
许多并行策略:ParameterServer、Mirrored、MultiWorker | ||
轻松部署到 EAS:自动扩展、轻松监控 | ||
一致性保证:训练和服务 | ||
|
||
多种模型 | ||
DSSM / MIND / DropoutNet / CoMetricLearningI2I / PDN | ||
W&D / DeepFM / MultiTower / DCN / DIN / BST | ||
MMoE / ESMM / DBMTL / PLE | ||
CMBF / 联合 | ||
|
||
易于定制¶ | ||
易于实现定制模型 | ||
无需关心数据管道 | ||
快速向量检索¶ | ||
在分布式环境中运行向量的 knn 算法 | ||
|
||
欢迎加入【EasyRec推荐算法交流群】,钉钉群号 : 32260796 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
机器学习PAI(Platform of Artificial Intelligence)是阿里云人工智能平台,提供一站式的机器学习解决方案。本文为您介绍什么是机器学习PAI。 | ||
|
||
什么是机器学习 | ||
机器学习是指机器通过统计学算法,对大量历史数据进行学习,进而利用生成的经验模型指导业务。目前机器学习主要应用在以下场景: | ||
营销类场景:商品推荐、用户群体画像或广告精准投放。 | ||
金融类场景:贷款发放预测、金融风险控制、股票走势预测或黄金价格预测。 | ||
社交网络服务关系挖掘场景:微博粉丝领袖分析或社交关系链分析。 | ||
文本类场景:新闻分类、关键词提取、文章摘要或文本内容分析。 | ||
非结构化数据处理场景:图片分类或图片文本内容提取。 | ||
其它各类预测场景:降雨预测或足球比赛结果预测。 | ||
机器学习包括传统机器学习和深度学习。传统机器学习分为以下几类: | ||
有监督学习(Supervised Learning):每个样本都有对应的期望值,通过搭建模型,实现从输入特征向量到目标值的映射。例如解决回归和分类问题。 | ||
无监督学习(Unsupervised Learning):所有样本没有目标值,期望从数据本身发现一些潜在规律。例如解决聚类问题。 | ||
增强学习(Reinforcement Learning):相对比较复杂,系统和外界环境不断交互,根据外界反馈决定自身行为,达到目标最优化。例如阿尔法围棋和无人驾驶。 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,12 @@ | ||
experiment: | ||
# [custom knowledge dataset] | ||
- name: "exp1" | ||
data_path: "example_data/eval_docs" | ||
setting_file: "src/pai_rag/config/evaluation/settings_eval.toml" | ||
eval_data_path: "example_data/eval_docs_text" | ||
eval_model_source: "Dashscope" | ||
eval_model_name: "qwen-max" | ||
rag_setting_file: "src/pai_rag/config/evaluation/settings_eval_for_text.toml" | ||
- name: "exp2" | ||
data_path: "example_data/eval_docs_1" | ||
setting_file: "src/pai_rag/config/evaluation/settings_eval.toml" | ||
eval_data_path: "example_data/eval_docs_image" | ||
eval_model_source: "Dashscope" | ||
eval_model_name: "qwen-vl-max" | ||
rag_setting_file: "src/pai_rag/config/evaluation/settings_eval_for_image.toml" |
99 changes: 99 additions & 0 deletions
99
src/pai_rag/config/evaluation/settings_eval_for_image.toml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
dynaconf_merge = true | ||
|
||
[rag] | ||
name = "pai_rag" | ||
version = "0.1.1" | ||
|
||
[rag.agent] | ||
custom_agent_config_file = "" | ||
agent_tool_type = "" | ||
|
||
[rag.chat_store] | ||
type = "Local" # [Local, Aliyun-Redis] | ||
host = "Aliyun-Redis host" | ||
password = "Aliyun-Redis user:pwd" | ||
persist_path = "localdata/eval_exp_data/storage" | ||
|
||
[rag.data_analysis] | ||
type = "pandas" | ||
nl2sql_prompt = "给定一个输入问题,创建一个语法正确的{dialect}查询语句来执行,不要从特定的表中查询所有列,只根据问题查询几个相关的列。请注意只使用你在schema descriptions 中看到的列名。\n=====\n 小心不要查询不存在的列。请注意哪个列位于哪个表中。必要时,请使用表名限定列名。\n=====\n 你必须使用以下格式,每项占一行:\n\n Question: Question here\n SQLQuery: SQL Query to run \n\n Only use tables listed below.\n {schema}\n\n Question: {query_str} \n SQLQuery: " | ||
|
||
[rag.data_reader] | ||
type = "SimpleDirectoryReader" | ||
|
||
# embedding configurations, source support API: OpenAI,DashScope; and local model:HuggingFace | ||
# if use API, need set OPENAI_API_KEY or DASHSCOPE_API_KEY in ENV, If HuggingFace, need set model | ||
# eg. | ||
# source = "HuggingFace" | ||
# model = "bge-large-zh-v1.5" | ||
# embed_batch_size = 10 | ||
[rag.embedding] | ||
source = "DashScope" | ||
embed_batch_size = 10 | ||
|
||
[rag.index] | ||
persist_path = "localdata/eval_exp_data/storage" | ||
enable_multimodal = true | ||
vector_store.type = "FAISS" | ||
|
||
# llm configurations, source support API: OpenAI,DashScope or PAI-EAS's deployment | ||
# eg. | ||
# source = "PaiEas" | ||
# model = "" | ||
# endpoint = "" | ||
# token = "" | ||
[rag.llm] | ||
source = "DashScope" | ||
model = "qwen-turbo" | ||
|
||
[rag.multimodal_embedding] | ||
source = "cnclip" | ||
|
||
[rag.multimodal_llm] | ||
source = "dashscope" | ||
model = "qwen-vl-plus" | ||
|
||
[rag.node_enhancement] | ||
tree_depth = 3 | ||
max_clusters = 52 | ||
proba_threshold = 0.10 | ||
|
||
[rag.node_parser] | ||
type = "Sentence" | ||
chunk_size = 500 | ||
chunk_overlap = 10 | ||
enable_multimodal = true | ||
|
||
[rag.oss_store] | ||
bucket = "pai-rag" | ||
endpoint = "oss-cn-hangzhou.aliyuncs.com" | ||
prefix = "evaluation" | ||
|
||
[rag.postprocessor] | ||
reranker_type = "no-reranker" # [simple-weighted-reranker, model-based-reranker] | ||
reranker_model = "bge-reranker-base" # [bge-reranker-base, bge-reranker-large] | ||
keyword_weight = 0.3 | ||
vector_weight = 0.7 | ||
similarity_threshold = 0.5 | ||
top_n = 2 | ||
|
||
[rag.query_transform] | ||
type = "" | ||
|
||
[rag.retriever] | ||
similarity_top_k = 3 | ||
retrieval_mode = "hybrid" # [hybrid, embedding, keyword, router] | ||
query_rewrite_n = 1 # set to 1 to disable query generation | ||
search_image = true | ||
|
||
[rag.search] | ||
search_api_key = "" | ||
|
||
[rag.synthesizer] | ||
type = "SimpleSummarize" | ||
text_qa_template = "参考内容信息如下\n---------------------\n{context_str}\n---------------------根据提供内容而非其他知识回答问题.\n问题: {query_str}\n答案: \n" | ||
|
||
[rag.trace] | ||
type = "pai_trace" | ||
endpoint = "http://tracing-analysis-dc-hz.aliyuncs.com:8090" | ||
token = "" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.