Remove local storage and enable Elasticsearch hybrid query mode #60

Merged
merged 21 commits into feature from personal/yfei/fix-embed-llm
Jun 13, 2024

moria97 (Collaborator) commented Jun 12, 2024

  1. Remove the unneeded local SimpleDirectoryStorage/SimpleIndexStorage for es/holo/adb/milvus
  2. Enable using keyword/embed/hybrid query mode directly with Elasticsearch
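
To make the three query modes concrete, here is a minimal, self-contained sketch. It is not the code from this PR and it does not talk to Elasticsearch: the `QueryMode` enum, the `search` function, the `alpha` weight and the toy documents are all hypothetical names invented for illustration, and the weighted sum used for the hybrid mode is only a stand-in for whatever score fusion the real store applies.

```python
import math
from enum import Enum


class QueryMode(str, Enum):
    """Hypothetical names for the three modes mentioned in the PR description."""
    KEYWORD = "keyword"
    EMBEDDING = "embedding"
    HYBRID = "hybrid"


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms present in the text (a toy stand-in for BM25)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(docs, query, query_vec, mode, top_k=2, alpha=0.5):
    """Rank docs by the score that the chosen mode selects.

    alpha is an assumed weight: the hybrid score is
    alpha * embedding_score + (1 - alpha) * keyword_score.
    """
    scored = []
    for doc in docs:
        kw = keyword_score(query, doc["text"])
        emb = cosine(query_vec, doc["vec"])
        if mode is QueryMode.KEYWORD:
            score = kw
        elif mode is QueryMode.EMBEDDING:
            score = emb
        else:
            score = alpha * emb + (1 - alpha) * kw
        scored.append((doc["id"], score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy corpus with made-up two-dimensional "embeddings".
DOCS = [
    {"id": "a", "text": "elasticsearch hybrid query", "vec": [1.0, 0.0]},
    {"id": "b", "text": "local storage removed", "vec": [0.0, 1.0]},
    {"id": "c", "text": "hybrid storage", "vec": [0.6, 0.8]},
]
QUERY = "hybrid query"
QUERY_VEC = [0.0, 1.0]
```

With this toy data, the keyword mode favours document `a`, the embedding mode favours `b`, and the hybrid mode ranks `c` first because it scores reasonably on both signals, which is the usual argument for offering a hybrid mode at all.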


github-actions bot commented Jun 12, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|---|---|---|---|---|
| 3031 | 1815 | 60% | 50% | 🟢 |

New Files

| File | Coverage | Status |
|---|---|---|
| src/pai_rag/modules/cache/oss_cache.py | 93% | 🟢 |
| src/pai_rag/modules/datareader/data_loader.py | 100% | 🟢 |
| src/pai_rag/modules/embedding/my_huggingface_embedding.py | 58% | 🟢 |
| src/pai_rag/modules/index/index_utils.py | 33% | 🟢 |
| src/pai_rag/modules/index/my_vector_store_index.py | 78% | 🟢 |
| src/pai_rag/modules/retriever/my_elasticsearch_store.py | 33% | 🟢 |
| src/pai_rag/utils/tokenizer.py | 100% | 🟢 |
| TOTAL | 71% | 🟢 |

Modified Files

| File | Coverage | Status |
|---|---|---|
| src/pai_rag/app/api/models.py | 100% | 🟢 |
| src/pai_rag/core/rag_application.py | 93% | 🟢 |
| src/pai_rag/data/rag_dataloader.py | 66% | 🟢 |
| src/pai_rag/evaluations/batch_evaluator.py | 79% | 🟢 |
| src/pai_rag/evaluations/dataset_generation/generate_dataset.py | 77% | 🟢 |
| src/pai_rag/modules/__init__.py | 100% | 🟢 |
| src/pai_rag/modules/base/configurable_module.py | 87% | 🟢 |
| src/pai_rag/modules/chat/chat_engine_factory.py | 88% | 🟢 |
| src/pai_rag/modules/chat/llm_chat_engine_factory.py | 88% | 🟢 |
| src/pai_rag/modules/embedding/embedding.py | 74% | 🟢 |
| src/pai_rag/modules/index/index.py | 86% | 🟢 |
| src/pai_rag/modules/index/store.py | 60% | 🟢 |
| src/pai_rag/modules/module_registry.py | 98% | 🟢 |
| src/pai_rag/modules/retriever/my_vector_index_retriever.py | 78% | 🟢 |
| src/pai_rag/modules/retriever/retriever.py | 66% | 🟢 |
| src/pai_rag/utils/store_utils.py | 49% | 🟢 |
| TOTAL | 81% | 🟢 |

updated for commit: 68263a1 by action🐍

moria97 merged commit daba1f5 into feature on Jun 13, 2024
1 check passed
moria97 added a commit that referenced this pull request Jun 14, 2024
* Add gpu dockerfile

* Fix bug

* Fix gb2312

* Update embedding batch size

* Set default embedding and llm model

* Update docker tag

* Fix hologres check

* Update registry

* Fix bug

* Fix tests

* Add queue

* Update batch size

* Add async interface

* Fix index conflict

* Add change index parameter for FAISS

* Fix batch size

* Update
moria97 deleted the personal/yfei/fix-embed-llm branch on July 30, 2024 06:36
moria97 added a commit that referenced this pull request Aug 5, 2024
* Bugfix: a case that files' encodings can not be detected by chardet (#61)

* Bugfix: connection error for longtime upload tasks (#62)

* Fix connection error for longtime job

* fix testcase bugs

* support num workers for embedding model

* Refactor query api and add dataframe UI

* Refactor query api

* Remove embedding workers

* Add file: file_utils.py (#63)

* Fix connection error for longtime job

* fix testcase bugs

* support num workers for embedding model

* Refactor query api and add dataframe UI

* Refactor query api

* Remove embedding workers

* Add file_utils

---------

Co-authored-by: Yue Fei <[email protected]>

* Remove local storage and enable Elasticsearch hybrid query mode (#60)

* Add gpu dockerfile

* Fix bug

* Fix gb2312

* Update embedding batch size

* Set default embedding and llm model

* Update docker tag

* Fix hologres check

* Update registry

* Fix bug

* Fix tests

* Add queue

* Update batch size

* Add async interface

* Fix index conflict

* Add change index parameter for FAISS

* Fix batch size

* Update

* Modify async upload to sync (#64)

* Modify async upload to sync

* fix failed test

* Fix faiss_path not effective in retrieval (#65)

* Add API to support upload local files (#67)

* support upload file via API

* add Readme for upload API

* refactor query api

* modify load_knowledge with session_config

* use tempfile.mkdtemp() to store upload files

* add docker image timezone for China (#68)

* add image zone for China

* remove unused ENV

---------

Co-authored-by: shubao.sx <[email protected]>
Co-authored-by: Yue Fei <[email protected]>

* load data pipeline supports read config (#70)

* Add gpu docker image timezone for China (#74)

* Add fast bm25 (#66)

* Add fast bm25

* Fix bm25 bug

* Fix bug

* Fix test

* Update readme and configuration (#77)

* fix demo.toml typo, and add comments for settings.toml for embedding

* update readme, add load data

* Update docker.yml

* Enable multiple workers to improve perf (#75)

* Add fast bm25

* Update

* Fix bug

* Fix bm25 bug

* Fix bug

* Refine code

* Update multi-process

* Add API to support upload local files (#67)

* support upload file via API

* add Readme for upload API

* refactor query api

* modify load_knowledge with session_config

* use tempfile.mkdtemp() to store upload files

* add docker image timezone for China (#68)

* add image zone for China

* remove unused ENV

---------

Co-authored-by: shubao.sx <[email protected]>
Co-authored-by: Yue Fei <[email protected]>

* load data pipeline supports read config (#70)

* Add gpu docker image timezone for China (#74)

* Add fast bm25 (#66)

* Add fast bm25

* Fix bm25 bug

* Fix bug

* Fix test

* Update dockerfile

* Fix bug

* Update

* Update docker file

* Fix empty file bug

* Fix local index error

* Fix lint

* Decouple gradio and backend

* Add ui build

* Add gunicorn

* Fix gunicorn

* Update nginx

* add nginx image

* Fix deployment issue

* Fix upload

---------

Co-authored-by: 筱文 <[email protected]>
Co-authored-by: paradiseHIT <[email protected]>
Co-authored-by: shubao.sx <[email protected]>

* Add guides for env and docker (#81)

* Add guides for env

* add guides for docker build

* Add README

* Add config guide cn&en (#82)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* Add doc reference for rag query (#84)

* Support evaluation for generated and open datasets (#83)

* Refactor evaluation module

* add UI: eval_tab

* support eval UI

* tmp eval

* remove eval web

* Support evaluation

* fix pytest

* Add OpenDataSet class

---------

Co-authored-by: ranxia <[email protected]>

* Fix oss url for miracl dataset (#86)

* fix ui es upload (#85)

* Fix eas LLM (#88)

* Milvus support sparse search (#87)

* Upload multiple files in single API call (#89)

* Milvus support sparse search

* aload fix

* Upload multiple files in one api call

* Remove notebooks

* Fix tests

* Fix http timeout issue

* Add client default timeout limitation and support UI interactive (#90)

* Add client default timeout limitation and support UI interactive

* support interactivate for vectordb type

* Fix ui issue (#91)

* Fix deps and add gpu ci tests (#92)

* Fix deps and add gpu ci tests

* Don't send report in 2nd pipeline

* Fix empty response for empty knowledge base (#93)

* Fix empty response for empty knowledge base

* Add constant for empty response message

* Fix dup nodes (#94)

* Add error handling (#96)

* Add error handling

* Add upload error msg

* fix data_loader (#95)

* fix data_loader

* fix data_loader

* fix data_loader

* fix data_loader

* Set proper log levels (#98)

* Adjust config instruction and add es instruction (#99)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* adjust config guide and add es instruction

* Log stacktrace for failed requests (#100)

* Load milvus collection by default (#101)

* Log stacktrace for failed requests

* Load milvus collection by default

* Rename & Relocate figures in md (#102)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* adjust config guide and add es instruction

* modify md figures

* minor modification

* change md path and name

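* Modify the docker startup command for the Windows platform (#104)

* Modify the docker startup command for the Windows platform

* Modify the docker startup command for the Windows platform

* Modify the docker startup command for the Windows platform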

* make format

* make format, nothing changed

* download models from oss automatically (#97)

* download models from oss automatically

* download models from oss automatically

* download models from oss automatically

* download models from oss automatically

* download models from modelscope

* download models from modelscope

* fix readme

* Fix bug in downloading models (#106)

* Fix bug

* Fix log

* Fix download

* Add markdown reader (#105)

* fix pdf reader (#107)

Co-authored-by: Yue Fei <[email protected]>

* Personal/ranxia/pdf table summary fix (#109)

* fix pdf reader

* fix pdf reader table summary

---------

Co-authored-by: Yue Fei <[email protected]>

* FiAddage number to file_name (#110)

* Support stream response for LLM (PaiEAS && DashScope) (#112)

* Support stream response for LLM (PaiEAS && DashScope)

* Add PaiEas LLM old file

* Add image node processor (#114)

* Fit image in response

* Add image insert

* Fix llm max-token

* Fix bug (#115)

* Fix bugs for chinese escaped string in API header (#117)

* Fix bidi version (#119)

* Add fix version

* Update poetry.lock

* Update streaming response to body field use server sent events (#120)

* Fix streaming

* Fix llm and vector query

* Address comment

* Remove extra print

* Support simple-weighted-reranker and similarity-threshold (#116)

* Support normalized cosine_sim score for different vectorDB

* Support simple-weighted-reranker and similarity-threshold

* [Todo] Support ES hybrid search

* Support Milvus

* fix path

* fix open dataset

* Fix url for du-retrieval dataset

* Restore setting

* Fix reviews

* Apply node_id for weighted_reranker

* jsonl reader (#124)

* jsonl reader

* jsonl reader

* Support function_calling with booking demo tools (#122)

* Add booking system demo for function_calling

* Support customized function calling tools

* Add testcase for agent and llm

* Fix test

* Fix async test

* Add readme for function calling

* Add readme for function calling

* Remove ref figs

* Add nodes enhancement by raptor (#111)

* add raptor

* add raptor ui support

* fix logger bug

* add node_enhancement class and modify test

* fix node_enhancement setting bug

* lint adjustment

* poetry lock

* fix poetry.lock

* fix poetry issues

* add a param

* add token calculation for Chinese and adjust context_window

* update tokenization_qwen

* update file_path

* merge feature and update poetry.lock

* exclude pytest since no vocab file in the test env

* exclude qwen.tiktoken

* delete assert

* Add weather tool (#125)

* weather okgit add .!

* fix bug

* space bug

---------

Co-authored-by: Yue Fei <[email protected]>

* Don't use parallel when data size is big (#108)

* Add opensearch (#127)

* Add open search. Not tested

* Fix

* Fix config

* update docker's readme (#126)

* update docker's readme

* change network back

* change network back

* change network back

* Create ci.yml (#131)

* Update CI & PR pipelines (#132)

* Update CI

* Fix ci

* Fix a few ui bugs (#133)

* Support RDS postgres vector store (#134)

* support rds postgres for store engine

* Format

* support table

* Make format

---------

Co-authored-by: Yue Fei <[email protected]>

* Fix minor bugs (#135)

* Fix bug

* Fix index bug

* Update password field

* Add pre-commit

* Remove upload button

* Refine upload

* Fix pg connection string

* Fix empty response for score_threshold (#136)

* Fix empty response for score_threshold

* Modify empty response info

* Modify empty response info

---------

Co-authored-by: Yue Fei <[email protected]>

* fix table_reader in pdf_reader (#128)

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* add "enable_ocr" and "enable_table_summary" (#138)

* add "enable_ocr" and "enable_table_summary"

* add "enable_ocr" and "enable_table_summary"

* add "enable_ocr" and "enable_table_summary"

* Add release pipeline and fix some bugs (#137)

* Fix bug

* Add release pipeline

* Update

* Update

* Fix bug

* Fix login

* Fix empty tag

* Update

* Fix ui issue

* Add base version tag

* Fix specific version

* Use pg hybrid retrieval directly

* Fix image tag

* Fix llm config (#139)

* Fix toml merge bug (#142)

* Fix configuration conflict (#143)

* Fix merge bug

* Fix version conflict for config file

* Resolve snapshot merge conflict

* Fix space outage in github runner (#144)

* Fix merge bug

* Fix version conflict for config file

* Resolve snapshot merge conflict

* Update yaml

---------

Co-authored-by: Ceceliachenen <[email protected]>
Co-authored-by: wwxxzz <[email protected]>
Co-authored-by: paradiseHIT <[email protected]>
Co-authored-by: shubao.sx <[email protected]>
Co-authored-by: aero-xi <[email protected]>
Co-authored-by: ranxia <[email protected]>
Co-authored-by: aero-xi <[email protected]>
Co-authored-by: CharlieKoo <[email protected]>
Co-authored-by: zhangdingchu <[email protected]>
Co-authored-by: zt2645802240 <[email protected]>
moria97 added a commit that referenced this pull request Aug 5, 2024
* Bugfix: a case that files' encodings can not be detected by chardet (#61)

* Bugfix: connection error for longtime upload tasks (#62)

* Fix connection error for longtime job

* fix testcase bugs

* support num workers for embedding model

* Refactor query api and add dataframe UI

* Refactor query api

* Remove embedding workers

* Add file: file_utils.py (#63)

* Fix connection error for longtime job

* fix testcase bugs

* support num workers for embedding model

* Refactor query api and add dataframe UI

* Refactor query api

* Remove embedding workers

* Add file_utils

---------

Co-authored-by: Yue Fei <[email protected]>

* Remove local storage and enable Elasticsearch hybrid query mode (#60)

* Add gpu dockerfile

* Fix bug

* Fix gb2312

* Update embedding batch size

* Set default embedding and llm model

* Update docker tag

* Fix hologres check

* Update registry

* Fix bug

* Fix tests

* Add queue

* Update batch size

* Add async interface

* Fix index conflict

* Add change index parameter for FAISS

* Fix batch size

* Update

* Modify async upload to sync (#64)

* Modify async upload to sync

* fix failed test

* Fix faiss_path not effective in retrieval (#65)

* Add API to support upload local files (#67)

* support upload file via API

* add Readme for upload API

* refactor query api

* modify load_knowledge with session_config

* use tempfile.mkdtemp() to store upload files

* add docker image timezone for China (#68)

* add image zone for China

* remove unused ENV

---------

Co-authored-by: shubao.sx <[email protected]>
Co-authored-by: Yue Fei <[email protected]>

* load data pipeline supports read config (#70)

* Add gpu docker image timezone for China (#74)

* Add fast bm25 (#66)

* Add fast bm25

* Fix bm25 bug

* Fix bug

* Fix test

* Update readme and configuration (#77)

* fix demo.toml typo, and add comments for settings.toml for embedding

* update readme, add load data

* Update docker.yml

* Enable multiple workers to improve perf (#75)

* Add fast bm25

* Update

* Fix bug

* Fix bm25 bug

* Fix bug

* Refine code

* Update multi-process

* Add API to support upload local files (#67)

* support upload file via API

* add Readme for upload API

* refactor query api

* modify load_knowledge with session_config

* use tempfile.mkdtemp() to store upload files

* add docker image timezone for China (#68)

* add image zone for China

* remove unused ENV

---------

Co-authored-by: shubao.sx <[email protected]>
Co-authored-by: Yue Fei <[email protected]>

* load data pipeline supports read config (#70)

* Add gpu docker image timezone for China (#74)

* Add fast bm25 (#66)

* Add fast bm25

* Fix bm25 bug

* Fix bug

* Fix test

* Update dockerfile

* Fix bug

* Update

* Update docker file

* Fix empty file bug

* Fix local index error

* Fix lint

* Decouple gradio and backend

* Add ui build

* Add gunicorn

* Fix gunicorn

* Update nginx

* add nginx image

* Fix deployment issue

* Fix upload

---------

Co-authored-by: 筱文 <[email protected]>
Co-authored-by: paradiseHIT <[email protected]>
Co-authored-by: shubao.sx <[email protected]>

* Add guides for env and docker (#81)

* Add guides for env

* add guides for docker build

* Add README

* Add config guide cn&en (#82)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* Add doc reference for rag query (#84)

* Support evaluation for generated and open datasets (#83)

* Refactor evaluation module

* add UI: eval_tab

* support eval UI

* tmp eval

* remove eval web

* Support evaluation

* fix pytest

* Add OpenDataSet class

---------

Co-authored-by: ranxia <[email protected]>

* Fix oss url for miracl dataset (#86)

* fix ui es upload (#85)

* Fix eas LLM (#88)

* Milvus support sparse search (#87)

* Upload multiple files in single API call (#89)

* Milvus support sparse search

* aload fix

* Upload multiple files in one api call

* Remove notebooks

* Fix tests

* Fix http timeout issue

* Add client default timeout limitation and support UI interactive (#90)

* Add client default timeout limitation and support UI interactive

* support interactivate for vectordb type

* Fix ui issue (#91)

* Fix deps and add gpu ci tests (#92)

* Fix deps and add gpu ci tests

* Don't send report in 2nd pipeline

* Fix empty response for empty knowledge base (#93)

* Fix empty response for empty knowledge base

* Add constant for empty response message

* Fix dup nodes (#94)

* Add error handling (#96)

* Add error handling

* Add upload error msg

* fix data_loader (#95)

* fix data_loader

* fix data_loder

* fix data_loader

* fix data_loader

* Set proper log levels (#98)

* Adjust config instruction and add es instruction (#99)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* adjust config guide and add es instruction

* Log stacktrace for failed requests (#100)

* Load milvus collection by default (#101)

* Log stacktrace for failed requests

* Load milvus collection by default

* Rename & Relocate figures in md (#102)

* add es setting

* add es setting

* add elasticsearch test

* add es test

* add and modify es_tokenizer test

* add and modify es_tokenizer test

* modify test_as_tokenizer

* add skipif

* fix test linter fails

* fix lint problem

* update test_as_analyzer

* add config_guide

* add navigation into readme

* adjust config guide and add es instruction

* modify md figures

* minor modification

* change md path and name

* 针对windows平台修改docker启动命令 (#104)

* 针对windows平台修改docker启动命令

* 针对windows平台修改docker启动命令

* 针对windows平台修改docker启动命令

* make format

* make format, nothing changed

* download models from oss automatically (#97)

* download models from oss automatically

* download models from oss automatically

* download models from oss automatically

* download models from oss automatically

* download models from modelscope

* download models from modelscope

* fix readme

* Fix bug in downloading models (#106)

* Fix bug

* Fix log

* Fix download

* Add markdown reader (#105)

* fix pdf reader (#107)

Co-authored-by: Yue Fei <[email protected]>

* Personal/ranxia/pdf table summary fix (#109)

* fix pdf reader

* fix pdf reader table summary

---------

Co-authored-by: Yue Fei <[email protected]>

* FiAddage number to file_name (#110)

* Support stream response for LLM (PaiEAS && DashScope) (#112)

* Support stream response for LLM (PaiEAS && DashScope)

* Add PaiEas LLM old file

* Add image node processor (#114)

* Fit image in response

* Add image insert

* Fix llm max-token

* Fix bug (#115)

* Fix bugs for chinese escaped string in API header (#117)

* Fix bidi version (#119)

* Add fix version

* Update poetry.lock

* Update streaming response to body field use server sent events (#120)

* Fix streaming

* Fix llm and vector query

* Address comment

* Remove extra print

* Support simple-weighted-reranker and similarity-threshold (#116)

* Support normalized cosine_sim score for different vectorDB

* Support simple-weighted-reranker and similarity-threshold

* [Todo] Support ES hybrid search

* Support Milvus

* fix path

* fix open dataset

* Fix url for du-retrieval dataset

* Restore setting

* Fix reviews

* Apply node_id for weighted_reranker

* jsonl reader (#124)

* jsonl reader

* jsonl reader

* Support function_calling with booking demo tools (#122)

* Add booking system demo for function_calling

* Support customized function calling tools

* Add testcase for agent and llm

* Fix test

* Fix async test

* Add readme for function calling

* Add readme for function calling

* Remove ref figs

* Add nodes enhancement by raptor (#111)

* add raptor

* add raptor ui support

* fix logger bug

* add node_enhancement class and modify test

* fix node_enhancement setting bug

* lint adjustment

* poetry lock

* fix poetry.lock

* fix poetry issues

* add a param

* add token calculation for Chinese and adjust context_window

* update tokenization_qwen

* update file_path

* merge feature and update poetry.lock

* exclude pytest since no vocab file in the test env

* exclude qwen.tiktoken

* delete assert

* Add weather tool (#125)

* weather okgit add .!

* fix bug

* space bug

---------

Co-authored-by: Yue Fei <[email protected]>

* Don't use parallel when data size is big (#108)

* Add opensearch (#127)

* Add open search. Not tested

* Fix

* Fix config

* update docker's readme (#126)

* update docker's readme

* change network back

* change network back

* change network back

* Create ci.yml (#131)

* Update CI & PR pipelines (#132)

* Update CI

* Fix ci

* Fix a few ui bugs (#133)

* Support RDS postgres vector store (#134)

* support rds postgres for store engine

* Format

* support table

* Make format

---------

Co-authored-by: Yue Fei <[email protected]>

* Fix minor bugs (#135)

* Fix bug

* Fix index bug

* Update password field

* Add pre-commit

* Remove upload button

* Refine upload

* Fix pg connection string

* Fix empty response for score_threshold (#136)

* Fix empty response for score_threshold

* Modify empty response info

* Modify empty response info

---------

Co-authored-by: Yue Fei <[email protected]>

* fix table_reader in pdf_reader (#128)

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* fix table_reader in pdf_reader

* add "enable_ocr" and "enable_table_summary" (#138)

* add "enable_ocr" and "enable_table_summary"

* add "enable_ocr" and "enable_table_summary"

* add "enable_ocr" and "enable_table_summary"

* Add release pipeline and fix some bugs (#137)

* Fix bug

* Add release pipeline

* Update

* Update

* Fix bug

* Fix login

* Fix empty tag

* Update

* Fix ui issue

* Add base version tag

* Fix specific version

* Use pg hybrid retrieval directly

* Fix image tag

* Fix llm config (#139)

* Fix toml merge bug (#142)

* Fix configuration conflict (#143)

* Fix merge bug

* Fix version conflict for config file

* Resolve snapshot merge conflict

* Fix space outage in github runner (#144)

* Fix merge bug

* Fix version conflict for config file

* Resolve snapshot merge conflict

* Update yaml

---------

Co-authored-by: Ceceliachenen <[email protected]>
Co-authored-by: wwxxzz <[email protected]>
Co-authored-by: paradiseHIT <[email protected]>
Co-authored-by: shubao.sx <[email protected]>
Co-authored-by: aero-xi <[email protected]>
Co-authored-by: ranxia <[email protected]>
Co-authored-by: aero-xi <[email protected]>
Co-authored-by: CharlieKoo <[email protected]>
Co-authored-by: zhangdingchu <[email protected]>
Co-authored-by: zt2645802240 <[email protected]>