-
Notifications
You must be signed in to change notification settings - Fork 117
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
b063683
commit 2d8e5a2
Showing
14 changed files
with
83 additions
and
72 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,74 +1,81 @@ | ||
# Python Engine | ||
Enhancing the document involves improving clarity, structure, and adding helpful context where necessary. Here's an enhanced version: | ||
|
||
## Launch via Python | ||
# Python Engine Integration | ||
|
||
You can use in a async context with asyncio. | ||
This gives you most flexibility, but is a bit more advanced. | ||
## Launching Embedding generation with Python | ||
|
||
Use asynchronous programming in Python using `asyncio` for flexible and efficient embedding processing with Infinity. This advanced method allows for concurrent execution, making it ideal for high-throughput embedding generation. | ||
|
||
```python | ||
import asyncio | ||
from infinity_emb import AsyncEmbeddingEngine, EngineArgs | ||
|
||
sentences = [ | ||
"Embed this is sentence via Infinity.", | ||
"Paris is in France." | ||
] | ||
# Define sentences for embedding | ||
sentences = ["Embed this sentence via Infinity.", "Paris is in France."] | ||
# Initialize the embedding engine with model specifications | ||
engine = AsyncEmbeddingEngine.from_args( | ||
EngineArgs(model_name_or_path = "BAAI/bge-small-en-v1.5", engine="torch") | ||
EngineArgs(model_name_or_path="BAAI/bge-small-en-v1.5", engine="torch", | ||
lengths_via_tokenize=True | ||
) | ||
) | ||
|
||
async def main(): | ||
async with engine: | ||
# entering context: engine starts with engine.astart() | ||
embeddings, usage = await engine.embed( | ||
sentences=sentences) | ||
# engine stops with engine.astop() | ||
async with engine: # Context manager initializes and terminates the engine | ||
# usage is total token count according to tokenizer. | ||
embeddings, usage = await engine.embed(sentences=sentences) | ||
# Embeddings are now available for use | ||
asyncio.run(main()) | ||
``` | ||
|
||
# ReRanker | ||
## Reranker | ||
|
||
Reranking gives you a score for similarity between a query and multiple documents. | ||
Use it in conjunction with a VectorDB+Embeddings, or as standalone for small amount of documents. | ||
Please select a model from huggingface that is a AutoModelForSequenceClassification with one class classification. | ||
Enhance search results by reranking based on the similarity between a query and a set of documents. This feature is particularly useful in conjunction with vector databases and embeddings, or as a standalone solution for small datasets. Ensure you choose a Hugging Face model designed for sequence classification with a single output class, e.g. "BAAI/bge-reranker-base". Further models are usually listed as `rerank` models on HuggingFace https://huggingface.co/models?pipeline_tag=text-classification&sort=trending&search=rerank. | ||
|
||
```python | ||
import asyncio | ||
from infinity_emb import AsyncEmbeddingEngine, EngineArgs | ||
|
||
# Define your query and documents | ||
query = "What is the python package infinity_emb?" | ||
docs = [ | ||
"This is a document not related to the python package infinity_emb, hence...", | ||
"This document is unrelated to the python package infinity_emb.", | ||
"Paris is in France!", | ||
"infinity_emb is a package for sentence embeddings" | ||
"infinity_emb is a package for generating sentence embeddings." | ||
] | ||
engine_args = EngineArgs( | ||
model_name_or_path = "BAAI/bge-reranker-base", | ||
engine="torch") | ||
|
||
# Configure the reranking engine | ||
engine_args = EngineArgs(model_name_or_path="BAAI/bge-reranker-base", engine="torch") | ||
engine = AsyncEmbeddingEngine.from_args(engine_args) | ||
|
||
async def main(): | ||
async with engine: | ||
ranking, usage = await engine.rerank( | ||
query=query, docs=docs) | ||
ranking, usage = await engine.rerank(query=query, docs=docs) | ||
# Display ranked documents | ||
print(list(zip(ranking, docs))) | ||
asyncio.run(main()) | ||
``` | ||
|
||
# Text-Classification (Beta) | ||
|
||
## Text Classification (Beta) | ||
|
||
Explore text classification with Infinity's `classify` feature, which allows for sentiment analysis, emotion detection, and more classification tasks. Utilize pre-trained classification models on your text data. | ||
|
||
```python | ||
import asyncio | ||
from infinity_emb import AsyncEmbeddingEngine, EngineArgs | ||
|
||
# Example sentences for classification | ||
sentences = ["This is awesome.", "I am bored."] | ||
# Setup engine with text classification model | ||
engine_args = EngineArgs( | ||
model_name_or_path = "SamLowe/roberta-base-go_emotions", | ||
model_name_or_path="SamLowe/roberta-base-go_emotions", | ||
engine="torch", model_warmup=True) | ||
engine = AsyncEmbeddingEngine.from_args(engine_args) | ||
|
||
async def main(): | ||
async with engine: | ||
predictions, usage = await engine.classify( | ||
sentences=sentences) | ||
predictions, usage = await engine.classify(sentences=sentences) | ||
# Access classification predictions | ||
return predictions, usage | ||
asyncio.run(main()) | ||
``` | ||
``` | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters