fix(neo4j): remove embeddings from top_n lookup (#118)

* fix: exclude embedding properties from top_n node query

* refactor: more ergonomic index creation

* docs(neo4j): update examples

* fix: unused import in example

* feat(provider): xAI (grok) integration (#106)

* feat(xai): initial xai (grok) implementation

* fix(xai): renamings + tests

* style(xai): Update rig-core/src/providers/xai/client.rs

Co-authored-by: Mathieu Bélanger <[email protected]>

* style(xai): adds various comments and README improvements

* fix(xai): add some print statements to the grok example

* docs(xai): fix readme

---------

Co-authored-by: Mathieu Bélanger <[email protected]>

* fix(rig-mongodb): remove embeddings from `top_n` lookup (#115)

* fix(mongodb): remove embeddings from `top_n` lookup

* fix(mongodb): filter embeddings within agg pipeline

* style(mongodb): clippy moment

* fix(mongodb): dynamically get embedded fields from mongodb

* fix(mongodb): apply fixes from comments

* style(mongodb): fmt

* docs(readme): add perplexity logo to integrations (#112)

* docs(readme): add perplexity logo to integrations

* fix: perplexity logo size

* fix(readme): perplexity logo size

* feat: embeddings API overhaul (#120)

* feat: setup derive macro

* test: test out writing embeddable macro

* test: continue testing custom macro implementation

* feat: macro generate trait bounds

* refactor: split up macro into multiple files

* refactor: move macro derive crate inside rig-core

* feat: replace embedding logic with new embeddable trait and macro

* refactor: refactor rag examples, delete document embedding struct

* feat: remove document embedding from in memory store

* refactor: remove DocumentEmbeddings from in memory vector store

* refactor(examples): combine vector store with vector store index

* docs: add and update docstrings

* fix(examples): fix bugs in examples

* style: cargo fmt

* revert: revert vector store to main

* docs: update embeddings builder docstrings

* refactor: derive macro

* tests: add unit tests on in memory store

* fix(ci): asterisk on pull requests to accommodate epic branches

* fix(ci): double asterisk

* feat: add error type on embeddable trait

* refactor: move embeddings to its own module and separate embeddable

* refactor: split up macro into more files, fix all imports

* fix: revert logging change

* feat: handle tools with embeddingsbuilder

* bug(macro): fix error when embed tags missing

* style: cargo fmt

* fix(tests): clippy

* docs&revert: revert embeddable trait error type, add docstrings

* style: cargo clippy

* clippy(lancedb): fix unused function error

* fix(test): remove useless assert false statement

* cleanup: split up branch into 2 branches for readability

* cleanup: revert certain changes during branch split

* docs: revert doc string

* fix: add embedding_docs to embeddable tool

* refactor: use OneOrMany in Embeddable trait, make derive macro crate feature flag

* tests: add some more tests

* clippy: cargo clippy

* docs: add docstring to oneormany

* fix(macro): update error handling

* refactor: reexport EmbeddingsBuilder in rig and update imports

* feat: implement IntoIterator and Iterator for OneOrMany

* refactor: rename from methods

* tests: fix failing tests

* refactor&fix: make PR review changes

* fix: fix tests failing

* test: add test on OneOrMany

* style: cargo fmt

* docs&fix: fix doc strings, implement iter_mut for OneOrMany

* fix: update borrow and owning of macro

* clippy: add back print statements

* fix: fix issues caused by merge of derive macro branch

* fix: fix cargo toml of lancedb and mongodb

* refactor: use thiserror for OneOrMany::EmptyListError

* feat: add OneOrMany to in memory vector store

* style: cargo fmt

* fix: update embeddingsbuilder import path

* tests: add tests for embeddingsbuilder

* clippy: add is empty method

* fix: add feature flag to examples in mongodb and lancedb crates

* fix: move lancedb fixtures into its own file

* fix: add dummy main function in fixtures.rs for compiler

* fix: revert fixture file, remove fixtures from cargo toml examples

* fix: update fixture import in lancedb examples

* refactor: rename D to T in embeddingsbuilder generics

* refactor: remove clone

* PR: update builder, docstrings, and std::markers tags

* style: replace add with push

* fix: fix mongodb example

* fix: update lancedb and mongodb doc example

* fix: typo

* docs: add and fix docstrings and examples

* docs: add more doc tests

* feat: rename Embeddable trait to ExtractEmbeddingFields

* feat: rename macro files, cargo fmt

* PR: update docstrings, update `add_documents_with_id` function

* doc: fix doc linting

* misc: fmt

* test: fix test

* refactor(embeddings): embed trait definition (#89)

* refactor: Big refactor

* refactor: refactor Embed trait, fix all imports, rename files, fix macro

* fix(embed trait): fix errors while testing

* fix(lancedb): examples

* docs: fix hyperlink

* fmt: cargo fmt

* PR: make requested changes

* fix: change visibility of struct field

* fix: failing tests

---------

Co-authored-by: Christophe <[email protected]>

* fix/docs: fix errors from merge, cleanup embeddings docstrings

* fix: cargo clippy in examples

* Feat: small improvements + fixes + tests (#128)

* docs: Make examples+docstrings a bit more realistic

* feat: Add Embed implementation for &impl Embed

* test: Reorganize tests

* misc: Add `derive` feature to `all` feature flag

* test: Fix dead code warning

* test: Improve embed macro tests

* test: Add additional embed macro test

* docs: Add logging output to rag example

* docs: Fix logging output in tools example

* feat: Improve token usage log messages

* test: Small changes to embedding builder tests

* style: cargo fmt

* fix: Clippy + docstrings

* docs: Fix docstring

* test: Fix test

* style: Small renaming for consistency

* docs: Improve docstrings

* style: fmt

* fix: `TextEmbedder::embed` visibility

* docs: Simplified the `EmbeddingsBuilder` docstring example to focus on the builder

* style: cargo fmt

* docs: Small edit to lancedb examples

---------

Co-authored-by: cvauclair <[email protected]>

* misc: Add `rig-derive` missing manifest fields (#129)

* feat: Improve `InMemoryVectorStore` API (#130)

* feat: Improve `InMemoryVectorStore` API

* style: clippy+fmt

* test: fix test

* fix: remove unused module (#132)

* fix: exclude embedding properties from top_n node query

* refactor: more ergonomic index creation

* docs(neo4j): update examples

* fix: unused import in example

* fix(example): remove embedding field from Deserialization type

---------

Co-authored-by: Mochan <[email protected]>
Co-authored-by: Garance Buricatu <[email protected]>
Co-authored-by: cvauclair <[email protected]>
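Several of the squashed commits above introduce a `OneOrMany` type (with `IntoIterator`, `iter_mut`, an `is_empty` method and a `thiserror`-based `EmptyListError`). The following is a minimal, std-only sketch of such a non-empty collection, assuming a `first` + `rest` layout; it is not rig-core's actual definition, and the manual `Display`/`Error` impls stand in for the `thiserror` derive.

```rust
use std::fmt;

/// Error returned when a `OneOrMany` is built from an empty list.
/// (The commit derives this with `thiserror`; manual impls keep the sketch std-only.)
#[derive(Debug, PartialEq)]
pub struct EmptyListError;

impl fmt::Display for EmptyListError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot create OneOrMany from an empty list")
    }
}

impl std::error::Error for EmptyListError {}

/// A collection that always holds at least one item: the first item is
/// stored separately, so an empty value cannot be represented.
#[derive(Debug, Clone, PartialEq)]
pub struct OneOrMany<T> {
    first: T,
    rest: Vec<T>,
}

impl<T> OneOrMany<T> {
    /// Wrap a single item.
    pub fn one(item: T) -> Self {
        OneOrMany { first: item, rest: Vec::new() }
    }

    /// Build from a `Vec`, failing with `EmptyListError` if it is empty.
    pub fn many(items: Vec<T>) -> Result<Self, EmptyListError> {
        let mut iter = items.into_iter();
        let first = iter.next().ok_or(EmptyListError)?;
        Ok(OneOrMany { first, rest: iter.collect() })
    }

    /// The first item always exists, so no `Option` is needed.
    pub fn first(&self) -> &T {
        &self.first
    }

    pub fn len(&self) -> usize {
        1 + self.rest.len()
    }

    /// Always false; provided as the usual companion to `len`.
    pub fn is_empty(&self) -> bool {
        false
    }

    pub fn push(&mut self, item: T) {
        self.rest.push(item);
    }

    pub fn iter(&self) -> impl Iterator<Item = &T> {
        std::iter::once(&self.first).chain(self.rest.iter())
    }

    pub fn iter_mut(&mut self) -> impl Iterator<Item = &mut T> {
        std::iter::once(&mut self.first).chain(self.rest.iter_mut())
    }
}

impl<T> IntoIterator for OneOrMany<T> {
    type Item = T;
    type IntoIter = std::iter::Chain<std::iter::Once<T>, std::vec::IntoIter<T>>;

    fn into_iter(self) -> Self::IntoIter {
        std::iter::once(self.first).chain(self.rest)
    }
}

fn main() {
    let mut docs = OneOrMany::many(vec!["doc0", "doc1"]).expect("non-empty");
    docs.push("doc2");
    println!("{} items, first = {}", docs.len(), docs.first());
    for doc in docs {
        println!("{doc}");
    }
}
```

Storing the first element outside the `Vec` moves the "at least one item" invariant into the type itself, so callers such as an embeddings builder never have to handle an empty document list after construction.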
4 people authored Dec 2, 2024
1 parent 4a22f96 commit 62c7bc5
Showing 7 changed files with 287 additions and 161 deletions.
1 change: 1 addition & 0 deletions Cargo.lock

3 changes: 2 additions & 1 deletion rig-neo4j/Cargo.toml
@@ -22,7 +22,8 @@ anyhow = "1.0.86"
 tokio = { version = "1.38.0", features = ["macros"] }
 textwrap = { version = "0.16.1"}
 term_size = { version = "0.3.2"}
+tracing-subscriber = "0.3.18"

 [[example]]
 name = "vector_search_simple"
-required-features = ["rig-core/derive"]
+required-features = ["rig-core/derive"]
23 changes: 13 additions & 10 deletions rig-neo4j/examples/vector_search_movies_add_embeddings.rs
@@ -30,6 +30,7 @@ struct Movie {
 }

 const NODE_LABEL: &str = "Movie";
+const INDEX_NAME: &str = "moviePlots";

 #[tokio::main]
 async fn main() -> Result<(), anyhow::Error> {
@@ -99,19 +100,21 @@ async fn main() -> Result<(), anyhow::Error> {
         );
     }

-    // // Select the embedding model and generate our embeddings
+    // Select the embedding model and generate our embeddings
     let model = openai_client.embedding_model(TEXT_EMBEDDING_ADA_002);

-    // Create a vector index on our vector store
+    // Since we are starting from scratch, we need to create the DB vector index
+    neo4j_client
+        .create_vector_index(IndexConfig::new(INDEX_NAME), NODE_LABEL, &model)
+        .await?;
+
     // ❗IMPORTANT: Reuse the same model that was used to generate the embeddings
-    let index = neo4j_client.index(
-        model,
-        IndexConfig::new("moviePlots"),
-        SearchParams::new(Some("node.year > 1990".to_string())),
-    );
-
-    index
-        .create_and_await_vector_index(NODE_LABEL.to_string(), None)
+    let index = neo4j_client
+        .get_index(
+            model,
+            INDEX_NAME,
+            SearchParams::new(Some("node.year > 1990".to_string())),
+        )
         .await?;

     // Query the index
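The new `create_vector_index(IndexConfig::new(INDEX_NAME), NODE_LABEL, &model)` call separates one-time index creation from the later `get_index` lookup. Passing `&model` presumably lets the client learn the embedding dimension that a Neo4j vector index has to declare. As a rough sketch of what such a call needs to send to the database, the hypothetical helper `vector_index_cypher` below builds a Neo4j 5.x `CREATE VECTOR INDEX` statement. The `embedding` property name and the cosine similarity function are assumptions; 1536 is the output dimension of OpenAI's `text-embedding-ada-002`.

```rust
/// Hypothetical helper (not rig-neo4j's API): build a Neo4j 5.x statement that
/// creates a vector index over `property` on nodes labelled `node_label`.
/// Names are interpolated as-is, so only pass trusted identifiers.
fn vector_index_cypher(
    index_name: &str,
    node_label: &str,
    property: &str,
    dimensions: usize,
) -> String {
    // `IF NOT EXISTS` makes the statement idempotent; cosine similarity is an assumption.
    format!(
        "CREATE VECTOR INDEX {index_name} IF NOT EXISTS \
         FOR (n:{node_label}) ON n.{property} \
         OPTIONS {{ indexConfig: {{ `vector.dimensions`: {dimensions}, \
         `vector.similarity_function`: 'cosine' }} }}"
    )
}

fn main() {
    // text-embedding-ada-002 returns 1536-dimensional vectors.
    let statement = vector_index_cypher("moviePlots", "Movie", "embedding", 1536);
    println!("{statement}");
}
```

The dimension must match the vectors actually stored on the nodes, which is one reason the example insists on reusing the same model for indexing and querying.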
34 changes: 18 additions & 16 deletions rig-neo4j/examples/vector_search_movies_consume.rs
@@ -14,10 +14,7 @@
 //! [examples/vector_search_movies_add_embeddings.rs](examples/vector_search_movies_add_embeddings.rs) provides an example of
 //! how to add embeddings to an existing `recommendations` database.
 use neo4rs::ConfigBuilder;
-use rig_neo4j::{
-    vector_index::{IndexConfig, SearchParams},
-    Neo4jClient,
-};
+use rig_neo4j::{vector_index::SearchParams, Neo4jClient};

 use std::env;

@@ -32,20 +29,27 @@ mod display;

 #[tokio::main]
 async fn main() -> Result<(), anyhow::Error> {
+    tracing_subscriber::fmt()
+        .with_max_level(tracing::Level::DEBUG)
+        .with_target(false)
+        .init();
+
+    const INDEX_NAME: &str = "moviePlotsEmbedding";
+
     // Initialize OpenAI client
     let openai_api_key = env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set");
     let openai_client = Client::new(&openai_api_key);

-    let neo4j_uri = env::var("NEO4J_URI").expect("NEO4J_URI not set");
-    let neo4j_username = env::var("NEO4J_USERNAME").expect("NEO4J_USERNAME not set");
-    let neo4j_password = env::var("NEO4J_PASSWORD").expect("NEO4J_PASSWORD not set");
+    let neo4j_uri = "neo4j+s://demo.neo4jlabs.com:7687";
+    let neo4j_username = "recommendations";
+    let neo4j_password = "recommendations";

     let neo4j_client = Neo4jClient::from_config(
         ConfigBuilder::default()
             .uri(neo4j_uri)
             .user(neo4j_username)
             .password(neo4j_password)
-            .db("neo4j")
+            .db("recommendations")
             .build()
             .unwrap(),
     )
@@ -63,14 +67,12 @@ async fn main() -> Result<(), anyhow::Error> {

     // Create a vector index on our vector store
     // ❗IMPORTANT: Reuse the same model that was used to generate the embeddings
-    let index = neo4j_client.index(
-        model,
-        IndexConfig::default().index_name("moviePlots"),
-        SearchParams::new(Some("node.year > 1990".to_string())),
-    );
-
-    index
-        .create_and_await_vector_index("Movie".to_string(), None)
+    let index = neo4j_client
+        .get_index(
+            model,
+            INDEX_NAME,
+            SearchParams::new(Some("node.year > 1990".to_string())),
+        )
         .await?;

     // Query the index
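This change swaps the `NEO4J_*` environment variables for the public demo credentials (`neo4j+s://demo.neo4jlabs.com:7687`, user and password `recommendations`), so the example runs without any setup. If you still want to point it at your own instance, one option is to read the variables and fall back to the demo values. The helper `setting` below is a hypothetical sketch of that pattern and is not part of the example; taking the lookup as a closure keeps it testable without touching the process environment.

```rust
/// Hypothetical helper: look `key` up via `lookup`, falling back to `default`.
/// Passing the lookup as a closure keeps the logic testable without
/// mutating the real process environment.
fn setting(lookup: impl Fn(&str) -> Option<String>, key: &str, default: &str) -> String {
    lookup(key).unwrap_or_else(|| default.to_string())
}

fn main() {
    // Read from the real environment; a non-capturing closure is `Copy`,
    // so it can be passed to `setting` several times.
    let env = |key: &str| std::env::var(key).ok();

    // Defaults are the public demo credentials used by the example.
    let uri = setting(env, "NEO4J_URI", "neo4j+s://demo.neo4jlabs.com:7687");
    let username = setting(env, "NEO4J_USERNAME", "recommendations");
    let password = setting(env, "NEO4J_PASSWORD", "recommendations");

    println!("connecting to {uri} as {username} (password: {} chars)", password.len());
}
```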
33 changes: 13 additions & 20 deletions rig-neo4j/examples/vector_search_simple.rs
@@ -15,10 +15,7 @@ use rig::{
     vector_store::VectorStoreIndex as _,
     Embed,
 };
-use rig_neo4j::{
-    vector_index::{IndexConfig, SearchParams},
-    Neo4jClient, ToBoltType,
-};
+use rig_neo4j::{vector_index::SearchParams, Neo4jClient, ToBoltType};

 #[derive(Embed, Clone, Debug)]
 pub struct WordDefinition {
@@ -59,17 +56,6 @@ async fn main() -> Result<(), anyhow::Error> {
         .build()
         .await?;

-    // The struct that will reprensent a node in the database. Used to deserialize the results of the query (passed to the `top_n` methods)
-    // ❗IMPORTANT: The field names must match the property names in the database
-    #[derive(serde::Deserialize)]
-    struct Document {
-        #[allow(dead_code)]
-        id: String,
-        document: String,
-        #[allow(dead_code)]
-        embedding: Vec<f32>,
-    }
-
     let create_nodes = futures::stream::iter(embeddings)
         .map(|(doc, embeddings)| {
             neo4j_client.graph.run(
@@ -130,11 +116,18 @@ async fn main() -> Result<(), anyhow::Error> {

     // Create a vector index on our vector store
     // IMPORTANT: Reuse the same model that was used to generate the embeddings
-    let index = neo4j_client.index(
-        model,
-        IndexConfig::new("vector_index"),
-        SearchParams::default(),
-    );
+    let index = neo4j_client
+        .get_index(model, "vector_index", SearchParams::default())
+        .await?;
+
+    // The struct that will reprensent a node in the database. Used to deserialize the results of the query (passed to the `top_n` methods)
+    // ❗IMPORTANT: The field names must match the property names in the database
+    #[derive(serde::Deserialize)]
+    struct Document {
+        #[allow(dead_code)]
+        id: String,
+        document: String,
+    }

     // Query the index
     let results = index
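Because `top_n` no longer returns embedding properties, the example's `Document` struct drops its `embedding: Vec<f32>` field; keeping it would presumably make deserialization fail, since that property is no longer part of the result. According to the commit message, the exclusion happens in the `top_n` node query itself. The std-only sketch below illustrates the same idea on a hypothetical in-memory property map (`Prop` and `without_embeddings` are illustrative names, not rig-neo4j types). The embedding keys are passed in as a list, similar in spirit to the MongoDB fix that looks up embedded fields dynamically.

```rust
use std::collections::HashMap;

/// Hypothetical node property value (a stand-in for a driver's value type).
#[derive(Debug, Clone, PartialEq)]
enum Prop {
    Text(String),
    Vector(Vec<f32>),
}

/// Copy a node's properties, leaving out every key listed in `embedding_keys`,
/// so callers never receive (or pay to transfer) the large vectors.
fn without_embeddings(
    props: &HashMap<String, Prop>,
    embedding_keys: &[&str],
) -> HashMap<String, Prop> {
    props
        .iter()
        .filter(|(name, _)| !embedding_keys.iter().any(|k| *k == name.as_str()))
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}

fn main() {
    let mut node = HashMap::new();
    node.insert("id".to_string(), Prop::Text("doc0".to_string()));
    node.insert("document".to_string(), Prop::Text("a word definition".to_string()));
    node.insert("embedding".to_string(), Prop::Vector(vec![0.1, 0.2, 0.3]));

    let returned = without_embeddings(&node, &["embedding"]);

    // Sort the keys so the printed order is deterministic.
    let mut keys: Vec<_> = returned.keys().collect();
    keys.sort();
    println!("{keys:?}");
}
```

Filtering on the database side, as the commit does, is the better choice in practice: the vectors are never serialized or sent over the wire, whereas this client-side version only hides them after transfer.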