fix(neo4j): remove embeddings from top_n lookup (#118)

* fix: exclude embedding properties from top_n node query
* refactor: more ergonomic index creation
* docs(neo4j): update examples
* fix: unused import in example
* feat(provider): xAI (grok) integration (#106)
* feat(xai): initial xai (grok) implementation
* fix(xai): renamings + tests
* style(xai): Update rig-core/src/providers/xai/client.rs

Co-authored-by: Mathieu Bélanger <[email protected]>

* style(xai): adds various comments and README improvements
* fix(xai): add some print statements to the grok example
* docs(xai): fix readme

---------

Co-authored-by: Mathieu Bélanger <[email protected]>

* fix(rig-mongodb): remove embeddings from `top_n` lookup (#115)
* fix(mongodb): remove embeddings from `top_n` lookup
* fix(mongodb): filter embeddings within agg pipeline
* style(mongodb): clippy moment
* fix(mongodb): dynamically get embedded fields from mongodb
* fix(mongodb): apply fixes from comments
* style(mongodb): fmt
* docs(readme): add perplexity logo to integrations (#112)
* docs(readme): add perplexity logo to integrations
* fix: perplexity logo size
* fix(readme): perplexity logo size
* feat: embeddings API overhaul (#120)
* feat: setup derive macro
* test: test out writing embeddable macro
* test: continue testing custom macro implementation
* feat: macro generate trait bounds
* refactor: split up macro into multiple files
* refactor: move macro derive crate inside rig-core
* feat: replace embedding logic with new embeddable trait and macro
* refactor: refactor rag examples, delete document embedding struct
* feat: remove document embedding from in memory store
* refactor: remove DocumentEmbeddings from in memory vector store
* refactor(examples): combine vector store with vector store index
* docs: add and update docstrings
* fix (examples): fix bugs in examples
* style: cargo fmt
* revert: revert vector store to main
* docs: update emebddings builder docstrings
* refactor: derive macro
* tests: add unit tests on in memory store
* fic(ci): asterix on pull request sto accomodate for epic branches
* fix(ci): double asterix
* feat: add error type on embeddable trait
* refactor: move embeddings to its own module and seperate embeddable
* refactor: split up macro into more files, fix all imports
* fix: revert logging change
* feat: handle tools with embeddingsbuilder
* bug(macro): fix error when embed tags missing
* style: cargo fmt
* fix(tests): clippy
* docs&revert: revert embeddable trait error type, add docstrings
* style: cargo clippy
* clippy(lancedb): fix unused function error
* fix(test): remove useless assert false statement
* cleanup: split up branch into 2 branches for readability
* cleanup: revert certain changes during branch split
* docs: revert doc string
* fix: add embedding_docs to embeddable tool
* refactor: use OneOrMany in Embbedable trait, make derive macro crate feature flag
* tests: add some more tests
* clippy: cargo clippy
* docs: add docstring to oneormany
* fix(macro): update error handling
* refactor: reexport EmbeddingsBuilder in rig and update imports
* feat: implement IntoIterator and Iterator for OneOrMany
* refactor: rename from methods
* tests: fix failing tests
* refactor&fix: make PR review changes
* fix: fix tests failing
* test: add test on OneOrMany
* style: cargo fmt
* docs&fix: fix doc strings, implement iter_mut for OneOrMany
* fix: update borrow and owning of macro
* clippy: add back print statements
* fix: fix issues caused by merge of derive macro branch
* fix: fix cargo toml of lancedb and mongodb
* refactor: use thiserror for OneOtMany::EmptyListError
* feat: add OneOrMany to in memory vector store
* style: cargo fmt
* fix: update embeddingsbuilder import path
* tests: add tests for embeddingsbuilder
* clippy: add is empty method
* fix: add feature flag to examples in mongodb and lancedb crates
* fix: move lancedb fixtures into it's own file
* fix: add dummy main function in fextures.rs for compiler
* fix: revert fixture file, remove fixtures from cargo toml examples
* fix: update fixture import in lancedb examples
* refactor: rename D to T in embeddingsbuilder generics
* refactor: remove clone
* PR: update builder, docstrings, and std::markers tags
* style: replace add with push
* fix: fix mongodb example
* fix: update lancedb and mongodb doc example
* fix: typo
* docs: add and fix docstrings and examples
* docs: add more doc tests
* feat: rename Embeddable trait to ExtractEmbeddingFields
* feat: rename macro files, cargo fmt
* PR; update docstrings, update `add_documents_with_id` function
* doc: fix doc linting
* misc: fmt
* test: fix test
* refactor(embeddings): embed trait definition (#89)
* refactor: Big refactor
* refactor: refactor Embed trait, fix all imports, rename files, fix macro
* fix(embed trait): fix errors while testing
* fix(lancedb): examples
* docs: fix hyperlink
* fmt: cargo fmt
* PR; make requested changes
* fix: change visibility of struct field
* fix: failing tests

---------

Co-authored-by: Christophe <[email protected]>

* fix/docs: fix erros from merge, cleanup embeddings docstrings
* fix: cargo clippy in examples
* Feat: small improvements + fixes + tests (#128)
* docs: Make examples+docstrings a bit more realistic
* feat: Add Embed implementation for &impl Embed
* test: Reorganize tests
* misc: Add `derive` feature to `all` feature flag
* test: Fix dead code warning
* test: Improve embed macro tests
* test: Add additional embed macro test
* docs: Add logging output to rag example
* docs: Fix looging output in tools example
* feat: Improve token usage log messages
* test: Small changes to embedbing builder tests
* style: cargo fmt
* fix: Clippy + docstrings
* docs: Fix docstring
* test: Fix test
* style: Small renaming for consistency
* docs: Improve docstrings
* style: fmt
* fix: `TextEmbedder::embed` visibility
* docs: Simplified the `EmbeddingsBuilder` docstring example to focus on the builder
* style: cargo fmt
* docs: Small edit to lancedb examples

---------

Co-authored-by: cvauclair <[email protected]>

* misc: Add `rig-derive` missing manifest fields (#129)
* feat: Improve `InMemoryVectorStore` API (#130)
* feat: Improve `InMemoryVectorStore` API
* style: clippy+fmt
* test: fix test
* fix: remove unused module (#132)
* fix: exclude embedding properties from top_n node query
* refactor: more ergonomic index creation
* docs(neo4j): update examples
* fix: unused import in example
* fix(example): remove embedding field from Deserialization type

---------

Co-authored-by: Mochan <[email protected]>
Co-authored-by: Garance Buricatu <[email protected]>
Co-authored-by: cvauclair <[email protected]>
0xPlaygrounds · Dec 2, 2024 · 62c7bc5
1 parent 4a22f96
commit 62c7bc5
Showing 7 changed files with 287 additions and 161 deletions.
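
For readers skimming the diff, the idea in the commit title (leaving embedding properties out of what a `top_n` lookup returns) can be sketched in isolation. This is a minimal standalone illustration, not the rig-neo4j implementation: `Property` and `strip_embeddings` are hypothetical names introduced here, and the crate may well do the filtering inside the database query rather than in Rust.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a node property value (not a rig-neo4j type).
#[derive(Debug, Clone, PartialEq)]
enum Property {
    Text(String),
    Vector(Vec<f32>),
}

/// Returns a copy of `props` without the properties named in `embedding_keys`.
/// Assumption: the caller already knows which property names hold embeddings.
fn strip_embeddings(
    props: &BTreeMap<String, Property>,
    embedding_keys: &[&str],
) -> BTreeMap<String, Property> {
    props
        .iter()
        // Keep only the properties whose name is not an embedding key.
        .filter(|(key, _)| !embedding_keys.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

Calling `strip_embeddings(&props, &["embedding"])` on a node with `id`, `document`, and `embedding` properties would leave only `id` and `document`, which is why the `Document` struct in `vector_search_simple.rs` no longer needs an `embedding` field.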
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/rig-neo4j/Cargo.toml b/rig-neo4j/Cargo.toml
@@ -22,7 +22,8 @@ anyhow = "1.0.86"
 tokio = { version = "1.38.0", features = ["macros"] }
 textwrap = { version = "0.16.1"}
 term_size = { version = "0.3.2"}
+tracing-subscriber = "0.3.18"
 
 [[example]]
 name = "vector_search_simple"
-required-features = ["rig-core/derive"]
+required-features = ["rig-core/derive"]
diff --git a/rig-neo4j/examples/vector_search_movies_add_embeddings.rs b/rig-neo4j/examples/vector_search_movies_add_embeddings.rs
@@ -30,6 +30,7 @@ struct Movie {
 }
 
 const NODE_LABEL: &str = "Movie";
+const INDEX_NAME: &str = "moviePlots";
 
 #[tokio::main]
 async fn main() -> Result<(), anyhow::Error> {
@@ -99,19 +100,21 @@ async fn main() -> Result<(), anyhow::Error> {
         );
     }
 
-    // // Select the embedding model and generate our embeddings
+    // Select the embedding model and generate our embeddings
     let model = openai_client.embedding_model(TEXT_EMBEDDING_ADA_002);
 
-    // Create a vector index on our vector store
+    // Since we are starting from scratch, we need to create the DB vector index
+    neo4j_client
+        .create_vector_index(IndexConfig::new(INDEX_NAME), NODE_LABEL, &model)
+        .await?;
+
     // ❗IMPORTANT: Reuse the same model that was used to generate the embeddings
-    let index = neo4j_client.index(
-        model,
-        IndexConfig::new("moviePlots"),
-        SearchParams::new(Some("node.year > 1990".to_string())),
-    );
-
-    index
-        .create_and_await_vector_index(NODE_LABEL.to_string(), None)
+    let index = neo4j_client
+        .get_index(
+            model,
+            INDEX_NAME,
+            SearchParams::new(Some("node.year > 1990".to_string())),
+        )
         .await?;
 
     // Query the index

diff --git a/rig-neo4j/examples/vector_search_movies_consume.rs b/rig-neo4j/examples/vector_search_movies_consume.rs
@@ -14,10 +14,7 @@
 //! [examples/vector_search_movies_add_embeddings.rs](examples/vector_search_movies_add_embeddings.rs) provides an example of
 //! how to add embeddings to an existing `recommendations` database.
 use neo4rs::ConfigBuilder;
-use rig_neo4j::{
-    vector_index::{IndexConfig, SearchParams},
-    Neo4jClient,
-};
+use rig_neo4j::{vector_index::SearchParams, Neo4jClient};
 
 use std::env;
 
@@ -32,20 +29,27 @@ mod display;
 
 #[tokio::main]
 async fn main() -> Result<(), anyhow::Error> {
+    tracing_subscriber::fmt()
+        .with_max_level(tracing::Level::DEBUG)
+        .with_target(false)
+        .init();
+
+    const INDEX_NAME: &str = "moviePlotsEmbedding";
+
     // Initialize OpenAI client
     let openai_api_key = env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set");
     let openai_client = Client::new(&openai_api_key);
 
-    let neo4j_uri = env::var("NEO4J_URI").expect("NEO4J_URI not set");
-    let neo4j_username = env::var("NEO4J_USERNAME").expect("NEO4J_USERNAME not set");
-    let neo4j_password = env::var("NEO4J_PASSWORD").expect("NEO4J_PASSWORD not set");
+    let neo4j_uri = "neo4j+s://demo.neo4jlabs.com:7687";
+    let neo4j_username = "recommendations";
+    let neo4j_password = "recommendations";
 
     let neo4j_client = Neo4jClient::from_config(
         ConfigBuilder::default()
             .uri(neo4j_uri)
             .user(neo4j_username)
             .password(neo4j_password)
-            .db("neo4j")
+            .db("recommendations")
             .build()
             .unwrap(),
     )
@@ -63,14 +67,12 @@ async fn main() -> Result<(), anyhow::Error> {
 
     // Create a vector index on our vector store
     // ❗IMPORTANT: Reuse the same model that was used to generate the embeddings
-    let index = neo4j_client.index(
-        model,
-        IndexConfig::default().index_name("moviePlots"),
-        SearchParams::new(Some("node.year > 1990".to_string())),
-    );
-
-    index
-        .create_and_await_vector_index("Movie".to_string(), None)
+    let index = neo4j_client
+        .get_index(
+            model,
+            INDEX_NAME,
+            SearchParams::new(Some("node.year > 1990".to_string())),
+        )
         .await?;
 
     // Query the index

diff --git a/rig-neo4j/examples/vector_search_simple.rs b/rig-neo4j/examples/vector_search_simple.rs
@@ -15,10 +15,7 @@ use rig::{
     vector_store::VectorStoreIndex as _,
     Embed,
 };
-use rig_neo4j::{
-    vector_index::{IndexConfig, SearchParams},
-    Neo4jClient, ToBoltType,
-};
+use rig_neo4j::{vector_index::SearchParams, Neo4jClient, ToBoltType};
 
 #[derive(Embed, Clone, Debug)]
 pub struct WordDefinition {
@@ -59,17 +56,6 @@ async fn main() -> Result<(), anyhow::Error> {
         .build()
         .await?;
 
-    // The struct that will represent a node in the database. Used to deserialize the results of the query (passed to the `top_n` methods)
-    // ❗IMPORTANT: The field names must match the property names in the database
-    #[derive(serde::Deserialize)]
-    struct Document {
-        #[allow(dead_code)]
-        id: String,
-        document: String,
-        #[allow(dead_code)]
-        embedding: Vec<f32>,
-    }
-
     let create_nodes = futures::stream::iter(embeddings)
         .map(|(doc, embeddings)| {
             neo4j_client.graph.run(
@@ -130,11 +116,18 @@ async fn main() -> Result<(), anyhow::Error> {
 
     // Create a vector index on our vector store
     // IMPORTANT: Reuse the same model that was used to generate the embeddings
-    let index = neo4j_client.index(
-        model,
-        IndexConfig::new("vector_index"),
-        SearchParams::default(),
-    );
+    let index = neo4j_client
+        .get_index(model, "vector_index", SearchParams::default())
+        .await?;
+
+    // The struct that will represent a node in the database. Used to deserialize the results of the query (passed to the `top_n` methods)
+    // ❗IMPORTANT: The field names must match the property names in the database
+    #[derive(serde::Deserialize)]
+    struct Document {
+        #[allow(dead_code)]
+        id: String,
+        document: String,
+    }
 
     // Query the index
     let results = index