H-3614: Introduce chonky PDF Embeddings #5673

Open · wants to merge 11 commits into main

Conversation

JesusFileto (Member):

🌟 What is the purpose of this PR?

This PR introduces the embedding work for PDFs, along with preliminary structs for how the embeddings will be stored. The main goal is to split embeddings into multiple levels of textual information, such as table embeddings. Future PRs will implement the XML metadata that will accompany these embeddings for use in entity extraction.

🚫 Blocked by

  • Requires setting up HASH API keys for the Google Vertex API and adding the Hugging Face token
  • ...

🔍 What does this change?

  • ...

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • require changes to docs which are not made in this PR
    • The current format for API calls and setup may still change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

⚠️ Known issues

  • When first calling the Hugging Face API, the model is initially "cold" and takes about 20 seconds to boot before it can be used. A redrive mechanism for this scenario will be implemented later.

🐾 Next steps

  • Add XML metadata from the PDF to further enrich the embeddings
  • Investigate how we want to store all of this data

🛡 What tests cover this?

  • New tests were added

@github-actions bot added the labels area/deps (third-party dependencies), area/libs (first-party libraries/crates/packages), area/tests (new or updated tests), and area/libs > chonky (affects the `chonky` crate) on Nov 20, 2024
@CiaranMn CiaranMn requested a review from TimDiekmann November 20, 2024 09:47
TimDiekmann (Member) left a comment:

Thanks @JesusFileto!
I noticed that you used curl, which means requests cannot be executed asynchronously. For AI calls in particular, asynchronous requests are preferred. In addition, curl pulls in libcurl (and OpenSSL), while reqwest is written in Rust and can use rustls.

As async backend we use tokio. If you add the "macros" and "rt-multi-thread" features to tokio, you can use #[tokio::main] on fn main() and #[tokio::test] instead of #[test], which makes the function async.
Since this effectively makes the code async, we can also use tokio to read files (by enabling the fs feature on tokio and using tokio::fs instead of std::fs).
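
For illustration, a minimal sketch of that setup (hedged: the tokio feature names are the documented ones, but the workspace manifest, file path, and error handling in this PR may differ):

use std::path::Path;

// Assumed Cargo.toml entry (workspace versions/features may differ):
//   tokio = { version = "1", features = ["macros", "rt-multi-thread", "fs"] }

// `#[tokio::main]` turns a plain `fn main` into an async entry point;
// `#[tokio::test]` does the same for test functions.
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let bytes = read_pdf("example.pdf").await?;
    println!("read {} bytes", bytes.len());
    Ok(())
}

// With the `fs` feature enabled, `tokio::fs::read` replaces `std::fs::read`
// so the file read does not block the async runtime.
async fn read_pdf(path: impl AsRef<Path> + Send) -> std::io::Result<Vec<u8>> {
    tokio::fs::read(path).await
}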

Cargo.toml Outdated
bumpalo = { version = "=3.16.0", default-features = false }
bytes = { version = "1.6.0" }
clap_builder = { version = "=4.5.21", default-features = false, features = ["std"] }
criterion = { version = "=0.5.1" }
curl = { version = "=0.4.47" }
Member:

We should use reqwest instead. It's already configured in the root manifest. You probably want to enable the json feature so we don't need to convert JSON to a string first.
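
For illustration, a hedged sketch of a reqwest call with the json feature enabled (the endpoint, auth, and payload shape here are placeholders, not the ones used in this PR):

use reqwest::Client;
use serde_json::{json, Value};

// Hypothetical request; only the reqwest/serde_json APIs are real here.
async fn request_embedding(client: &Client, token: &str) -> reqwest::Result<Value> {
    client
        .post("https://example.invalid/v1/embeddings")
        .bearer_auth(token)
        // `.json(...)` serializes the body and sets the Content-Type header,
        // so no manual JSON-to-string conversion is needed.
        .json(&json!({ "instances": [{ "text": "hello" }] }))
        .send()
        .await?
        // With the `json` feature, the response can be deserialized directly as well.
        .json::<Value>()
        .await
}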

JesusFileto (Member Author):

OK, I've pushed some changes that do this, but I'm unable to solve the tokio lint errors that occur from moves across the .await calls.

Member:

Basically, the lints are saying that you cannot send PdfDocument across threads, because PDFium is not thread-safe. We have two options:

  • Don't pass PdfDocument around in async functions; instead, read the PDF inside the function itself (see their example about thread safety).
  • Don't care about multithreading here.

This means we cannot send a document across threads in any case (which would also happen if you wanted to offload a task to a thread). However, this can be worked around by not sending the document but the API as a whole (Pdfium). That would allow the library to be used in a multi-threaded environment, but the PdfDocument would need to stay on a single thread.

Member:

Generally, I'd say we try not to pass PdfDocument where it's not needed, and ignore the lint with #[expect(<lint name>, reason = "...")] if we cannot or do not want to avoid it.
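
A self-contained sketch of that escape hatch (the lint name is an assumption about which lint fires; Rc stands in for the non-Send PdfDocument):

use std::rc::Rc;

// Hypothetical example of `#[expect]`: `Rc` is `!Send`, standing in for the
// non-thread-safe `PdfDocument`, so Clippy's `future_not_send` would fire here.
#[expect(
    clippy::future_not_send,
    reason = "the document handle is not thread-safe and intentionally stays on one thread"
)]
async fn process_document(document: Rc<Vec<u8>>) -> usize {
    // ... work with `document` on the current thread only ...
    document.len()
}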

libs/chonky/src/embedding.rs (outdated review comments, resolved)
@vilkinsons changed the title from "Introduce PDF Embeddings" to "H-3614: Introduce chonky PDF Embeddings" on Nov 20, 2024

codecov bot commented Nov 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 22.88%. Comparing base (e1741e5) to head (5b3ffcd).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5673      +/-   ##
==========================================
- Coverage   22.90%   22.88%   -0.03%     
==========================================
  Files         572      572              
  Lines       19389    19412      +23     
  Branches     2745     2752       +7     
==========================================
  Hits         4442     4442              
- Misses      14894    14917      +23     
  Partials       53       53              
Flag Coverage Δ
apps.hash-ai-worker-ts 1.32% <ø> (ø)
apps.hash-api 1.16% <ø> (ø)
local.harpc-client 59.02% <ø> (+0.03%) ⬆️
local.hash-backend-utils 8.81% <ø> (ø)
local.hash-graph-sdk 58.62% <ø> (ø)
local.hash-isomorphic-utils 0.98% <ø> (-0.02%) ⬇️
local.hash-subgraph 24.54% <ø> (ø)
rust.deer 6.66% <ø> (ø)
rust.error-stack 72.51% <ø> (ø)
rust.sarif 88.66% <ø> (ø)


TimDiekmann (Member) left a comment:

Great work moving to async 🎉
I did a more in-depth review. I think it's a good time to improve the code quality a bit 🙂

libs/chonky/src/embedding/hugging_face_api.rs (outdated review comment, resolved)
.to_owned())
}

fn base64_json(image_data: Vec<u8>) -> Result<String, Report<ChonkyError>> {
Member:

It's not necessary to pass a vector; a slice is enough.

Suggested change
fn base64_json(image_data: Vec<u8>) -> Result<String, Report<ChonkyError>> {
fn base64_json(image_data: &[u8]) -> Result<String, Report<ChonkyError>> {

If you want to be even more generic you can use the same approach as above:

Suggested change
fn base64_json(image_data: Vec<u8>) -> Result<String, Report<ChonkyError>> {
fn base64_json(image_data: impl AsRef<[u8]>) -> Result<String, Report<ChonkyError>> {

JesusFileto (Member Author):

I see, I'll add this in the next PR.

Member:

The second suggestion is a more flexible way without side effects (the calling site doesn't need to be updated).
However, you can change the caller to

- payload = base64_json(image_payload);
+ payload = base64_json(&image_payload);
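
For context, a hedged sketch of the AsRef<[u8]> variant (error handling and the exact JSON payload from this PR are omitted; the base64 crate's Engine API is assumed):

use base64::{engine::general_purpose::STANDARD, Engine as _};
use serde_json::json;

// Accepts both owned buffers and borrowed slices, so the caller can pass
// either `image_payload` or `&image_payload`.
fn base64_json(image_data: impl AsRef<[u8]>) -> String {
    let encoded = STANDARD.encode(image_data.as_ref());
    // Placeholder payload shape; the real request body lives elsewhere in this PR.
    json!({ "image": { "bytesBase64Encoded": encoded } }).to_string()
}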

libs/chonky/src/embedding/multi_modal_embedding.rs (five outdated review comments, resolved)
}

// Parses the response to extract the image embedding vector
fn extract_embedding(response_text: &str) -> Result<Vec<f64>, Report<ChonkyError>> {
Member:

If we have structs for embeddings, such as TextEmbedding and ImageEmbedding, we probably want to return an enum here:

enum Embedding {
    Text(TextEmbedding),
    Image(ImageEmbedding),
}

JesusFileto (Member Author):

I was implementing this, but the ImageEmbedding struct requires the image itself, which is not accessible when just requesting the embedding. Since the calling function already knows which kind of vector it is receiving, it may be easier not to implement this (to avoid having to handle if let branches upstream, where the compiler does not know which data type to expect but we do).


TimDiekmann (Member) left a comment:

I did a re-review and the changes look very good. I added a few more comments.
Also, let's make sure to consistently use the new Embedding structs, as they really contribute to the readability of the code.

Comment on lines +224 to +227
let xmin: i32 = num_traits::cast(bbox.xmin).ok_or(ChonkyError::Pdfium)?;
let ymin: i32 = num_traits::cast(bbox.ymin).ok_or(ChonkyError::Pdfium)?;
let xmax: i32 = num_traits::cast(bbox.xmax).ok_or(ChonkyError::Pdfium)?;
let ymax: i32 = num_traits::cast(bbox.ymax).ok_or(ChonkyError::Pdfium)?;
Member:

What is the exact behavior we expect here? num_traits::cast simply discards the decimals, which is the behavior of floor (for non-negative values). We should add a comment stating that flooring is the expected behavior.
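
A small sketch of how that expectation could be documented in place (mirroring the snippet above; the explicit .floor() matches the existing truncation for non-negative values and records the intent):

// Floor explicitly before casting so the rounding behaviour is documented
// rather than implied by `num_traits::cast`'s truncation.
let xmin: i32 = num_traits::cast(bbox.xmin.floor()).ok_or(ChonkyError::Pdfium)?;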

Comment on lines 35 to 37
pub mod embedding;

pub use embedding::{hugging_face_api, multi_modal_embedding};
Member:

We don't need both to be pub. Either is sufficient.
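
For illustration, the two alternatives this allows (a sketch; either one keeps a single public surface):

// Option 1: public module, no re-export; callers write
// `chonky::embedding::hugging_face_api::...`.
pub mod embedding;

// Option 2: private module with a public re-export; callers write
// `chonky::hugging_face_api::...`.
// mod embedding;
// pub use embedding::{hugging_face_api, multi_modal_embedding};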


#[derive(Debug, Clone)]
pub struct Embedding {
_model_used: String, //model name reveals image or text embedding model
Member:

It's often the case that the model name is statically known:

Suggested change
_model_used: String, //model name reveals image or text embedding model
_model_used: Cow<'static, str>, //model name reveals image or text embedding model

Also, is there a specific reason why you hide this detail instead of making it pub?
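
For context, a small sketch of why Cow<'static, str> fits here (the field and model names are placeholders; both a statically known name and a runtime string fit the same field):

use std::borrow::Cow;

// Assumed shape; the PR's struct has more fields and an underscore-prefixed name.
struct Embedding {
    model_used: Cow<'static, str>,
}

fn main() {
    // Statically known model name: borrowed, no allocation.
    let fixed = Embedding { model_used: Cow::Borrowed("image-embedding-model") };
    // Name computed at runtime: an owned String also works.
    let dynamic = Embedding { model_used: Cow::Owned(format!("text-model-v{}", 2)) };
    println!("{} / {}", fixed.model_used, dynamic.model_used);
}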

JesusFileto (Member Author):

The only reason the field is not pub is that it is not used yet, so Clippy complains.

pub async fn embed_pdf_object_images(
pdf_image_extract: Vec<Vec<DynamicImage>>,
project_id: &str,
) -> Result<Vec<Vec<Vec<f64>>>, Report<ChonkyError>> {
Member:

I definitely see the improvement already. I'll keep this comment open for the Box<[f64]> approach.

///
/// [`ChonkyError::VertexAPI`] when there are HTTP request errors
pub async fn embed_screenshots(
pdf_image_extract: Vec<DynamicImage>,
Member:

The truly idiomatic version (though also quite hard to start with) is to use streams. The function would then become:

pub fn embed_pdf_object_images(
    pdf_image_extract: impl IntoIterator<Item = PageImageObjects>,
    project_id: &str,
) -> impl Stream<Item = Result<PageImageObjectsEmbeddings, Report<ChonkyError>>> {
    stream::iter(pdf_image_extract)
        .then(|page_images| embed_screenshots(page_images, project_id))
        .map(|embeddings| {
            Ok(PageImageObjectsEmbeddings {
                _embeddings: embeddings?,
            })
        })
}

Note that this function is not async, because it returns a Stream (which is pretty much an async iterator).
I'm not saying we need to take this route, since streams are not yet well matured in Rust (and a more built-in feature is on the way), but it avoids allocations.
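
If that route were taken, a caller could still collect everything in one await; a hedged usage sketch (assuming the futures crate's TryStreamExt and the names from the snippet above):

use futures::TryStreamExt as _;

// `try_collect` drives the stream to completion and stops at the first error.
let embeddings: Vec<PageImageObjectsEmbeddings> =
    embed_pdf_object_images(pages, project_id)
        .try_collect()
        .await?;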

Comment on lines 108 to 111
pub async fn embed_tables(
pdf_table_bounds: Vec<Vec<ExtractedTable>>,
project_id: &str,
) -> Result<Vec<Vec<Vec<f64>>>, Report<ChonkyError>> {
Member:

Actually, we still have this signature 😅

///
/// [`ChonkyError::HuggingFaceAPI`] when there are HTTP request errors
pub async fn make_table_recognition_request(
image_path: impl AsRef<Path> + core::marker::Send + core::marker::Sync,
Member:

Send and Sync are in the prelude.

Suggested change
image_path: impl AsRef<Path> + core::marker::Send + core::marker::Sync,
image_path: impl AsRef<Path> + Send + Sync,

Comment on lines 33 to 39
async fn get_binary_image_data(
image_path: impl AsRef<Path> + core::marker::Send + core::marker::Sync,
) -> Result<Vec<u8>, Report<ChonkyError>> {
fs::read(image_path)
.await
.change_context(ChonkyError::ImageError)
}
Member:

This really doesn't do more than read a file. We should probably inline it. Also, the error ImageError seems inaccurate, as this is really a file operation. (Btw, tokio::fs::read only wraps the reading operation in another task, nothing else.)

If we keep it, we should use Send only, and directly:

Suggested change
- async fn get_binary_image_data(
-     image_path: impl AsRef<Path> + core::marker::Send + core::marker::Sync,
- ) -> Result<Vec<u8>, Report<ChonkyError>> {
-     fs::read(image_path)
-         .await
-         .change_context(ChonkyError::ImageError)
- }
+ async fn get_binary_image_data(
+     image_path: impl AsRef<Path> + Send,
+ ) -> Result<Vec<u8>, Report<ChonkyError>> {
+     fs::read(image_path)
+         .await
+         .change_context(ChonkyError::ImageError)
+ }

JesusFileto (Member Author) commented on Dec 12, 2024:

While trying to inline this function, I found that the explicit Send + Sync bounds avoid compiler errors when references to image_path are used further downstream (when retrying the request after the Hugging Face model was initially cold). For this reason it may be wise not to inline it.


… `chonky` tests (#5687)

Co-authored-by: JesusFileto <[email protected]>
Co-authored-by: Jesus Fileto <[email protected]>
}

/// A function that performs authentication with Google Vertex API and performs
/// a curl request to obtain multimodal embeddings given an image path
Member:

Suggested change
/// a curl request to obtain multimodal embeddings given an image path
/// a request to obtain multimodal embeddings given an image path

Contributor:

Benchmark results

@rust/hash-graph-benches – Integrations

representative_read_entity

Function | Value | Mean | Flame graphs
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 | 16.5 ms ± 184 μs (-27.740%) | Flame Graph
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 | 15.4 ms ± 192 μs (-7.297%) | Flame Graph
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 | 16.2 ms ± 156 μs (2.77%) | Flame Graph
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 | 15.4 ms ± 161 μs (-10.024%) | Flame Graph
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 | 16.4 ms ± 155 μs (3.23%) | Flame Graph
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 | 16.3 ms ± 160 μs (-2.632%) | Flame Graph
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 | 15.9 ms ± 190 μs (-4.998%) | Flame Graph
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 | 16.5 ms ± 169 μs (-28.667%) | Flame Graph
entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 | 16.0 ms ± 179 μs (-2.222%) | Flame Graph

representative_read_multiple_entities

Function | Value | Mean | Flame graphs
entity_by_property | depths: DT=255, PT=255, ET=255, E=255 | 67.0 ms ± 516 μs (2.58%) | Flame Graph
entity_by_property | depths: DT=0, PT=0, ET=0, E=0 | 38.2 ms ± 248 μs (-1.049%) | Flame Graph
entity_by_property | depths: DT=2, PT=2, ET=2, E=2 | 55.8 ms ± 363 μs (-2.688%) | Flame Graph
entity_by_property | depths: DT=0, PT=0, ET=0, E=2 | 42.0 ms ± 230 μs (-2.245%) | Flame Graph
entity_by_property | depths: DT=0, PT=0, ET=2, E=2 | 47.9 ms ± 386 μs (-1.857%) | Flame Graph
entity_by_property | depths: DT=0, PT=2, ET=2, E=2 | 53.1 ms ± 296 μs (-1.300%) | Flame Graph
link_by_source_by_property | depths: DT=255, PT=255, ET=255, E=255 | 109 ms ± 626 μs (0.533%) | Flame Graph
link_by_source_by_property | depths: DT=0, PT=0, ET=0, E=0 | 43.8 ms ± 230 μs (3.98%) | Flame Graph
link_by_source_by_property | depths: DT=2, PT=2, ET=2, E=2 | 99.8 ms ± 521 μs (1.34%) | Flame Graph
link_by_source_by_property | depths: DT=0, PT=0, ET=0, E=2 | 82.4 ms ± 743 μs (0.041%) | Flame Graph
link_by_source_by_property | depths: DT=0, PT=0, ET=2, E=2 | 91.5 ms ± 582 μs (-1.812%) | Flame Graph
link_by_source_by_property | depths: DT=0, PT=2, ET=2, E=2 | 98.5 ms ± 508 μs (0.509%) | Flame Graph

representative_read_entity_type

Function | Value | Mean | Flame graphs
get_entity_type_by_id | Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579 | 1.38 ms ± 3.62 μs (-0.474%) | Flame Graph

scaling_read_entity_complete_one_depth

Function | Value | Mean | Flame graphs
entity_by_id | 50 entities | 270 ms ± 2.14 ms (-0.130%) | Flame Graph
entity_by_id | 5 entities | 26.5 ms ± 135 μs (0.426%) | Flame Graph
entity_by_id | 1 entities | 20.1 ms ± 115 μs (-1.302%) | Flame Graph
entity_by_id | 10 entities | 57.2 ms ± 221 μs (-0.937%) | Flame Graph
entity_by_id | 25 entities | 182 ms ± 1.32 ms (2.95%) | Flame Graph

scaling_read_entity_linkless

Function | Value | Mean | Flame graphs
entity_by_id | 1 entities | 1.95 ms ± 10.2 μs (1.34%) | Flame Graph
entity_by_id | 100 entities | 2.13 ms ± 12.4 μs (2.90%) | Flame Graph
entity_by_id | 10 entities | 1.94 ms ± 4.94 μs (0.192%) | Flame Graph
entity_by_id | 1000 entities | 2.89 ms ± 20.8 μs (0.585%) | Flame Graph
entity_by_id | 10000 entities | 9.59 ms ± 158 μs (-30.591%) | Flame Graph

scaling_read_entity_complete_zero_depth

Function | Value | Mean | Flame graphs
entity_by_id | 50 entities | 4.06 ms ± 36.8 μs (-5.508%) | Flame Graph
entity_by_id | 5 entities | 1.96 ms ± 8.86 μs (-0.141%) | Flame Graph
entity_by_id | 1 entities | 1.93 ms ± 9.14 μs (-0.693%) | Flame Graph
entity_by_id | 10 entities | 2.17 ms ± 9.46 μs (0.244%) | Flame Graph
entity_by_id | 25 entities | 3.36 ms ± 14.8 μs (-0.720%) | Flame Graph

@@ -92,7 +92,8 @@ ahash = { version = "=0.8.11", default-features = false }
ariadne = { version = "=0.5.0", default-features = false }
aws-types = { version = "=1.3.3", default-features = false }
axum = { version = "0.7.5" }
axum-core = { version = "0.4.5" }
Member:

I guess this was by accident?

Comment on lines +1509 to +1522
[[package]]
name = "curl"
version = "0.4.47"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d9fb4d13a1be2b58f14d60adba57c9834b78c62fd86c3e76a148f732686e9265"
dependencies = [
"curl-sys",
"libc",
"openssl-probe",
"openssl-sys",
"schannel",
"socket2",
"windows-sys 0.52.0",
]
Member:

It looks like you need to update this file; it seems to be outdated.

Member:

Could you add json files to .gitattributes as well?

Labels
area/deps (third-party dependencies), area/infra (version control, CI, CD or IaC), area/libs > chonky (affects the `chonky` crate), area/libs (first-party libraries/crates/packages), area/tests (new or updated tests)

3 participants