-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Zero Copy String Deserialization #88
base: main
Are you sure you want to change the base?
Conversation
…dation on deserialization
Because Strings encoded in a proto message should be UTF8, I added a feature flag
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks super good!
Saw that tests weren't passing - so poke at that a bit more.
@@ -9,7 +9,7 @@ publish = false | |||
|
|||
[dependencies] | |||
bytes = "0.5.6" | |||
pb-jelly = "0.0.5" | |||
pb-jelly = { path = "../pb-jelly" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#multiple-locations - you can actually specify both path and version
Actually, in this case, since examples aren't being posted to crates.io, we probably should keep it as only a path dependency.
@@ -1561,7 +1579,7 @@ def get_cargo_toml_file(self, derive_serde: bool) -> Iterator[Tuple[Text, Text]] | |||
features = {u"serde": u' features = ["serde_derive"]'} | |||
versions = { | |||
u"lazy_static": u' version = "1.4.0" ', | |||
u"pb-jelly": u' version = "0.0.5" ', | |||
u"pb-jelly": u' path = "../../../../../pb-jelly" ', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, we gotta find a better way to do the right thing here =\
Bazel and Spec.toml totally sidestep this issue - it's actually the entire point of Spec.toml - to template the location/version of the dependency.
This seems like a place where we'd accidentally have the wrong version provided - and it seems like our test suite will not actually be using the right pb-jelly dependency here, unless you make this change.
Maybe one idea - is that the version of pb-jelly-gen could get passed in as an argument to codegen.py - where our testsuite could inject ../../../../../pb-jelly. Then we're not hard coding a version number here.
Perhaps an idea for a separate issue?
# of bytes is UTF-8 on deserialization shouldn't be needed. As a precaution though, we do this | ||
# validation step. But validation can take a relatively large amount of time, so for users | ||
# who wish to skip this validation, we provide the `zero_copy_string_no_utf8_check` feature flag. | ||
zero_copy_string_no_utf8_check = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for cargo bench to test with and w/o this feature flag?
Seems like it might not be possible w/ just how cargo bench works.
I thought this was a pretty interesting task, adding zero copy de-serialization for Strings! Still a work in progress, but basically I created a type
StrBytes
which is a wrapper around aBytes
struct, and on creation we assert it's valid UTF-8, which it should be because based on the protobuf spec, strings are encoded in valid UTF-8.Benchmarks (2014 MacBook Pro with an i7)
Note: The reason zero copy strings are not as fast as zero copy bytes is because we do the extra validation step