Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(serde_with): make BytesOrString adjustable #793

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sivizius
Copy link
Contributor

At the moment, any Vec<u8> annotated with #[serde(as = "BytesOrString")] will always be serialised as an array of integers in the range 0–255. However, often these bytes are valid strings, the internal representation as Vec<u8> is often just an implementation detail. This PR adds a generic type parameter PREFERENCE, which must implement the new trait TypePreference, which is a subtrait of SerializeAs<[u8]>. This parameter defaults to the new marker-strut PreferBytes, which serialises Vec<u8> as an array like before. Two additional marker-structs are PreferString, which tries to convert &[u8] to &str first and will fallback to serialising as array only if the bytes do not represent a valid string, as well as PreferAsciiString, which additionally checks that all characters of the string are valid ASCII-characters, otherwise falling back to the array as well.

Copy link

codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.59%. Comparing base (2d70b70) to head (e030aa0).
Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #793      +/-   ##
==========================================
+ Coverage   67.34%   67.59%   +0.25%     
==========================================
  Files          40       40              
  Lines        2468     2509      +41     
==========================================
+ Hits         1662     1696      +34     
- Misses        806      813       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sivizius sivizius force-pushed the bytes-or-string-serialisation-preference branch from 1fc3306 to 554d127 Compare October 10, 2024 12:33
@sivizius sivizius force-pushed the bytes-or-string-serialisation-preference branch from 554d127 to e030aa0 Compare October 10, 2024 13:15
Copy link
Owner

@jonasbb jonasbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. This looks really good already :) The addition looks reasonable to me.

@@ -477,7 +477,7 @@ fn test_none_as_empty_string() {
}

#[test]
fn test_bytes_or_string() {
fn test_bytes_or_string_as_bytes() {
#[serde_as]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct S(#[serde_as(as = "BytesOrString")] Vec<u8>);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test for an explicit PreferBytes too?

/// When serializing a value of a type,
/// that allows multiple types during deserialization,
/// prefer a specific type.
pub trait TypePreference: SerializeAs<[u8]> {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's neat to just reuse the SerializeAs trait for the behavior.

Does the trait need to be fully public? We can't make it private, since it appears in public bounds, but seal all implementations. Similar to

mod sealed {
pub trait Sealed {}
impl Sealed for super::Standard {}
impl Sealed for super::UrlSafe {}
impl Sealed for super::Crypt {}
impl Sealed for super::Bcrypt {}
impl Sealed for super::ImapMutf7 {}
impl Sealed for super::BinHex {}
}
.

I would like to keep the serialization and deserialization behavior matching if possible. For that a fully public trait feels unnecessary. Having ASCII and unicode strings supported feels sufficient for now. If it turns out that more freedom is necessary, it can be unsealed later.

@jonasbb
Copy link
Owner

jonasbb commented Oct 10, 2024

Given this PR is #794 really necessary? The Bytes type does not really have any string association, but is primarily to enable calling the (potentially) optimized serialize_bytes function.

@sivizius
Copy link
Contributor Author

Given this PR is #794 really necessary? The Bytes type does not really have any string association, but is primarily to enable calling the (potentially) optimized serialize_bytes function.

It was more like an obvious addition to this PR. However, making this trait work for Bytes as well somewhat changed the serialisation behaviour of BytesOrString for the PreferBytes case: It will call serialize_bytes. Is this a breaking change?

@jonasbb
Copy link
Owner

jonasbb commented Oct 10, 2024

It will call serialize_bytes. Is this a breaking change?

Oh yeah, that is changing the behavior. In JSON it is not visible, as there is no efficient way to store binary data. There are some tests for Bytes using assert_tokens.

@sivizius
Copy link
Contributor Author

It will call serialize_bytes. Is this a breaking change?

Oh yeah, that is changing the behavior. In JSON it is not visible, as there is no efficient way to store binary data. There are some tests for Bytes using assert_tokens.

It’s not breaking for Bytes, I adjusted the SerdeAs<[u8]> implementation of PreferBytes to match the current behaviour. It has however changed it for BytesOrString though, which used source.serialize(serializer) before and uses serializer.serialize_bytes(source) now.

{
match core::str::from_utf8(source) {
Ok(text) if text.is_ascii() => serializer.serialize_str(text),
_ => serializer.serialize_bytes(source),
Copy link
Owner

@jonasbb jonasbb Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, these serialize_bytes should just be serialize.

Suggested change
_ => serializer.serialize_bytes(source),
_ => serializer.serialize(source),

where
S: Serializer,
{
serializer.serialize_bytes(source)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
serializer.serialize_bytes(source)
serializer.serialize(source)

{
match core::str::from_utf8(source) {
Ok(text) => serializer.serialize_str(text),
_ => serializer.serialize_bytes(source),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => serializer.serialize_bytes(source),
_ => serializer.serialize(source),

@sivizius
Copy link
Contributor Author

The suggested changes (serialize_bytes → serialize) would break #794. However, as you do not seem to be convinced of that PR anyway and neither is that PR very important to me – it just seemed reasonable to me to apply this to Bytes as well, but I guess, closing it and just proceeding with this PR here is the way to go?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants