Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

toml 0.6.0 no longer allows deserializing a struct with an explicit lifetime #490

Closed
msfjarvis opened this issue Jan 24, 2023 · 17 comments
Closed
Labels
A-serde Area: Serde integration C-enhancement Category: Raise on the bar on expectations

Comments

@msfjarvis
Copy link

I've raised #489 as a minimal repro within the existing test suite

@epage
Copy link
Member

epage commented Jan 24, 2023

This was an intentional change as by using toml_edit for the parser as everything gets converted to an owned String before we deserialize, making it so borrows of the original source cannot happen.

@epage epage closed this as not planned Won't fix, can't repro, duplicate, stale Jan 24, 2023
@epage epage added C-bug Category: Things not working as expected A-serde Area: Serde integration labels Jan 24, 2023
@msfjarvis
Copy link
Author

This was an intentional change as by using toml_edit for the parser as everything gets converted to an owned String before we deserialize, making it so borrows of the original source cannot happen.

Got it. Is there a documentation change that can be made to make that explicit? I looked through the changelog but nothing stood out as the cause for this which is why I ended up raising this issue.

@epage
Copy link
Member

epage commented Jan 24, 2023

The changelog was updated.

@djc
Copy link

djc commented Jan 24, 2023

So why does toml_edit convert everything to String?

@epage
Copy link
Member

epage commented Jan 24, 2023

toml_edit parses directly into a Document.

In theory, we could add an internal ParsedDocument that gets converted into a Document or used for serde but that is a lot of extra code to maintain and a lot of extra steps and we need a strong enough case to justify it.

In theory, we could also parse into a Document<'de> where we store Cows for the strings but I've generally found putting Cow into API items adds a lot of end-user complexity and we need a strong enough case to justify it.

As an alternative to Cow, we could also parse into enum { Span, String }, like we do with Repr and Decor but that significantly complicated the API and for toml_edit users will require a lot of unnecessary unwraps, so this would also need a strong case to justify it.

@djc
Copy link

djc commented Jan 26, 2023

It seems quite surprising to me to just throw away the ability to zero-copy/low-allocation deserialization when that is one of the main benefits usually offered by serde integrations. Offering a Document<'de> with Cows seems pretty reasonable -- in my experience this doesn't really need to add a lot of end-user complexity.

@epage
Copy link
Member

epage commented Jan 26, 2023

It seems quite surprising to me to just throw away the ability to zero-copy/low-allocation deserialization when that is one of the main benefits usually offered by serde integrations.

Do you have a use case where the benefits of zero-copy / low allocation deserialization is important? Assuming its performance driven, what is the actual performance before/after this and what is your performance target and why is that performance target set?

@djc
Copy link

djc commented Jan 26, 2023

I don't currently have such a use case. My usage of toml that prompted these questions was for Askama's configuration file.

@AlexTMjugador
Copy link

For what is worth, I'd like to point out that serde_json's from_str just errors out on runtime if it turns out that the struct can't hold owned data because that lifetime is used in a reference like &'a str. serde_json is usually considered an example for serde deserializer and serializer design, although this potential runtime failure, which happens when serde_json unescapes strings, is also considered a footgun for the not-so-experienced.

@epage
Copy link
Member

epage commented Jan 31, 2023

The difference between serde_json and toml is that serde_json has a chance to reuse the borrowed data. This distinction is highlighted in the documentation on Deserializer lifetimes

@epage epage added C-enhancement Category: Raise on the bar on expectations and removed C-bug Category: Things not working as expected labels Feb 7, 2023
@epage epage pinned this issue Feb 8, 2023
@Yuri6037
Copy link

I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.

@epage
Copy link
Member

epage commented Sep 18, 2024

I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.

What is your use case that you are avoiding allocations?

@AlexTMjugador
Copy link

AlexTMjugador commented Sep 18, 2024

@Yuri6037 please note that due to what was stated on #505 and #490 (comment) no-allocation deserialization was not possible with the toml crate anyway, as deserializing e.g. a Cow field would always result in a Cow::Owned. In other words, previous toml crate versions at best provided some placebo that zero-copy deserialization was happening when in fact it was not.

Edit: I reviewed the old deserialization code for a bit and indeed it seems it had some logic for handling borrowed and owned strings separately, so it turns out that in previous versions zero-copy string deserialization may indeed have been possible. But with the current design it's indeed no longer possible, even if the DeserializeOwned constraint is relaxed.

@Yuri6037
Copy link

I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.

What is your use case that you are avoiding allocations?

I'm trying to improve performance in one of my app which already does far too many allocations.

@Yuri6037
Copy link

@Yuri6037 please note that due to what was stated on #505 and #490 (comment) no-allocation deserialization was not possible with the toml crate anyway, as deserializing e.g. a Cow field would always result in a Cow::Owned. In other words, previous toml crate versions at best provided some placebo that zero-copy deserialization was happening when in fact it was not.

Edit: I reviewed the old deserialization code for a bit and indeed it seems it had some logic for handling borrowed and owned strings separately, so it turns out that in previous versions zero-copy string deserialization may indeed have been possible. But with the current design it's indeed no longer possible, even if the DeserializeOwned constraint is relaxed.

I'm not using a Cow, I use from_str with a &str and all models use &'a based fields.

@epage
Copy link
Member

epage commented Sep 18, 2024

I'm trying to improve performance in one of my app which already does far too many allocations.

Could you expand on your use case where TOML string allocations are a performance concern in your allocation?

For context on my perspective, I deal with Cargo's performance which for certain projects can load hundreds of TOML files. Even in that situation, TOML is a fraction of a fraction of where the end-user is going and only that is noticeable in no-op runs. The only reason I'm caring about no-op run performance is for shell completions calling into cargo. Like with Cargo, TOML is mostly dealt with in application initialization and most other applications load a fraction of the TOML files.

I'm not using a Cow, I use from_str with a &str and all models use &'a based fields.

Unless you are intentionally restricting your string inputs, this will reject perfectly valid TOML files because any escaped values will need an allocation.

@matthiasg
Copy link

matthiasg commented Dec 12, 2024

@epage as for a use case.

In my case I am deserializing an id. A type I also use in another type for json deserialization. Being able to work without copy there is very important (read millions of document ids parsed).

Thus the type is already DocumentId<'a> .. I wanted to also use that in toml, but cannot anymore.

My type does support DocumentId<'static> which is the 'owned' variant, but toml doesnt allow that either. Also the configuration should be usable from toml or json. Now the struct differs between the two, or i need to put a trait around it to convert.

In any case it was surprising to see

PS: I basically wrapped the structure now and build it internally to convert from String to my DocumentId

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-serde Area: Serde integration C-enhancement Category: Raise on the bar on expectations
Projects
None yet
Development

No branches or pull requests

6 participants