
EEP68 - JSON #59

Merged
merged 11 commits into from
Feb 26, 2024

Conversation

@michalmuskala (Contributor) commented Feb 12, 2024

This EEP proposes adding a json module to the standard library

Rendered
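As a sketch of what the proposal covers (shapes inferred from the EEP text; the exact return types are the proposal's to define):

```erlang
%% Sketch of the proposed API surface (based on the EEP; exact shapes
%% may differ in the final implementation).
Iodata = json:encode(#{name => <<"Joe">>, age => 57}),  %% returns iodata()
Term = json:decode(<<"{\"name\":\"Joe\"}">>),
#{<<"name">> := <<"Joe">>} = Term.
```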

michalmuskala and others added 2 commits February 12, 2024 10:20
@dgud left a comment

I'm wondering if we should NOT stack the accumulator for the user, but instead let the user handle that, and also make array_finish/1 and object_finish/1 return {Obj, NewAcc}.

As implemented, it is not useful if, for example, during parsing I also want to count the number of sharks in my array of aquariums.

I think the fact that the Acc0 sent in is the same one returned from decode/3 shows that the current proposal is strange.

I think I agree with @josevalim about empty_array and empty_object: from a user perspective it seems strange to handle empty objects separately, though looking at the code I can understand why.

@michalmuskala (Contributor, Author)

@dgud yes, you're correct. I think my immediate concern was the overhead of the extra tuples each time we finalise a value - it really does add up. Let me try some benchmarks to see how much of an overhead that would be.

@michalmuskala (Contributor, Author)

I pushed an update that removes the _empty callbacks and updates the _finish callbacks to also receive the "old accumulator" (as was passed to _start), and to return a tuple with the value and the next accumulator - this allows threading a value through the accumulator across all the calls.
With some other optimisations, the performance of the new encoder is still faster than other pure-Erlang implementations, so I think it's worth it.
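With that change, dgud's shark-counting example becomes expressible. A sketch against the updated callbacks (signatures as described in this thread, not guaranteed final):

```erlang
%% Count occurrences of <<"shark">> while decoding, threading the count
%% through the user accumulator via the updated array_finish callback,
%% which now receives the old accumulator and returns {Value, NextAcc}.
Decoders = #{
    array_start  => fun(_OldAcc) -> [] end,
    array_push   => fun(Elem, ArrayAcc) -> [Elem | ArrayAcc] end,
    array_finish =>
        fun(ArrayAcc, OldAcc) ->
            Sharks = length([E || E <- ArrayAcc, E =:= <<"shark">>]),
            {lists:reverse(ArrayAcc), OldAcc + Sharks}
        end
},
{Value, SharkCount, <<>>} =
    json:decode(<<"[\"shark\",\"tuna\",\"shark\"]">>, 0, Decoders).
```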

@garazdawi (Contributor)

When designing the API for logger filters we decided not to use a bare function(); instead we use {function(), term()}, which makes it possible to use exported funs as filters. The reason for this was twofold:

  1. It allows the code for the module the filter is in to be upgraded without having to re-configure logger with the new fun.
  2. It makes it possible to configure filters in sys.config files.
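For reference, the logger shape looks like this in practice (real logger API; the extra term travels alongside an exported fun, so both can be written down in configuration):

```erlang
%% A logger filter is {FilterFun, FilterArgs}; the fun is called as
%% FilterFun(LogEvent, FilterArgs). Because both elements are plain terms
%% referencing an exported fun, this can be configured in sys.config and
%% survives code upgrades of the module containing the fun.
ok = logger:add_primary_filter(my_domain_filter,
         {fun logger_filters:domain/2, {stop, sub, [otp, sasl]}}).
```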

Do you think the same reasons apply to json? If so, should the encoder/decoder types be {function(), term()} instead of the current function()? Or should we also leave that as a possible extension?

The reason I started thinking about this is that I had a look at Jason, and it supports a couple of different ways to escape strings (javascript, html, etc.). The way to implement the same thing with EEP-68 json would be to implement an encoder callback for binaries, but you would need different funs for each escaping type, as you cannot pass any arguments to the encoder.

@michalmuskala (Contributor, Author)

michalmuskala commented Feb 22, 2024

@garazdawi How are the functions called then? Is it Fun(<current_args>, Term), or is it something like apply(Fun, CurrentArgs ++ [Term])?
For the first case, we could consider it; however, it would make the "simple case" more complex.
The reason I'm hesitant about the second case is that with a fixed number of arguments we can convince the compiler, in parse_decoder, that all callbacks have the right number of arguments, and thus emit the more efficient fun calls that don't check arguments inside the main decoding loops.
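The fixed-arity point could be illustrated roughly like this (parse_decoder is named above; the clauses shown are hypothetical):

```erlang
%% Hypothetical sketch: validating callback arities once, while building the
%% decoder record, lets the hot decoding loop call e.g. String(Bin) directly,
%% which the compiler can emit as a fun call without a runtime arity check.
parse_decoder(string, Fun, Decode) when is_function(Fun, 1) ->
    Decode#decode{string = Fun};
parse_decoder(array_push, Fun, Decode) when is_function(Fun, 2) ->
    Decode#decode{array_push = Fun};
parse_decoder(Key, Fun, _Decode) ->
    error({invalid_decoder, Key, Fun}).
```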

As for the various escape functions in Jason - their purpose is actually questionable, and I didn't have a clear goal with them other than replicating the API of the older library Poison.

In particular, the javascript case is no longer necessary as of 2019, since JavaScript was explicitly re-defined as a superset of JSON in https://tc39.es/proposal-json-superset/.
And for the html case, it's not fully clear what it's supposed to protect against, as the escaping seems to be not enough and too much at the same time - it was also broken for a really long time in Jason without any complaints (see michalmuskala/jason#109).
That's why this proposed implementation doesn't include either of those modes.

Co-authored-by: Lukas Larsson <[email protected]>
@garazdawi (Contributor)

> @garazdawi How are the functions called then? Is it Fun(<current_args>, Term), or is it something like apply(Fun, CurrentArgs ++ [Term])? For the first case, we could consider it; however, it would make the "simple case" more complex. The reason I'm hesitant about the second case is that with a fixed number of arguments we can convince the compiler, in parse_decoder, that all callbacks have the right number of arguments, and thus emit the more efficient fun calls that don't check arguments inside the main decoding loops.

They are called as in the first example, i.e. Fun(CurrentArgs, Term). But I think we should leave it as is; logger has a different enough use case from json that there is no reason to make things more complicated than they need to be.

I only mentioned the string escaping as an example of what you could want to do with the custom encoder interface. I don’t see any reason to add it to this module.

@saleyn (Contributor) commented Feb 22, 2024

Would the suggested implementation for JSON encoding/decoding be done in Erlang, or eventually NIF-based? What is the benefit of having this functionality be part of the Erlang distribution? There are already multiple good open-source JSON libraries available, and reinventing the wheel might not be worth the effort. Being the author of one such library, I might be biased. :)

@okeuday commented Feb 23, 2024

@saleyn The code should be more trustworthy if it is done in Erlang and not NIF-based, considering both the code initially added and its potential changes in the future. Erlang/OTP already provides xmerl for XML parsing, and JSON has become more popular, so it makes sense to add JSON support to Erlang/OTP as a central place where the API can help people avoid mistakes like CVE-2017-12635.

@michalmuskala (Contributor, Author) commented Feb 23, 2024

The proposed API for both encoding and decoding includes callbacks - since it's not possible to call back into Erlang from NIFs, this can't be implemented as a pure NIF.

There's a pure Erlang implementation PR opened against OTP: erlang/otp#8111.

While the open-source implementations are great, this proposed one has even better performance and a significantly more flexible interface. It outperforms all pure-Erlang implementations that I know of, and most of the time even the NIF-based jiffy (I failed to compile the simdjsone project).

It's possible to eventually have some subsets of the implementation as a NIF (e.g. string escaping), but this doesn't seem necessary at the moment.

Having it in the standard distribution also makes it possible to use JSON in simple escripts (greatly enhancing the utility of Erlang as a scripting language) and makes it easy to use JSON in fundamental tooling like build systems (e.g. producing metadata about project layout for other tools like language servers), or in other parts of OTP itself.
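As an illustration of the escript point, a dependency-free script could look like this (a sketch, assuming json:decode/1 and json:encode/1 land as proposed):

```erlang
#!/usr/bin/env escript
%% Round-trip a JSON file with nothing but the standard library,
%% assuming the proposed json:decode/1 and json:encode/1.
main([File]) ->
    {ok, Bin} = file:read_file(File),
    Term = json:decode(Bin),
    io:format("~ts~n", [json:encode(Term)]).
```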

@okeuday commented Feb 23, 2024

@michalmuskala My memory of testing jiffy in the past is that it became slower when using the dirty thread pool, while without it there was the possibility of blocking VM scheduler threads if the JSON being parsed was too large. Avoiding performance and reliability problems like that with a pure Erlang implementation would be best.

@nixxquality

I would like to submit this for your consideration: https://erlangforums.com/t/jsonpull-a-json-pull-parser-for-erlang/3306
If you have time to watch the original talk, please do; otherwise, I've already summed up my thoughts in the post above.

While this can lead to verbose code, it opens up a system where anything can be built on top - whether simple functions like my construct_list, or even different kinds of parsers, like the ones already implemented here. Additionally, it could lead to third-party libraries built on a standard one that immediately parses into expected types.

@kikofernandez merged commit 0bf6bbd into erlang:master on Feb 26, 2024
1 check passed
@michalmuskala deleted the json-eep branch February 26, 2024 09:23