Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialization method lookup order in serializeUnknownRubyObject #25

Open
colinsurprenant opened this issue Jun 7, 2014 · 19 comments
Open

Comments

@colinsurprenant
Copy link
Collaborator

I am not sure I understand why the method to_json is not the first one looked up in serializeUnknownRubyObject - https://github.com/guyboertje/jrjackson/blob/master/src/main/java/com/jrjackson/RubyAnySerializer.java#L74

IMO it should be the first method tried to control the serialization behaviour for any class by implementing, overriding or monkey-patching to_json.

Is there a particular reason why it is the last one tried?

@guyboertje
Copy link
Owner

It is a last resort, by calling to_json, jrjackson releases control to the object author.

As the method is named to_json, the author is correct to return a JSON string representing the object maybe invoking a new JSON parser session or some, perhaps, sub-optimal mechanism.

So I first try to get a Time, Array, or Hash first.

What we really need is a to_json_data method that returns a String, Numeric, true, false, nil, Hash or Array representation of the object and allow JRuby and Jackson to handle the data structure. Further if the object is best represented as a JSON number and the author included methods to_i and to_f, jrjackson would be wrong to choose to_i when to_f would be most accurate and vice-versa, however with to_json_data, the author would supply the best representation.

For best JRuby performance the whole object graph passed to jrjackson would already be JSON compatible primitives meaning that the parser doesn't need to cross the JRuby/Java boundary repeatedly, i.e. needing to be aware of Ruby runtimes and ThreadContexts but, of course, this means that the object graph may be processed twice, once to convert in userland and once by Jackson in javaland.

Choosing a particular set of trade-offs in JrJackson is unlikely to be satisfactory for all users of this library.

Colin, I have always been impressed with your work and welcome any suggestions you have.

@colinsurprenant
Copy link
Collaborator Author

Thanks Guy, and thanks for JrJackson, it's really nice!

First: after giving more time understanding the serialization flow in JrJackson I realized my problem is not really related to the to_json calling order but really with java.util.Date serialization format; I'll open another issue or PR for this shortly. Nonetheless this is an interesting discussion so I'll give you my thoughts on this.

I understand your reasoning. In a performance big picture, optimizing for handling Jackson compatible primitives is ok but if it requires the traversal of the object graph in Ruby to convert to these primitives I think we loose.

IIUC to_json already acts as your proposed to_json_data no?

RubyObject obj = (RubyObject) method.call(ctx, rubyObject, meta, "to_json");
if (obj instanceof RubyString) {
    jgen.writeRawValue(obj.toString());
} else {
    provider.defaultSerializeValue(obj, jgen);
}

If the output of to_json is not a string then it delegates back to Jackson. So by returning a Jackson compatible primitives in to_json we would benefit from the Jackson speed and avoid traversing the graph in Ruby. I think this is good and I'd still argue to_json should be the first method tried but the problem I see is when you run it in an environment where all standard Ruby classes have been monkey patched with a to_json method like when you require the json gem, which will likely happen if you have a few dependencies. Maybe make this configurable and allow to specify an alternate to_json method to use which would have higher priority the actual to_json?

Thoughts?

@mkristian
Copy link
Contributor

On Sun, Jun 8, 2014 at 4:16 AM, Colin Surprenant [email protected]
wrote:

java.util.Date serialization format

I hope you guys are aware that you can "monkey patch" java classes in jruby
! for an example see
https://github.com/takari/maven-polyglot/blob/master/tesla-polyglot-ruby/src/main/resources/parser.rb#L49

just in case this is of any help for you ;)

@guyboertje
Copy link
Owner

I think the generally understood to_json behaviour is to return a JSON encoded string.

@guyboertje
Copy link
Owner

@mkristian, may be an interesting proposition.

@guyboertje
Copy link
Owner

Perhaps the method as_json_value() would convey the correct intent. A further problem is that, as part of MultiJson, JrJackson has a minimum API and behaviour it should provide. On the other hand, if JrJackson is used for daemons and other backend processes where the set of userland classes is small and in the author's control, this provides a force pulling in a different direction from the general purpose Railsy environment.

@nirvdrum
Copy link
Contributor

Not to complicate things, but Rails has an as_json method which takes some options such as which fields to include and evaluates as a hash to be JSON-encoded. as_json_value may be named sufficiently differently, but I could see potential confusion there.

@guyboertje
Copy link
Owner

I have no problem with as_json as long as its understood as 'return a json value representation of yourself' and not ruby Json.parse(to_h). It also seems that as_json walks the object tree. e.g.

Date.today.as_json ----> "2014-06-10"
{a: 33.4, t: Date.today}.as_json ----> {"a"=>33.4, "t"=>"2014-06-10"}
# I expected
{a: 33.4, t: Date.today}.as_json ----> {"a"=>33.4, "t"=> #<Date    >}

@colinsurprenant
Copy link
Collaborator Author

I personally don't like the as_json naming, I think it's confusing as the real meaning is more to_jsonable or maybe to_serializable. Also from what I understand the original intent in Rails was really to produce a json-serializable hash from a model, so in that context it makes sense that as_json applies to the full object tree. I think we shouldn't use as_json to avoid confusion with the Rails as_json semantic.

@guyboertje
Copy link
Owner

yeah, it concerns me too.

@guyboertje
Copy link
Owner

What's your use-case - Railsy or daemon?

@nirvdrum
Copy link
Contributor

Sorry. I think you misunderstood. I wasn't suggesting that it be named as_json. I was saying that as_json_value is very similar to as_json and may be confused by those familiar with Rails. It was meant as a caution.

@guyboertje
Copy link
Owner

@nirvdrum - good point. I guess the we are trying to resolve the opposing forces of Framework users with some custom classes, structs or Time-like objects ending up in the object tree and the Framework-free (daemon) users with almost all custom classes. Thing is all the Ruby Json parser authors would have to implement the call to to_json_object method and Rails would need to provide it etc. i dunno.

@colinsurprenant
Copy link
Collaborator Author

@guyboertje my use-case is non-rails.
@nirvdrum agree.

I'm not sure we need to worry too much about the other json parsers and rails. The way I see it, it is an optimization you can provide by implementing to_json_object(?) which will benefit JrJackson but will not break anything if you decide to change json parser, no?

@guyboertje
Copy link
Owner

@guyboertje
Copy link
Owner

@colinsurprenant, @nirvdrum, @mkristian, @headius, @jgwmaxwell:
What is the consensus on the name of a method I should encourage people to implement that converts a user or monkey patched ruby object to its (pure) JSON data based representation?

I have:

to_json_serializable
to_json_data
to_jsonable

@guyboertje
Copy link
Owner

In v 0.3.4 I have included support for to_json_data. If the Object author needs an alternative representation to be serialized then they should implement this method and return only JSON datatypes.

@headius
Copy link
Contributor

headius commented Jul 8, 2020

Auditing old issues that mention me...

Is there anything else to do here?

@Freaky
Copy link

Freaky commented Jan 30, 2021

Wasted a fair bit of time trying to work out why my ElasticSearch importer was breaking because of this - I have PostgreSQL array types which are behind Sequel's Array-delegating PGArray, which JrJackson helpfully tries to call to_h on - obviously failing because it isn't a set of nested pairs.

Personally I'd feel more comfortable without this to_h to_hash to_a magic - I'd think it perfectly reasonable to go off the fast-path on unknown types, if to_json_data isn't available, just call to_json.

This seems much safer and more predictable than blindly calling an ad-hoc grab-bag of conversion methods and hoping it produces something the user wanted - as noted, this would be a bad idea with something like to_i. I think it's a bad idea for other conversions as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants