Replies: 2 comments 1 reply
-
Long story short: when trying to debug real-world problems with transferring data from one system to another (think different CPU architectures, operating systems, programming languages, etc.), there is no single debugger that can help you find the problem. You can typically put a debugger on either end, but I know of nothing that looks at both ends simultaneously, across all the different protocols, on both computers. Now, if the data is production data on a real-world production system, data that cannot be copied to a dev machine, and the production system is not allowed to have development tools on it, how do you identify the problem? You find it the old-fashioned way: look at the data and hand-walk the spec and the code. So over the years, people doing such work have come to prefer that the data in the pipe be human readable and that there be a very rigid spec, to eliminate possible ambiguities when creating the data that goes into the pipe. In the old days, making the data human readable was unthinkable because of memory and time constraints; think of the days when computers had memory in the KBs and transmission speeds were in the kilobits per second. The more compressed the data the better, and even then compression algorithms were not a common standard. But now there is much more memory and transmission is faster, so human-readable data is more advantageous, and compressing it going into the pipe and uncompressing it coming out is now largely standardized on a few compression algorithms. When you take the history and legacy into account, it really does make sense how we arrived where we are.
-
In an earlier version of the original post that @EricGT responded to, I presented some wacky ideas about drawing a hard line between serialization and schema transformation. I've since realized these ideas don't make sense and aren't even necessary to consider when dealing with JSON schemas. I've edited the proposed interface in the original post to be much simpler and more intuitive.
-
EDIT - PLEASE IGNORE THIS POST: My thinking on schemas has evolved a lot since I made this post, but I'm not ready to propose any new ideas yet.
I'm trying to come up with the best way to reason about schemas in Prolog in order to implement JSON Schema as a Scryer library. Please let me know whether my current thinking below is on the right track.
I'm not happy with the definitions of serialization or schema that I've seen online, so I will define them in Prolog terms myself.
Serialization and schema can be treated independently. There's no reason why a JSON Schema can't describe the contents of a CSV file, or why a Protobuf message definition can't describe the contents of an Apache Avro binary-encoded message.
The three predicates below will be the interface for JSON serialization/deserialization, both with and without schemas.

- `json_chars//1` (public) serializes and deserializes Prolog terms to and from JSON with full compliance to the official spec.
- `json_schema/3` (public) will relate three Prolog terms, among them a JSON term obtained from `json_chars//1` or some other source, and a schema deserialized from a `schema.json` file using `json_chars//1` or constructed manually.
- `json_chars//2` (public) will be a wrapper around the above two predicates for serializing/deserializing JSON with an explicit schema. It will be defined as something like this:
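(What follows is only a rough sketch, not a working definition; the exact argument order and the unused third argument of `json_schema/3` are assumptions here.)

```prolog
:- use_module(library(dcgs)).

% Sketch only: json_chars//2 composes the two predicates above. It
% parses or generates JSON text with json_chars//1 and then relates
% the resulting term to an explicit schema with json_schema/3. The
% third argument of json_schema/3 is left anonymous as a placeholder.
json_chars(Term, Schema) -->
    json_chars(Term),
    { json_schema(Term, Schema, _) }.
```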
How this interface will work with some use cases:

- If `library(json)` and `library(csv)` deserialize data into the exact same format of Prolog term (which is not currently the case), it would be possible to answer queries like "Can this data I got from a CSV file be serialized as valid JSON?" and "Does this CSV data conform to this particular JSON Schema?"
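As a purely hypothetical illustration of the second question (`csv_chars//1` and `"data.csv"` are made-up placeholders, and `phrase_from_file/2` comes from `library(pio)`):

```prolog
% Hypothetical query: deserialize CSV data and a JSON Schema into the
% same Prolog term representation, then check conformance with
% json_schema/3. csv_chars//1 and "data.csv" do not exist; they stand
% in for whatever library(csv) would provide.
?- phrase_from_file(csv_chars(Data), "data.csv"),
   phrase_from_file(json_chars(Schema), "schema.json"),
   json_schema(Data, Schema, _).
```

The point is only that once both libraries share a term representation, schema checking becomes an ordinary goal over that term, independent of which serialization produced it.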