-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Bindgen to simpler form #18
Conversation
d74b3a0
to
29c9f87
Compare
60deead
to
daf9952
Compare
Refactoring the bindgen with the goal of having a simpler IR-like form that makes recursively generating types and code easier. The end goal is to remove all the hairy type portions of the XTP Schema document. e.g.: * type * format * items * $ref * additionalProperties and replace with a recursively defined type `XtpNormalizedSchema` (name will likely change to XtpType). I've kept things backwards compatible by only adding this type on the property `xtpType`. We should switch bindgens to use this object then we can remove the other properties. I also decided to change a little bit about the whole process. e.g.: * moved validation to the parser level code * remove circularReference detection and blocking * added some more tests In order to think more clearly about the whole process, I took up a moment to write up what I think the whole flow for schemas should be in terms of validating and compiling: First we validate against JSON schema. This should ensure that we can parse the document into typescript types in the next step. It should not allow any extra values or any values outside of the enum ranges. We should be able to do a simple `const doc = rawDoc as V1Schema` in typescript and the doc object should be valid. Here we parse the json or yaml into a raw javascript object and cast it to the V1 or V0 schema type. This gives us a raw, but typed representation of the schema. Here we should do extra conditional validation steps that can’t be done (or are too complex to be done) in the JSON Schema. e.g. validating content-type / type pairs, etc. Here we take the raw parsed types and “normalize” them into a simpler form. We will walk the document and replace all occurrences of $ref, items, additionalProperties, type, and format with a single recursive XtpType.
daf9952
to
cfc35d7
Compare
super(message); | ||
Object.setPrototypeOf(this, ValidationError.prototype); | ||
} | ||
export class ValidationError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to throw exceptions and stop the process, we want to collect all validation errors giving the user the chance to address them all in one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love this!
541037a
to
75e60d0
Compare
function isDateTime(p: Property | Parameter | null): boolean { | ||
if (!p) return false; | ||
return p.type === "string" && p.format === "date-time"; | ||
export type XtpTyped = { xtpType: XtpNormalizedType }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an external convenience type for any node in the doc that should have an xtp type. think schema, property, parameter, items, additionalProperties, etc.
return p.type === "string" && p.format === "date-time"; | ||
export type XtpTyped = { xtpType: XtpNormalizedType }; | ||
|
||
function isDateTime(p: XtpTyped): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are helpers to quickly check the type of a node without drilling into the xtpType.kind. Will work on any XtpTyped object.
*/ | ||
import { ValidationError } from "./common" | ||
|
||
export interface ParseResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now returning this result object which may have errors
constructor(doc: V1Schema) { | ||
this.doc = doc as any | ||
this.errors = [] | ||
this.location = ['#'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of hand building the location we can just track it as we walk the nodes by pushing and popping from this array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot – it's an elegant abstraction!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beautiful!
75e60d0
to
15bec16
Compare
src/parser.ts
Outdated
if (prop.additionalProperties) { | ||
this.validateTypedInterface(prop.additionalProperties) | ||
|
||
// here we are adding some extra constraints on the value type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just above this we validate that additionalProperties meets XtpTyped rules. but here we are adding some extra constraints because we don't want to support recursive value types here just yet. We have done this to arrays as well. note: Refs are supported because they are easier to generate casting code for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ha – so to confirm my understanding, is it the trickiness of writing recursive casting code that's steering us away from supporting recursive types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least in items and additionalProperties. yes. but when i get that working i think we can delete these extra validations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up backing this change out. Though if it seems too complex in other bindgens then maybe we put it back.
} | ||
|
||
export type XtpSchemaType = 'object' | 'enum' | 'map' | ||
// TODO this figure out how to split up type again? | ||
//export type XtpSchemaType = 'object' | 'enum' | 'map' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't really need this type anymore. we can now put any type into a schema. though maybe we should hold off on relaxing the constraints in the json schema until we can test. In theory we should be able to support enums and maps as properties too (inline definition) but we'll want to wait to do that and will need to add a requirement that they have name.
export type XtpFormat = | ||
'int32' | 'int64' | 'float' | 'double' | 'date-time' | 'byte'; | ||
|
||
export interface XtpItemType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really just another XtpTyped. items and addtionalProperties are no different (from a type perspective) than a property, parameter, etc
* We will normalize the raw XTP schema into these recursively defined types. | ||
*/ | ||
|
||
export type XtpNormalizedKind = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll likely change the name of this. This is just so we don't conflict with XtpType.
this.validateTypedInterface(node) | ||
|
||
if (node && typeof node === 'object') { | ||
// i don't think we need to validate array children |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe arrays are effectively terminal in our document. this could change though.
|
||
recordError(msg: string) { | ||
this.errors.push( | ||
new ValidationError(msg, this.getLocation()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling recordError
uses our current location
* Validates that a node conforms to the rules of | ||
* the XtpTyped interface. Validates what we can't | ||
* catch in JSON Schema validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be some more rules in here i haven't encoded yet. Either way, this is where we add the custom validation on the raw type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to check for ref integrity here as well. Just make sure that all refs point to a valid target.
src/parser.ts
Outdated
// here we are adding some extra constraints on the value type | ||
// we can relax these later when we can ensure we can cast these properly | ||
this.location.push('items') | ||
if (prop.items.items) { | ||
this.recordError("Arrays are currently not supported as element types of arrays") | ||
} | ||
if (prop.items.additionalProperties) { | ||
this.recordError("Maps are currently not supported as element types of arrays") | ||
} | ||
this.location.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment below
} | ||
} | ||
|
||
function detectCircularReference(schema: Schema, visited: Set<string> = new Set()): ValidationError | null { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted this but have not fixed this yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to block circular refs just get clever about how to process it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments about splitting out the concept of field descriptors from types & some fretting about MapType
but overall this is a great improvement! Normalizing the schema down from the authored format feels like cutting with the grain of the wood.
return p?.xtpType?.kind === "boolean" | ||
} | ||
function isMap(p: XtpTyped): boolean { | ||
return p?.xtpType?.kind === "map" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I had something like:
type: object
properties:
foo:
type: string
nullable: false
bar:
type: number
additionalProperties:
type: 'string'
Would that be a map
or an object
? (Does this change for patternProperties
?)
(Edit: I get into semi-structured objects a little later in the review at MapType
!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be caught in the validation step as these two concepts should not be mixable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, currently we don't support that. additionalProperties
and properties
are mutually exclusive
for (const key in node) { | ||
if (Object.prototype.hasOwnProperty.call(node, key)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
golf: for (const [key, child] of Object.entries(node)) {
should work here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can replace the hasOwnProperty call too? is Object.entries() shallow? Will take a look
src/parser.ts
Outdated
if (prop.additionalProperties) { | ||
this.validateTypedInterface(prop.additionalProperties) | ||
|
||
// here we are adding some extra constraints on the value type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ha – so to confirm my understanding, is it the trickiness of writing recursive casting code that's steering us away from supporting recursive types?
} | ||
} | ||
|
||
export class MapType implements XtpNormalizedType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MapType
is raising some hairs – there's this concept of a semi-structured object in JSONSchema, where some properties are defined with specific types and other properties are allowed according to a pattern or a catch-all (additionalProperties
or patternProperties
). In Rust, at least, that's supported using #[serde(flatten)]
– the additional properties can be flattened into a HashMap<String, T>
on the parent object. Golang looks like it supports something similar with map[string]*json.RawMessage
, though it requires some additional elbow grease.
I suppose my worry is that, while we can strictly disallow maps and structs now, if we codify MapType
as distinct from an ObjectType
it'll be hard to introduce that functionality later. (This kind of gets back to representing properties on an object as an array of FieldDescriptor
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Map type should be used for typing what are now untyped objects. The distinction b/w an object and a map is dynamic keys, whereas object are for when you know the keys and types ahead of time.
A good example would be http headers i think. Today you can just say that it's an untyped object, but tomorrow you can say that it's a Map<String, String>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah – the trick is that JSONSchema lets you represent both typed and dynamic keys in a single object. HTTP headers could be a useful example of a semi-structured object, too – you might say something like:
ResponseHeaders:
type: object
properties:
etag:
type: string
additionalProperties:
type: string
and in rust you might produce:
#[derive(Deserialize)]
struct ResponseHeaders {
etag: String
#[serde(flatten)]
additional_properties: HashMap<String, String>
}
or in TS:
interface ResponseHeaders {
etag: string
[additionalProperties: string]: string
}
(This not to say we need to enable this now, but if we're looking to track OpenAPI support it'd be useful to leave the door open to adding this down the line. I worry that adding a distinct Map
type makes it harder to add support later, vs. pulling that info into the FieldDescriptor
.)
constructor(doc: V1Schema) { | ||
this.doc = doc as any | ||
this.errors = [] | ||
this.location = ['#'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot – it's an elegant abstraction!
src/types.ts
Outdated
function cons(t: XtpNormalizedType, opts?: XtpTypeOpts): XtpNormalizedType { | ||
// default them to false | ||
t.nullable = (opts?.nullable === undefined) ? false : opts.nullable | ||
t.required = (opts?.required === undefined) ? false : opts.required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as you know!) I suspect this should probably get split into PropertyDescriptor
and NormalizedType
, where things like required
, field-level description
, name
(or namePattern
) live on the PropertyDescriptor
and things like kind
, nullable
, type-level description
and other type info can live on the NormalizedType
. This gives you a distinction between "field-level" properties when doing code generation and "type-level" properties. On the codegen side you'd end up iterating over a list of PropertyDescriptor
objects with type information inside.
(This gets at a pretty big impedance mismatch between JSONSchema and the sort of codegen XTP does – JSONSchema doesn't need to name types, but having a name for any sort of rich type is required in most languages we target. This might be a good place to generate an "inferred name" for any type that's otherwise unnamed by the schema as authored?)
A maybe-more complete sketch:
interface PropertyDescriptor {
value: XtpNormalizedType,
description?: string
codeExamples: string[]
}
interface NamedPropertyDescriptor extends PropertyDescriptor {
name: string
required: boolean
}
interface PatternPropertyDescriptor extends PropertyDescriptor {
pattern: RegExp
}
// pretty much the same as before, sans "required"
interface XtpTypeOpts {
nullable?: boolean;
}
// ditto!
export interface XtpNormalizedType extends XtpTypeOpts {
kind: XtpNormalizedKind;
}
// a little different: note the split between `name` and `inferredName` and the list of `properties`
class ObjectType implements XtpNormalizedType {
kind: XtpNormalizedKind = 'object';
// if the object appears in a "naming" position (e.g., listed in components/schemas)
name?: string;
// Type-level docs and code examples can go here.
description: string;
codeExamples: string;
// if name is not available we infer a name by building up a path from the last named object
inferredName: string;
properties: PropertyDescriptor[]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, the only thing left I think is adding warnings
to the result
super(message); | ||
Object.setPrototypeOf(this, ValidationError.prototype); | ||
} | ||
export class ValidationError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love this!
constructor(doc: V1Schema) { | ||
this.doc = doc as any | ||
this.errors = [] | ||
this.location = ['#'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beautiful!
Going to merge and push since it should be backwards compatible |
* Refactor: see dylibso/xtp-bindgen#18 * Support nullable vs required
* Refactor Bindgen dylibso/xtp-bindgen#18 * get XtpTyped from the project * bump to latest
* Refactor: see dylibso/xtp-bindgen#18 * Support nullable vs required
Refactoring the bindgen with the goal of having a simpler IR-like form for types
that makes recursively generating type signatures and code easier.
The end goal is to remove all the hairy type portions of the XTP Schema
document. e.g.:
and replace with a recursively defined type
XtpNormalizedSchema
(namewill likely change to XtpType). I've kept things backwards compatible by
only adding this type on the property
xtpType
. We should switchbindgens to use this object then we can remove the other properties.
I also decided to change a little bit about the whole process. e.g.:
In order to think more clearly about the whole process, I took up a moment to
write up what I think the whole flow for schemas should be in terms of validating and compiling:
Step 0: JSON Schema
First we validate against JSON schema. This should ensure that we can parse the document into typescript types in the next step. It should not allow any extra values or any values outside of the enum ranges. We should be able to do a simple
const doc = rawDoc as V1Schema
in typescript and the doc object should be valid.Step 1: Parse and Validate
Here we parse the json or yaml into a raw javascript object and cast it to the V1 or V0 schema type. This gives us a raw, but typed representation of the schema. Here we should do extra conditional validation steps that can’t be done (or are too complex to be done) in the JSON Schema. e.g. validating content-type / type pairs, etc.
Step 2: Normalize
Here we take the raw parsed types and “normalize” them into a simpler form. We will walk the document and replace all occurrences of $ref, items, additionalProperties, type, and format with a single recursive XtpType.