Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autogenerate JSON Schema #28

Merged
merged 4 commits into from
Mar 25, 2024
Merged

Conversation

alexiswl
Copy link
Contributor

@alexiswl alexiswl commented Feb 19, 2024

From #27

Summary

  1. Run typescript-json-schema over the Workflow schema.
  2. Refine the schema (see .github/scripts/curate_autogenerated_schema.py)

Schema refining process steps

  1. Remove loadingOptions and extensionFields properties

  2. Rename class_, type_ etc to class, type

  3. Add $import and $include options and update all requirements such that an item in a requirement can use $include or $import if that property accepts a list.

  4. Fix inputs / outputs / steps, so not just an array but can be an object where the key is the id attribute.

  5. Fix Requirements / Hints, so not just an array but can be an object where the key is the class attribute.

  6. Fix descriptions by removing Auto-generated... from the start

  7. Generate the CWLFile definition, which can be one of Workflow, CommandLineTool, or ExpressionTool and also fits the CWLMetadata

  8. Fix issues with additionalProperties and nested definitions as shown here https://stoic-agnesi-d0ac4a.netlify.app/1 Needed for the CWLFIle and CWLGraphFile attributes

  9. Generate the CWLGraph definition, which is an array of CWLFile objects with CWLMetadata

  10. Write out the schema as JSON

  11. To convert the schema to yaml, use the yq binary to generate the schema as yaml. ruamel.yaml doing some weird things on values under patternProperties regexes

alexiswl and others added 2 commits February 12, 2024 18:22
Also added the GitHub actions template to autogenerate this file
Copy link

codecov bot commented Feb 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.62%. Comparing base (0c14cfc) to head (538c987).

❗ Current head 538c987 differs from pull request most recent head 19cd344. Consider uploading reports for the commit 19cd344 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #28   +/-   ##
=======================================
  Coverage   55.62%   55.62%           
=======================================
  Files         127      127           
  Lines        7837     7837           
  Branches     1572     1572           
=======================================
  Hits         4359     4359           
  Misses       3111     3111           
  Partials      367      367           
Flag Coverage Δ
unittests 55.62% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alexiswl alexiswl requested a review from mr-c February 19, 2024 05:43
@alexiswl alexiswl self-assigned this Feb 19, 2024
@alexiswl alexiswl added the enhancement New feature or request label Feb 19, 2024
@alexiswl
Copy link
Contributor Author

Moved back to graph and standard all-in-one schema, now auto-generates a yaml file as well, both files under json_schema

CWL12_REPO_PARENT_TMPDIR_OBJ = TemporaryDirectory()
CWL12_REPO_DIR = Path(CWL12_REPO_PARENT_TMPDIR_OBJ.name) / "cwl-v1.2"

IGNORED_FILES = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking against the conformance tests! I feel like there are more files to ignore.

Look for the json_schema_invalid tag https://github.com/search?q=repo%3Acommon-workflow-language%2Fcwl-v1.2%20json_schema_invalid&type=code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these 'json_schema_invalid' related to the inputs of the workflow or the workflow itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one can be resolved

def fix_secondary_file_schema(schema_dict: Dict) -> Dict:
"""
SecondaryFileSchema should allow for a single string entry to represent the secondary file pattern
Although not recommended, it is allowed in the CWL spec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not recommended?

Copy link
Contributor Author

@alexiswl alexiswl Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the requirement attribute ambiguous (and dependent on the implementation), success of the tool then depends on the workflow engine running the code - See common-workflow-language/cwltool#1322

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this okay to keep in or should I remove the commentary?


def fix_inline_javascript_requirement(schema_dict: Dict) -> Dict:
"""
Fix the InlineJavascriptRequirement.expressionLib array to allow for $include
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be more universal, given all the places a $include can occur?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are all the places an include can occur?

Can an $include occur in any requirement?

Copy link
Member

@mr-c mr-c Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can an $include occur in any requirement?

Yes, and every other place: $include can be a key name at any level in a CWL document

https://www.commonwl.org/v1.2/Workflow.html#Document_preprocessing
https://www.commonwl.org/v1.2/SchemaSalad.html#Include

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also apply for '$import'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, I'll just add them to the patternProperties of each of the definitions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see 19705a2 / b31d8ba

Comment on lines 1587 to 1499
"patternProperties": {
"^s:.*$": {
"type": "object"
},
# Or the full version
"^https://schema.org/.*$": {
"type": "object"
},
# dct terms should also be permitted
# see https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/metadata.cwl
"^dct:.*$": {
"type": "object"
},
"^http://purl.org/dc/terms/.*$": {
"type": "object"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is valid CWL to have additional prefixed/namespaced elements at any level, of any namespace; not just schema.org, DCT, etc..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so could we use the regex "^\w+:.*$" for shorthand notation + "^\w+:\/\/.* for urls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reluctant to just put in a "^.*$" for pattern properties here, it means spelling mistakes or syntax errors aren't necessarily picked up by the schema. CWLVersion, cwlVersion, cwl_version would all be valid entries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that accepting any string anywhere is bad for validation. This is where having a full CWL language server like Benten can go beyond what JSON Schema can do. If you can allow keys that have a colon (:) in them, then that should suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 554c0f2 / 9867428

@alexiswl
Copy link
Contributor Author

@mr-c please re-review when you have the chance, would be great to get this into the JSON schema store. Even if we just have this file generated into the cwl-ts-auto repository, we can generate a PR into the [JSON Schema store repository] (https://github.com/SchemaStore/schemastore) (like this one - https://github.com/SchemaStore/schemastore/pull/3645/files).

If we want to then move this file into the cwl-v1.2 repository, we can then just generate a new PR that points to a new url

@mr-c
Copy link
Member

mr-c commented Mar 21, 2024

Did you want to rewrite the commits, or shall I squash into one?

@alexiswl
Copy link
Contributor Author

I can squash into one and force push if you'd like?

Uses TypeScript to generate generate JSON Schema,

We then manually fix up with Python
@alexiswl alexiswl force-pushed the enhancement/auto-generate-json-schema-from-ts branch from 9867428 to 538c987 Compare March 25, 2024 07:05
@alexiswl
Copy link
Contributor Author

Done

@mr-c mr-c merged commit f2fc4e5 into main Mar 25, 2024
@mr-c mr-c deleted the enhancement/auto-generate-json-schema-from-ts branch March 25, 2024 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants