Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalog/Group from Metaschema is interpreted differently in XSD and JSON-Schema #1989

Open
JustKuzya opened this issue Feb 29, 2024 · 11 comments
Assignees
Labels

Comments

@JustKuzya
Copy link
Contributor

JustKuzya commented Feb 29, 2024

Describe the bug

Metachema definition

Metaschema defines the top-level catalog/group (recursing by refs) as follows (note; it is a choice without attributes):

<choice>
      <assembly ref="group" max-occurs="unbounded">
            <group-as name="groups" in-json="ARRAY"/>
      </assembly>
      <assembly ref="control" max-occurs="unbounded">
            <group-as name="controls" in-json="ARRAY"/>
      </assembly>
</choice>

I assume that the defaults in Metaschema match XSD ones and fall back to min/max-Occurs defaulting to 1

XSD schema definition

Ends up as follows:

 <xs:choice>
    <xs:element name="group"
                 type="oscal-catalog-group-ASSEMBLY"
                 minOccurs="0"
                 maxOccurs="unbounded"/>
    <xs:element name="control"
                 type="oscal-catalog-control-ASSEMBLY"
                 minOccurs="0"
                 maxOccurs="unbounded"/>
 </xs:choice>

which considering mixOccurs and maxOccurs defaulting to 1 => we have effectively an Exclusive OR

JSON Schema on the other hand

Has a plain no-choice-restricted definition (excerpt is larger than the two above to illustrate the context):

    "params" : 
     { "type" : "array",
      "minItems" : 1,
      "items" : 
      { "$ref" : "#assembly_oscal-control-common_parameter" } },
     "controls" : 
     { "type" : "array",
      "minItems" : 1,
      "items" : 
      { "$ref" : "#assembly_oscal-catalog_control" } },
     "groups" : 
     { "type" : "array",
      "minItems" : 1,
      "items" : 
      { "$ref" : "#assembly_oscal-catalog_group" } },
     "back-matter" : 
     { "$ref" : "#assembly_oscal-metadata_back-matter" } },
    "required" : 
    [ "uuid",
     "metadata" ],

while the XSD equivalent, probably should be:

    "params" : 
     { "type" : "array",
      "minItems" : 1,
      "items" : 
      { "$ref" : "#assembly_oscal-control-common_parameter" } },
 //========= CHANGE-START =========
"oneOf":[
            "controls" : 
                {   "type" : "array",
                    "minItems" : 1,
                    "items" :     { "$ref" : "#assembly_oscal-catalog_control" } 
                },
            "groups" : 
                {   "type" : "array",
                    "minItems" : 1,
                    "items" : { "$ref" : "#assembly_oscal-catalog_group" } 
                },
      ],
//=========== CHANGE-END ===========
     "back-matter" : 
     { "$ref" : "#assembly_oscal-metadata_back-matter" } },
    "required" : 
    [ "uuid",
     "metadata" ],

Who is the bug affecting

People who try to validate "inventive" and "non-classical" JSON catalogs

When groups and controls mixed together in "twigs and leaves in same bag"-style, which is allowed by OSCAL JSON Schema, but isn't allowed by Metaschema, XSD, and OSCAL-CLI tool, the catalog-authors get confused.

What is affected by this bug

Modeling

How do we replicate this issue

  • Create JSON catalog with top-level group that contains a more than 0 groups and more than 0 controls of the depth 2
  • Validate the catalog with JSON Schema (player's choice: AJV, Oxygen, etc...)
  • Validate the catalog with CLI tool
  • Notice the difference

Expected behavior (i.e. solution)

Difference should be not be noticeable

Other comments

No response

Revisions

No response

@JustKuzya
Copy link
Contributor Author

Wendel, @wendellpiez would you be able to take a look at the issue with me, or the guidance in the description is enough?

@aj-stein-nist
Copy link
Contributor

This issue has been reported to the Metaschema XSLT processor with bug usnistgov/metaschema-xslt#105. That will need to be fixed there to resolve this issue in Metaschema.

@wendellpiez
Copy link
Contributor

@JustKuzya, this looks good so far. The next step would be to flesh out the example in the form of unit tests to replicate the current and correct behaviors.

Next step in prep would be to contrive a test Metaschema containing (isolating) a choice group. Maybe AJ's example in usnistgov/metaschema-xslt#105 could be worked up.

From there we can step forward to writing XSpec tests. Once we know we can validate a change to be "the correction" (while not breaking other things etc.), the actual fix will likely be easy.

Meanwhile, in addition to a logical mapping into JSON syntax such as you provide here, small examples that are "currently-wrong" are also needed (in the sense that the validator is misreporting them). These could be with the example (toy), with an OSCAL example, or both.

The long term: this work dovetails with the schema validation-validation work I am doing in the oscal-xslt repository. Up for discussion is how OSCAL can support external tools developers with robust and solid examples for unit testing and regression testing. (I have ideas.)

Subject to discussion: who is interested in learning to do this, or at least in seeing how it is done.

@wendellpiez
Copy link
Contributor

Here is a starting point, an XSpec written by @aj-stein-nist last year to validate an adjustment in how fields are defined, that have values but not flags (i.e. an edge case), in the JSON Schema: https://github.com/usnistgov/metaschema-xslt/blob/develop/src/testing/issue_235_regression.xspec

Note that targets are expressed in XML according to this rendering: https://www.w3.org/TR/xpath-functions-31/#func-json-to-xml (scroll down to see examples)

This example shows mappings at the level of structures internal to documents, but the tests can be made to work from the top level as well, i.e. examining an entire mapping not just a particular bit of it.

@wendellpiez
Copy link
Contributor

For field testing the JSON Schema, we need examples of the following, in JSON:

  • catalog with controls - valid
  • catalog with groups - valid
  • catalog with both - valid
  • catalog with neither - valid
  • group with controls - valid
  • group with groups - valid
  • group with both - invalid
  • group with neither - valid

And we need a validation runtime that uses the JSON Schema to validate and examines its results to confirm they are correct to this lineup.

My understanding is the current JSON Schema incorrectly reports 'group with both' as valid.

@iMichaela
Copy link
Contributor

For field testing the JSON Schema, we need examples of the following, in JSON:

  • catalog with controls - valid
  • catalog with groups - valid
  • catalog with both - valid
  • catalog with neither - valid
  • group with controls - valid
  • group with groups - valid
  • group with both - invalid
  • group with neither - valid

And we need a validation runtime that uses the JSON Schema to validate and examines its results to confirm they are correct to this lineup.

My understanding is the current JSON Schema incorrectly reports 'group with both' as valid.

Indeed the cases listed above are accurate per current definitions and XML Schema which properly represents choice. As we move forward after we fix the pipeline, is going to be driven by OSCAL JSON users:

  1. If no JSON users generated OSCAL content that will become invalid if we break the JSON schema backwards compatibility, then WE MIGHT decide to fix it in the JSON schema, OR
  2. we remove choice from XML schema because this will not cause a backward compatibility issue.

This is not a light decision to make and the community input is important. I am preparing the necessary information for the community (coming soon). So, TODAY, per current schema, all cases above are valid per JSON Schema. and are as listed per XML Schema. They need to be sinked but how, it will be based on the community's feedback.

@iMichaela
Copy link
Contributor

@JustKuzya aggregated all cases where choice is used:

# XSD xs:choice occurences
1 oscal_assessment-plan_schema.xsd 18
2 oscal_assessment-results_schema.xsd 18
3 oscal_catalog_schema.xsd 15
4 oscal_complete_schema.xsd 24
5 oscal_component_schema.xsd 14
6 oscal_poam_schema.xsd 18
7 oscal_profile_schema.xsd 19
8 oscal_ssp_schema.xsd 14

The choice is used in multiple places besides group/control some are clear (no attributes, which makes them min-max = 1-1), some are less clear in what the intent was, especially when there is a choice defined but it has only one element:

address/location-uuid, 
parameter-Field/parameter-Assembly, 
include-all/select-by-id(-Assembly) profile
flat/as-is/custom.
group/insert-controls
value[oscal-control-common-parameter-value-FIELD]/[]select[oscal-control-common-parameter-selection-ASSEMBLY]
include-all/include-controls
on-date/with-date-range/at-frequency
include-all/include-control
include-all/include-objective
include-all/include-subject
?? choice with only one element ref->blockElementGroup (only one)
headingzblockElementGroup -> h1, h2,…h6
blockElementGroup -> heading/list/block/p/table/img
blockTextGroup -> pre/hr/blockquotetype
listGroup -> ul/ol
[0-Infinity] inlineMarkup/list/blockText/headingBlockElement(-Group)/p
[1-Infinity]  tablerowType -> td/th 
?? blockQuoteType -> [0-Infinity] blockElementgroup (only one)
?? inlineMarkupType -> [0-Infinity] inlineMarkupGroup (only one)
inlineMarkupGroup -> a/insert/br/phraseMarkupGroup
code/em/i/b/strong/sub/sup/q/img
anchortype -> [0-Infinity] phraseMarkupGroup 
  1. There is a need to clean the choices that have a single element.
  2. More research is necessary to determine the impact to:
    • GRC vendors that will need to update the tools if XML and JSON schemas are brought in sink
    • OSCAL content authors that might have generated JSON content that has not choice constraints.

@wendellpiez
Copy link
Contributor

Cool - except xsd:choice in the XSD is not the same as choice in the Metaschema sources. It is not as bad as this. (Many xsd:choice are there for other reasons. Including those that have just a single element, as this is machine-generated code.)

A couple of things to add:

  • Some due diligence on oscal-cli support for choice in JSON is called for - I will do this if no one does it first
  • I will also continue prototyping a solution that correctly 'blows out' the JSON Schema (it doesn't hurt to look)

Finally: additionally to the two suggestions made so far, there is a third 'in-between' technical solution, namely reflecting the rule (group XOR control inside group) not as a modeling restriction (with choice) but with a constraint, which reports their appearing together, maybe as a WARNING.

(Indeed this is what we might have had in mind when we pondered this problem in forgotten-times.)

@iMichaela
Copy link
Contributor

Thank you @wendellpiez .

A.

Many xsd:choice are there for other reasons. Including those that have just a single element, as this is machine-generated code.

Where do we have those reasons documented so all current and future members of the team, and our collaborators have a clear understanding?

B.

A couple of things to add:

  • Some due diligence on oscal-cli support for choice in JSON is called for - I will do this if no one does it first
  • I will also continue prototyping a solution that correctly 'blows out' the JSON Schema (it doesn't hurt to look)

Both are needed, and I greatly appreciate your initiative, but the expertise and understanding of the choice usage and interpretation need more transparency per A. concern above. Since we need to present the problem to our community, and get their perspective in terms of the direction to go, I would like to work with you on documenting the types of choice use cases and where are they used (expending on what @JustKuzya started above.

Finally: additionally to the two suggestions made so far, there is a third 'in-between' technical solution, namely reflecting the rule (group XOR control inside group) not as a modeling restriction (with choice) but with a constraint, which reports their appearing together, maybe as a WARNING.

I like it, and we need to present it next to the others more 'extreme'. An alignment is needed. Thank you again.

@wendellpiez
Copy link
Contributor

A. requires a canonical specification of a Metaschema -> XSD mapping or at least requirements for that. The Metaschema developers have discussed this but there are also good reasons against it.

What is actually required (and does not require explaining both XSD and Metaschema to all OSCAL users) would be a set of field tests that exhaustively demonstrate expected behaviors by validators, plus ideally a harness for using it.

Indeed this is essential if OSCAL plans to own its own validations, so it can both adjudicate among OSCAL implementations (which ones are 'correct') and some day migrate to tomorrow's schema technology.

I have built one of these for the XSD and am proceeding to extend it to include JSON Schema, albeit not yet automate it - but see usnistgov/metaschema-xslt#110

Accordingly, I think we need three things at present (in addition to work on usnistgov/metaschema-xslt#105):

  • An analysis of Metaschema choice in OSCAL metaschema sources - not xsd:choice in XSD - to help determine impacts of various options
  • The document-level validation testing framework just discussed - specifically for OSCAL not just "a metaschema example" as in Metaschema repositories
  • A clearer picture of what all the tools do here including oscal-cli

I'm proceeding with these, albeit not in this order - feel free to pitch in or ask how to do so.

@iMichaela
Copy link
Contributor

iMichaela commented Mar 30, 2024

This bug is being fixed upstream (see usnistgov/metaschema-xslt#105 and usnistgov/metaschema-xslt#108 there a WIP PR exists towards supporting choice). Once metaschema-xslt/#105 is fixed and the JSON catalog schema gets updated, the existing OSCAL JSON catalogs become invalid IF the authores used groups and controls as opposed to groups or controls. Based on all conversations had, the suggested way forward until OSCAL 2.0 is released where can be reimplemented, is for now to relax the in the catalog schemas and provide backwards compatibility in this way for all OSCAL catalog instances. PR #2002 is addressing it.
IS THIS THE DIRECTION THE COMMUNITY WOULD LIEK US TO MOVE FORWARD OR CAN WE FIX THE BUG IN THE NEXT RELEASE CANDIDATE v1.2.0 AND THE COMMUNITY WILL ASSESS AND PROVIDE FEEDBACK BEFORE THE FINAL RELEASE? I am calling on the OSCAL JSON users I know: @vmangat @brian-ruf @Telos-sa @bradh @ancatri @degenaro @openprivacy @gregelin @butler54 ... Rob, Travis, Allan, Tom N., Kenny, others ? Please see: #2002

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Needs Triage
Development

No branches or pull requests

5 participants