Problem: Representation of objectives changed between 800-53 Rev 4 and 800-53 Rev 5 breaking parsers #194

Open
gregelin opened this issue Feb 26, 2023 · 14 comments
Labels
bug The issue is a bug report.

@gregelin

gregelin commented Feb 26, 2023

Describe the bug

The representation of security control assessment objectives in the OSCAL 800-53 catalogs published by NIST on GitHub changed between Rev 4 and Rev 5 and broke existing code for parsing and generating OSCAL catalogs.

The change was multidimensional and significant enough that the parser and generator need to be extensively re-written to support the new format.

Three dimensions of the objective changed between Rev 4 and Rev 5:

  • The location of the assessment objectives associated with a security control in the OSCAL hierarchy
  • The pattern used to express the value of the 'id' of the part that represents the objective
  • The string used as the value of the 'name' of the part that represents the objective

The change in the value of name also appears to be multidimensional. Instead of a single renaming of objective to assessment-objective, multiple terms were introduced (e.g., assessment-objective and assessment-method), and the term assessment-objective appears effectively overloaded: sometimes it identifies a grouping of objectives without prose, and sometimes it identifies an actual objective with prose.

"parts": [
  {
    "id": "at-3_obj.a",
    "name": "assessment-objective",
    "props": [
      {
        "name": "label",
        "value": "AT-03a.",
        "class": "sp800-53a"
      }
    ],
    "parts": [ "#content clipped" ]
  }
]

This means logic must be written, both in the application's storage layer and in the UI, to distinguish between a node that should NOT have prose and a node that should have prose but happens to be empty. Null is no longer sufficient. It is also no longer possible to get a simple list of objectives by searching on names, because there are now multiple classes of objectives.
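
One way to cope with the overloading is to classify each assessment-objective part by its shape rather than by its name alone. The following is a minimal sketch in Python, assuming the catalog JSON has already been loaded into dicts. The function name `classify_objective` and the labels "group", "objective", and "empty" are hypothetical and are not OSCAL terms.

```python
from typing import Any, Dict

Part = Dict[str, Any]

def classify_objective(part: Part) -> str:
    """Classify an 'assessment-objective' part by its shape.

    The labels are hypothetical, for illustration only:
      "objective" - has non-empty prose (it may also have child parts)
      "group"     - has child parts and no prose (a container)
      "empty"     - has neither, so prose was probably expected but is missing
    """
    if part.get("name") != "assessment-objective":
        raise ValueError(f"not an assessment-objective: {part.get('id')}")
    prose = part.get("prose")
    if prose and prose.strip():
        return "objective"
    if part.get("parts"):
        return "group"
    return "empty"

# Toy data modeled on the AT-3 fragments in this issue (prose shortened).
group = {
    "id": "at-3_obj.a",
    "name": "assessment-objective",
    "parts": [
        {
            "id": "at-3_obj.a.1-1",
            "name": "assessment-objective",
            "prose": "role-based security training is provided ...",
        }
    ],
}

print(classify_objective(group))              # group
print(classify_objective(group["parts"][0]))  # objective
```

Making the "empty" case explicit is the point of the sketch: a missing prose key on a grouping node is expected, while the same absence on a childless node is worth flagging.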

Who is the bug affecting

This problem is currently affecting GRC vendors and other tool makers seeking to read/write OSCAL catalogs.

What is affected by this bug

CI/CD, OSCAL Content, Documentation, Modeling, Tooling & API, Website

How do we replicate this issue

Download the 800-53 Rev 4 and 800-53 Rev 5 catalogs.

Examine the representation of objectives for AT-3 in 800-53 Rev 4 and Rev 5...

800-53 Rev 4 Objective representation

In 800-53 Rev 4, objectives are represented as parts on statements in the following form:


{
  "id": "at-3.a_obj",
  "name": "objective",
  "props": [
    {
      "name": "label",
      "value": "AT-3(a)"
    }
  ],
  "prose": "provides role-based security training to personnel with assigned security roles and responsibilities before authorizing access to the information system or performing assigned duties;"
}

The id pattern is: {control path identifier <control_id>.<control_part>}_obj
The name pattern is: objective
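
As a rough illustration of what a parser ends up hard-coding, the Rev 4 id pattern above can be written as a regular expression. This is a sketch only: it assumes control ids shaped like two letters, a hyphen, and digits (as in the at-3 example), and the name `parse_rev4_objective_id` is hypothetical. Other id shapes in the published catalog are not handled.

```python
import re
from typing import Optional, Tuple

# Rev 4 style: <control_id>.<control_part>_obj, e.g. "at-3.a_obj".
# Assumption: control ids look like "at-3"; anything else returns None.
REV4_OBJ_ID = re.compile(
    r"^(?P<control>[a-z]{2}-\d+)(?:\.(?P<part>[a-z0-9.]+))?_obj$"
)

def parse_rev4_objective_id(part_id: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (control_id, control_part) for a Rev 4 style id, else None."""
    match = REV4_OBJ_ID.match(part_id)
    if match is None:
        return None
    return match.group("control"), match.group("part")

print(parse_rev4_objective_id("at-3.a_obj"))      # ('at-3', 'a')
print(parse_rev4_objective_id("at-3_obj.a.1-1"))  # None (Rev 5 style id)
```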


800-53 Rev 5 Objective representation

In 800-53 Rev 5, both the pattern of the objective identifier and the name of the part changed. Objectives are represented as parts on statements in the following form:

{
  "id": "at-3_obj.a.1-1",
  "name": "assessment-objective",
  "props": [
    {
      "name": "label",
      "value": "AT-03a.01[01]",
      "class": "sp800-53a"
    }
  ],
  "prose": "role-based security training is provided to {{ insert: param, at-03_odp.01 }} before authorizing access to the system, information, or performing assigned duties;"
}

The id pattern is: <control_id>_obj.<control_part>
The name pattern is: assessment-objective
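
The Rev 5 pattern moves the `_obj` marker from the end of the id to the middle, so a suffix test no longer works. A sketch of the equivalent regular expression follows, under the same simplifying assumption about control ids as the at-3 example; `parse_rev5_objective_id` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Rev 5 style: <control_id>_obj.<control_part>, e.g. "at-3_obj.a.1-1".
# Assumption: control ids look like "at-3"; anything else returns None.
REV5_OBJ_ID = re.compile(
    r"^(?P<control>[a-z]{2}-\d+)_obj(?:\.(?P<part>[a-z0-9.-]+))?$"
)

def parse_rev5_objective_id(part_id: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (control_id, control_part) for a Rev 5 style id, else None."""
    match = REV5_OBJ_ID.match(part_id)
    if match is None:
        return None
    return match.group("control"), match.group("part")

print(parse_rev5_objective_id("at-3_obj.a.1-1"))  # ('at-3', 'a.1-1')
print(parse_rev5_objective_id("at-3_obj.a"))      # ('at-3', 'a')
print(parse_rev5_objective_id("at-3.a_obj"))      # None (Rev 4 style id)
```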


Expected behavior (i.e. solution)

The desired behavior is that a basic parsing script written for an OSCAL Catalog and a given OSCAL release (e.g., 1.0.3) will correctly parse all OSCAL Catalogs. I say desired behavior because OSCAL is still under development and different catalogs may differ significantly in their representation of various catalog concepts defined in OSCAL.

The expected behavior is that a basic parser written for an OSCAL Catalog and a given OSCAL release will, with minor modifications, correctly parse all catalogs produced from the same source across all versions of the Catalog.

It was not surprising that changes would exist between Rev 4 and Rev 5 in the same release of OSCAL. It was surprising to find so much change within the representation of a single type of content.

  • We expected the objective "id" value pattern to be consistent between Rev 4 and Rev 5, as there seems little reason to change the identifier pattern;
  • We expected the name of the part (e.g., "objective") to remain consistent between Rev 4 and Rev 5;
  • We expected the location of the objectives in the hierarchy to be the same (though we were less surprised that this moved)

Our team expected at most one meaningful parser-detectable attribute to change between versions. We did not expect all meaningful parser-detectable attributes (identifier, part name, and location) to change simultaneously.
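
To make the cost concrete, here is a sketch of the kind of revision-tolerant collector a tool has to fall back on once id, name, and location can all change: it ignores ids and location entirely, walks every nested part, and accepts either part name. The set `OBJECTIVE_NAMES` contains only the two names seen in this issue; the function names and the toy controls (including their nesting and the `at-3_smt` id) are hypothetical and are not copied from the published catalogs.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]

# Part names observed in this issue: "objective" (Rev 4) and
# "assessment-objective" (Rev 5). Other catalogs may use other names.
OBJECTIVE_NAMES = {"objective", "assessment-objective"}

def iter_parts(node: Node) -> Iterator[Node]:
    """Yield every part under a control or part, at any depth."""
    for part in node.get("parts", []):
        yield part
        yield from iter_parts(part)

def objectives_with_prose(control: Node) -> List[Tuple[str, str]]:
    """Collect (id, prose) for objective parts that actually carry prose."""
    return [
        (part.get("id", ""), part["prose"])
        for part in iter_parts(control)
        if part.get("name") in OBJECTIVE_NAMES and part.get("prose")
    ]

# Toy controls; the nesting is simplified for illustration.
rev4_control = {"id": "at-3", "parts": [
    {"id": "at-3_smt", "name": "statement", "parts": [
        {"id": "at-3.a_obj", "name": "objective",
         "prose": "provides role-based security training ..."}]}]}
rev5_control = {"id": "at-3", "parts": [
    {"id": "at-3_obj.a", "name": "assessment-objective", "parts": [
        {"id": "at-3_obj.a.1-1", "name": "assessment-objective",
         "prose": "role-based security training is provided ..."}]}]}

print([pid for pid, _ in objectives_with_prose(rev4_control)])  # ['at-3.a_obj']
print([pid for pid, _ in objectives_with_prose(rev5_control)])  # ['at-3_obj.a.1-1']
```

Filtering on the presence of prose is what keeps the Rev 5 grouping node out of the result; without it the list would mix containers with actual objectives.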

After noticing changing in id format and name, we expected just a different name of the additional of multiple

Other comments

This issue focuses on the changing representation of objectives. But we discovered this problem after our parsers first broke on the multidimensional changes to the organization-defined parameters between Rev 4 and Rev 5. That made two unexpected changes that (1) are breaking changes, in that they broke our working code, and (2) require extensive human intervention to resolve correctly.
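
Parameter handling is tied to objective prose: the Rev 5 objective shown earlier embeds a parameter as `{{ insert: param, at-03_odp.01 }}`. Below is a minimal sketch of extracting and substituting such insertions. It assumes only that one syntax; the function names and the sample value "system administrators" are hypothetical.

```python
import re
from typing import Dict, List

# Matches the insertion syntax seen in the Rev 5 prose example:
#   {{ insert: param, at-03_odp.01 }}
PARAM_INSERT = re.compile(r"\{\{\s*insert:\s*param,\s*(?P<id>[^\s}]+)\s*\}\}")

def param_ids(prose: str) -> List[str]:
    """List the parameter ids referenced in a prose string, in order."""
    return [match.group("id") for match in PARAM_INSERT.finditer(prose)]

def render(prose: str, values: Dict[str, str]) -> str:
    """Substitute known parameters; mark unknown ones instead of dropping them."""
    def replace(match: "re.Match[str]") -> str:
        param_id = match.group("id")
        return values.get(param_id, f"[unresolved: {param_id}]")
    return PARAM_INSERT.sub(replace, prose)

prose = ("role-based security training is provided to "
         "{{ insert: param, at-03_odp.01 }} before authorizing access")

print(param_ids(prose))  # ['at-03_odp.01']
print(render(prose, {"at-03_odp.01": "system administrators"}))
print(render(prose, {}))
```

Leaving unresolved ids visible in the output means a changed parameter id pattern shows up as an obvious marker rather than as silently missing text.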

This means multiple representations of content that we reasonably expected to be standardized are in fact changing even when issued from the same content provider.

We are discovering similar multidimensional differences between NIST OSCAL content and FedRAMP OSCAL content.

@gregelin added the bug label on Feb 26, 2023
@aj-stein-nist
Contributor

Thanks for your report. There is a lot of good detail in here to consider, but it will take some time to analyze and bring into sprint, in that order. I am tentatively adding this for Sprint 65 (not this sprint, but the following one, for the second half of March; we will start moving to a bi-weekly sprint soon, heads up and expect more communication on this soon enough).

@gregelin
Author

gregelin commented Mar 1, 2023

@aj-stein-nist Glad the detail was useful. It makes sense to spend some time analyzing this issue. Multidimensional changes in produced data significantly raise the required sophistication of parsers and/or raise the cost of parsing. I worry that this limits the parties willing to write parsers and slows adoption.

@aj-stein-nist
Contributor

aj-stein-nist commented Mar 1, 2023

> @aj-stein-nist Glad the detail was useful. It makes sense to spend some time analyzing this issue. Multidimensional changes in produced data significantly raise the required sophistication of parsers and/or raise the cost of parsing. I worry that this limits the parties willing to write parsers and slows adoption.

Will you be able to discuss the design of your parser given the upcoming conversation of this work?

Additionally, and separate from this work item, we had discussed the possibility of pairing to look together at the NIST SP 800-53 Revision 4 and 53/53A Revision 5 catalogs to address a different but related set of your concerns (not in this issue). Can we discuss that via Gitter and come up with a game plan before this work? It seems important that we understand some of your challenges, and that is going to require some deeper, higher-bandwidth conversations while looking at the models. Let me know, thanks.

@GaryGapinski

I suspect NIST IR 8011 is related.

If one

  • is interested in security assessments (an atypical interest enjoyed only by the cognoscenti)
  • is fascinated with the prospect of automating such (à la James Watt)
  • finds OSCAL sufficiently interesting to consider using it as a sustaining component of the grand scheme
  • has witnessed https://github.com/usnistgov/oscal-content and thought it might be a handy fuel for the engine
  • is keenly focused on assessment objective achievement as the raison d'être for assessment bliss

then one might regard the authors of NISTIR 8011 as kindred souls.

@gregelin
Author

gregelin commented Mar 10, 2023

@aj-stein-nist Thanks for adding this issue to sprint 65. I can discuss aspects of our OSCAL parser; and since I've now seen and/or written multiple parsers for OSCAL and Open-Control I think I can share some thoughts on parsing practices and strategies and how well each handles multi-dimensional changes.

Let's consider a BasicParser for OSCAL Catalogs...

BasicParser is built following agile principles: the "simplest solution that will work" to create an MVP, then improvement through iteration. At the time the BasicParser MVP and its first few iterations are being developed, pretty much a single sample catalog, NIST 800-53 Rev 4, is available in OSCAL to develop against. NIST is not yet publishing multiple example catalogs of multiple frameworks, such as GDPR, CMMC, PCI, or ISO 27001, to run the parser against.

BasicParser is written by Chris (a persona). Chris is 90% likely to be either a Compliance SME who can code, or a developer who, having done a couple of ATOs, never wants to write an SSP again. Chris has moderate to pretty good coding skills, works in the web application space, and has crawled and/or parsed a variety of CSV files, serialized content (JSON, YAML, XML), and semi-structured content using regex and parsing libraries. There's a 10% chance that Chris has a CompSci PhD and codes in C, and a less than 1% chance that Chris routinely writes interpreters or XSL processors. If BasicParser is written by the rare Chris with a CompSci PhD, there's a 99.9% chance that Chris knows little to nothing about ATOs, Security Controls, and the 800-53 and is working with a Compliance SME.

Embracing agile, Chris gets the sample data set of the 800-53 Rev 4 catalog in OSCAL and searches for a package someone else has written to parse OSCAL to see if it's done. Not finding any (at the time), Chris looks for a standard library to consume the JSON (or YAML or XML), or tries some regex or simple XSLT. The goal: the simplest solution that can work for an MVP.

The really simple parsing strategy Chris first tries, based on regex alone or on a JSON or XML reader plus regex, shows promise for things like pulling out the controls. The controls, after all, are the meat of the content. Then Chris tries to reconstitute the text strings in the Word version of 800-53 Rev 4 and notices the recursion of the control prose. The text is not only split up, it's recursive. And there seem to be other things hanging off that recursion, too. This is the first complication that necessitates changing the parsing strategy of BasicParser, even to get to MVP.

Chris digs in deeper, going back and forth between the NIST OSCAL documentation still under development and the reference catalog of 800-53 Rev 4. How consistent is the structure and the recursion? Some patterns in the recursion begin to make sense. Through a mix of nested if-then statements and one to three recursive functions, Chris has made BasicParser MVP!

BasicParser MVP doesn't do much with the UUIDs or the props because they don't seem to have much impact on the extraction of the catalog's controls and parameters. During iterations, BasicParser gets better at handling props to help sort controls. (It won't be until later, when Chris is enhancing BasicParser to parse an SSP, that UUIDs and props reveal themselves in all their glory as the second and third complications and Chris begins rethinking life choices.)

As Chris iterates BasicParser, improvements are made. The schema starts to be used to validate content as Chris starts generating a few catalogs in OSCAL. Chris's generated hierarchy and recursion follow the one known example.

BasicParser, built in an agile and iterative fashion on top of a tiny sample set, encounters no exceptions to a variety of assumptions about the structure of an OSCAL catalog that seem perfectly reasonable based on both the sample data and the official documentation. For example, in Rev 4, all objects are just a type of part. Every object part has a prose key, and the suffix of the object id is consistent with a simple hierarchy. BasicParser can recurse through the parts and easily identify that a part is an object via a regex match on the part.id or part.name.
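
To give a feel for it, here is a sketch of the recursion BasicParser might use, assuming the Rev 4 era conventions shown in this issue (an id ending in `_obj`, or a name of exactly `objective`). The function name and the toy parts are hypothetical; this is not code from an actual parser.

```python
import re
from typing import Any, Dict, List

Part = Dict[str, Any]

# The Rev 4 era assumption: the id ends in "_obj" or the name is "objective".
OBJ_ID_SUFFIX = re.compile(r"_obj$")
OBJ_NAME = re.compile(r"^objective$")

def basic_find_objectives(part: Part) -> List[str]:
    """Recursively collect ids of parts that look like Rev 4 objectives."""
    found: List[str] = []
    part_id = part.get("id", "")
    if OBJ_ID_SUFFIX.search(part_id) or OBJ_NAME.match(part.get("name", "")):
        found.append(part_id)
    for child in part.get("parts", []):
        found.extend(basic_find_objectives(child))
    return found

rev4_part = {"id": "at-3.a_obj", "name": "objective", "prose": "..."}
rev5_part = {"id": "at-3_obj.a.1-1", "name": "assessment-objective", "prose": "..."}

print(basic_find_objectives(rev4_part))  # ['at-3.a_obj']
print(basic_find_objectives(rev5_part))  # []
```

The Rev 5 part raises no error. It simply is not found, which is how a parse can continue while key information goes missing.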

Chris's colleagues are impressed! BasicParser can extract controls and parameters, and objectives and links and metadata from an OSCAL catalog. No more custom, fragile regex used to separate compound text strings inside of spreadsheet cells! No more changing the parser for every organization or vendor spreadsheet! This serialized, standardized OSCAL catalog is clearly better. Once other catalogs are expressed in OSCAL, it will be possible to consume the information with BasicParser!

But alas, BasicParser is making assumptions that there are patterns to identifiers, assumptions that the recursion is consistent, and assumptions that nodes are always located in the same place in the hierarchy. BasicParser assumes all swans are white because Chris has only really seen one swan...

@gregelin
Author

@GaryGapinski You've made me a fan of NIST IR 8011! Thanks!

@aj-stein-nist
Contributor

> Let's consider a BasicParser for OSCAL Catalogs...

OK sounds good.

> BasicParser is written by Chris (a persona). Chris is 90% likely to be either a Compliance SME who can code, or a developer who, having done a couple of ATOs, never wants to write an SSP again. Chris has moderate to pretty good coding skills, works in the web application space, and has crawled and/or parsed a variety of CSV files, serialized content (JSON, YAML, XML), and semi-structured content using regex and parsing libraries. There's a 10% chance that Chris has a CompSci PhD and codes in C, and a less than 1% chance that Chris routinely writes interpreters or XSL processors. If BasicParser is written by the rare Chris with a CompSci PhD, there's a 99.9% chance that Chris knows little to nothing about ATOs, Security Controls, and the 800-53 and is working with a Compliance SME.

Thanks for this level-setting, it helps set a good frame of mind for the rest (I have read it once quickly, once slowly by now).

> Embracing agile, Chris gets the sample data set of the 800-53 Rev 4 catalog in OSCAL and searches for a package someone else has written to parse OSCAL to see if it's done. Not finding any (at the time), Chris looks for a standard library to consume the JSON (or YAML or XML), or tries some regex or simple XSLT. The goal: the simplest solution that can work for an MVP.

Also good context. To be clear, this means starting from scratch and processing only the resulting OSCAL JSON (or YAML or XML, but primarily JSON) and nothing else, correct?

> The really simple parsing strategy Chris first tries, based on regex alone or on a JSON or XML reader plus regex, shows promise for things like pulling out the controls. The controls, after all, are the meat of the content. Then Chris tries to reconstitute the text strings in the Word version of 800-53 Rev 4 and notices the recursion of the control prose. The text is not only split up, it's recursive. And there seem to be other things hanging off that recursion, too. This is the first complication that necessitates changing the parsing strategy of BasicParser, even to get to MVP.

Can you further explain "recursion of the control prose" with a little more detail to make sure we best understand the issue here?

> Chris digs in deeper, going back and forth between the NIST OSCAL documentation still under development and the reference catalog of 800-53 Rev 4. How consistent is the structure and the recursion? Some patterns in the recursion begin to make sense. Through a mix of nested if-then statements and one to three recursive functions, Chris has made BasicParser MVP!

I guess this is good news, but is the implication that the structure and recursion are consistent in some parts but not in others? I think we would benefit from some more detail here and there.

> BasicParser MVP doesn't do much with the UUIDs or the props because they don't seem to have much impact on the extraction of the catalog's controls and parameters. During iterations, BasicParser gets better at handling props to help sort controls. (It won't be until later, when Chris is enhancing BasicParser to parse an SSP, that UUIDs and props reveal themselves in all their glory as the second and third complications and Chris begins rethinking life choices.)

OK this is great, thank you for this example of detail, this is the kind of thing I want to focus on with a more detailed developer pairing later, if you do not mind.

> As Chris iterates BasicParser, improvements are made. The schema starts to be used to validate content as Chris starts generating a few catalogs in OSCAL. Chris's generated hierarchy and recursion follow the one known example.

Excellent progression!

> BasicParser, built in an agile and iterative fashion on top of a tiny sample set, encounters no exceptions to a variety of assumptions about the structure of an OSCAL catalog that seem perfectly reasonable based on both the sample data and the official documentation. For example, in Rev 4, all objects are just a type of part. Every object part has a prose key, and the suffix of the object id is consistent with a simple hierarchy. BasicParser can recurse through the parts and easily identify that a part is an object via a regex match on the part.id or part.name.

Thanks, this is good lead-in to the kind of detail I was looking for.

> Chris's colleagues are impressed! BasicParser can extract controls and parameters, and objectives and links and metadata from an OSCAL catalog. No more custom, fragile regex used to separate compound text strings inside of spreadsheet cells! No more changing the parser for every organization or vendor spreadsheet! This serialized, standardized OSCAL catalog is clearly better. Once other catalogs are expressed in OSCAL, it will be possible to consume the information with BasicParser!

> But alas, BasicParser is making assumptions that there are patterns to identifiers, assumptions that the recursion is consistent, and assumptions that nodes are always located in the same place in the hierarchy. BasicParser assumes all swans are white because Chris has only really seen one swan...

OK, so this is a wonderful start, but when and how can we talk about specific consistencies in structure and recursion, or lack thereof, for this notional parser? I asked some questions about those details, as opposed to comments, in between topics of interest above. We would appreciate understanding the specific issues with this notional parser approach (if not an actual parser), because we need to figure out: 1) what the key differences between Revision 4 and Revision 5 are, and 2) if they are significant beyond additional props (my assumption from prior analysis), how they affect the parser. Do they cause exceptions or error behavior, so that any (not some) of a catalog is parsed incompletely? Or, until I get a specific explanation, do they just make parsing more complex, so that a parse continues but key information is missing because some of these relationships have changed in some significant way?

Does that make sense? If we prioritize this for this upcoming sprint starting on Thursday, we will still need some key questions answered in the first few days, or I will need to push the work on this until we are on firmer ground. I hope that makes sense. (We can keep it agile for both parties.)

@gregelin
Author

gregelin commented Mar 15, 2023 via email

@aj-stein-nist
Contributor

This is very good stuff. Given the level of detail in the discussion and the scope of changes needed, I would like to push back work on this to Sprint 66 (or at least where it starts; I doubt that, as-is, there is a simple bite-size change that can be relatively complete in two weeks). This is very good detail to start, but I would like to understand more and have us refine what is reported and break it out into: key concerns, what can't be changed (and why), and what can be changed (and why). I hope that makes sense.

Necessary pre-work needs to occur. I do not want to avoid the work, but I do not want us to rush with pre-solutioning either.

@gregelin
Author

gregelin commented Mar 16, 2023 via email

@GaryGapinski

Regarding

  • Should there be a single parameter pattern across all catalogs?
  • Should there be a single parameter pattern in one catalog?
  • Can an SSP use a different parameter pattern than a catalog?
  • Can two organizations generate versions of the same framework using different parameter id patterns?
  • If parameter id pattern can change between revisions, how is the change communicated in a way that is automatically usable?
  • What if different assessment authorities want different styles of human-readable versions of the parameters outputted in artifacts?

I observe

  • An unstated and unexplicated semantic deficit in OSCAL.
  • identifier/pattern/prop bondage and discipline seems inadequate to obviate the deficit, if only because such attributes in and of themselves lack semantic import other than simple identification. They aren't even reliably unique amongst venues.
  • Expect no help whatsoever regarding identifier changes having any correlation between versions. We've seen this already in usnistgov/oscal-content. We'll likely see it again.
  • Expecting harmony amongst organizations will result in disappointment.
  • Assessment "authorities" (a rather vague term) will do what they wish without caring a fig about other authorities' labeling (not the same as identification) credos. This should ensure everlasting discord. I play an assessment authority in local community theatre.

@aj-stein-nist
Contributor

This is just an administrative change for those who will receive the notification, but I will be moving this to the oscal-content repository, since it implies some feedback about the models but is really about their application in our published versions of the NIST SP 800-53 Rev 4 and Revision 5 catalogs, and the 800-53 Revision 5 objectives.

@brian-comply0

brian-comply0 commented Jul 5, 2023

There is a ton of information in this issue. Candidly, I haven't read all of it in detail, so apologies if this is duplicative.
When addressing this issue, please consider separate parts for each object.

Having them all in a single prose field runs counter to the machine-readable goal of OSCAL.
I'd suggest something like this:

Instead of:

         <part id="ac-1_asm-examine" name="assessment-method">
            <prop name="method" ns="http://csrc.nist.gov/ns/rmf" value="EXAMINE"/>
            <prop name="label" class="sp800-53a" value="AC-01-Examine"/>
            <part name="assessment-objects">
               <p>Access control policy and procedures</p>
               <p>system security plan</p>
               <p>privacy plan</p>
               <p>other relevant documents or records</p>
            </part>
         </part>

please consider something like:

         <part id="ac-1_asm-examine" name="assessment-method">
            <prop name="method" ns="http://csrc.nist.gov/ns/rmf" value="EXAMINE"/>
            <prop name="label" class="sp800-53a" value="AC-01-Examine"/>
            <part id="ac-1_asm-examine-1" name="assessment-object"> <!-- NOTE singular "object", not plural "objects" -->
               <p>Access control policy and procedures</p>
            </part>
            <part id="ac-1_asm-examine-2" name="assessment-object">
               <p>system security plan</p>
            </part>
            <part id="ac-1_asm-examine-3" name="assessment-object">
               <p>privacy plan</p>
            </part>
            <part id="ac-1_asm-examine-4" name="assessment-object">
               <p>other relevant documents or records</p>
            </part>
         </part>
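
For anyone who wants to experiment with this shape before the published content changes, the transformation can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under stated assumptions: the fragment has no XML namespace (the published catalogs declare one, which this sketch does not handle), and the `-1`, `-2` id suffixes follow the suggestion above rather than any published convention. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def split_assessment_objects(method: ET.Element) -> None:
    """Replace each <part name="assessment-objects"> holding several <p>
    children with one <part name="assessment-object"> per <p>, in place."""
    method_id = method.get("id")
    for container in list(method.findall("part")):
        if container.get("name") != "assessment-objects":
            continue
        index = list(method).index(container)
        method.remove(container)
        for n, paragraph in enumerate(container.findall("p"), start=1):
            obj = ET.Element(
                "part",
                {"id": f"{method_id}-{n}", "name": "assessment-object"},
            )
            obj.append(paragraph)
            method.insert(index + n - 1, obj)

# Shortened, namespace-free version of the fragment quoted above.
xml_text = """<part id="ac-1_asm-examine" name="assessment-method">
  <part name="assessment-objects">
    <p>Access control policy and procedures</p>
    <p>system security plan</p>
  </part>
</part>"""

root = ET.fromstring(xml_text)
split_assessment_objects(root)
for child in root.findall("part"):
    print(child.get("id"), child.get("name"), child[0].text)
```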

Apparently I made a similar recommendation three years ago.
See the comment titled "IDs on Assessment Actions and Objects" here: #67

@aj-stein-nist moved this from Todo to Further Analysis Needed in NIST OSCAL Work Board on Sep 21, 2023