Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML map/dictionary ordering MUST be preserved #62

Open
3 tasks
mr-c opened this issue Dec 2, 2024 · 6 comments
Open
3 tasks

YAML map/dictionary ordering MUST be preserved #62

mr-c opened this issue Dec 2, 2024 · 6 comments

Comments

@mr-c
Copy link
Member

mr-c commented Dec 2, 2024

for UI listings of inputs, outputs, record schema fields, and workflow steps.

Todo:

  • test major YAML parsers to see if they do indeed preserve order
  • if not, write demo code to transform maps to lists before parsing (or submit patches to add an order preserving option)
  • write this into the spec (no conformance tests possible, alas)
@tetron
Copy link
Member

tetron commented Dec 2, 2024

👎 as proposed

https://yaml.org/spec/1.2.2/#3221-mapping-key-order

In the representation model, mapping keys do not have an order. To serialize a mapping, it is necessary to impose an ordering on its keys. This order is a serialization detail and should not be used when composing the representation graph (and hence for the preservation of application data). In every case where node order is significant, a sequence must be used. For example, an ordered mapping can be represented as a sequence of mappings, where each mapping is a single key/value pair. YAML provides convenient compact notation for this case.

The correct way to do this per the YAML spec is like this:

- key1: value1
- key2: value2

Which is equivalent to

[
  {"key1": "value1"},
  {"key2": "value2"}
]

(per Example 8.19 Compact Block Mappings in https://yaml.org/spec/1.2.2/#822-block-mappings)

It wouldn't be a huge reach to make schema salad recognize these single key dicts as a special case, this would be a extension on "idmaps" which is how maps are converted to list of objects right now.

So my counter proposal is to use the recommended YAML approach in situations where we want to preserve order but want a more convenient syntax for writing identifiers.

It is also worth noting that the post-processed document that to which schema and JSON-LD context applies, many things that are converted to lists (in particular, the inputs, outputs and steps sections) currently do not preserve order.

@mr-c
Copy link
Member Author

mr-c commented Dec 9, 2024

@tetron I agree, contradicting the YAML spec is not nice. However it would be a really positive user experience boost, and less clunky that their recommended lists of single-entry dicts.

Lets say there was a stand-along tool that would convert the map syntax in selected fields ( inputs, outputs, record schema fields, and workflow steps) to the equivalent list syntax that is already supported in CWL v1.2. Would the existence of that tool be enough to make this spec change; as it could be used by CWL implementations whose YAML libraries don't preserve map order?

Some examples to motivate the discussion:

Original

inputs: 
  my_input: File
  other_input:
     type: string
     label: This is important!

would have to become the following in CWL v1.3+ to preserve order

inputs:
  - my_input: File
  - other_input:
    - type: string
    - label: This important!

By my proposal the original document does not need re-writing to ensure the ordering of the inputs. For those CWL systems that use a YAML reading library that does not preserve map order, then the stand-alone tool would silently rewrite it thusly (using the compact notation where needed to preserve the line numbers).

inputs: 
  - { id: my_input, type: File }
  - id: other_input
    type: string
    label: This is important!

@tetron
Copy link
Member

tetron commented Dec 9, 2024

Unless the ordering of label and type is relevant here, you'd only need to order the input items themselves, writing it like this:

inputs:
- my_input: File
- other_input:
    type: string
    label: This is important!

By my proposal the original document does not need re-writing to ensure the ordering of the inputs. For those CWL systems that use a YAML reading library that does not preserve map order, then the stand-alone tool would silently rewrite it thusly (using the compact notation where needed to preserve the line numbers).

cwltool --print-pre already basically does this, but it is slow on large documents, which is why the codegen parsers exist. Telling people that have parsers written in other languages have to call out to a Python program seems extremely awkward?

Schema salad parsers exist for C++, D, C#, Java, Python, and Typescript. I think it would be useful to investigate which YAML parsers (besides ruamel) actually have an API that can preserve mapping order.

I understand what you're trying to do but I think this potentially creates a huge headache for implementers, when we already have two other proposed solutions using the explicit sequence syntax that don't have any of these problems.

@mr-c
Copy link
Member Author

mr-c commented Dec 10, 2024

Schema salad parsers exist for C++, D, C#, Java, Python, and Typescript. I think it would be useful to investigate which YAML parsers (besides ruamel) actually have an API that can preserve mapping order.

With common-workflow-language/schema_salad#899 merged ruaml.yaml / cwl-utils & typescript / cwl-ts-auto will preserve the mapping order

@mr-c
Copy link
Member Author

mr-c commented Dec 11, 2024

I have confirmed that yaml-cpp / cwl-cpp-auto preserve the order of maps (though I ran into some other issues with cwl-cpp-auto itself)

@mr-c
Copy link
Member Author

mr-c commented Dec 11, 2024

@tetron I will keep testing, but given the trend of order-preserving maps I suggest the following:

  1. CWL v1.3 say "YAML map ordering SHOULD be preserved for inputs, outputs, record schema fields, and workflow steps to aid in predictable UI renderings"
  2. Users are suggested to switch to list-based syntax if they experience out-of-order rendering.
  3. No new syntax like Compact Block Mappings is introduced

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants