-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add specifications for 0.1 draft #2
Conversation
profile/arc_cwl_ro_crate.md
Outdated
| Property | Required | Expected Type | Description | | ||
|----------|----------|---------------|-------------| | ||
| | | | | | ||
|about|SHOULD|[schema.org/PropertyValue](https://schema.org/PropertyValue)|The computational processes encoded in this workflow| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to drop this as well, like for the Run, and make the Workflow of double type ComputationalWorkflow
and LabProtocol
.
profile/arc_cwl_ro_crate.md
Outdated
| Property | Required | Expected Type | Description | | ||
|----------|----------|---------------|-------------| | ||
| | | | | | ||
|about|SHOULD|[schema.org/PropertyValue](https://schema.org/PropertyValue)|The computational processes encoded in this workflow| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, I would add the hasPart
to allow hierarchical modelling of workflows (e.g. also model processing units within a workflow as Workflows/LabProtocols).
profile/arc_cwl_ro_crate.md
Outdated
@@ -32,32 +32,31 @@ flowchart TD | |||
D["Assay=Dataset"] -- "processSequence" --> C | |||
D -- "hasPart" --> E | |||
``` | |||
When adding the workflow run execution context, the "hasPart" field contains all files that are part of the invoced workflow. The "inputs" and "outputs" of the "ComputationalWorkflow" | |||
MAY point to the "objects" and "results" of "CreateAction" via "workExample", while the latter point to the former via "exampleOfWork". | |||
The "inputs" and "outputs" of the "ComputationalWorkflow" MAY point to the "objects" and "results" of "CreateAction" via "workExample", while the latter point to the former via "exampleOfWork". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to put these terms into code snippets instead of quotation marks
profile/arc_cwl_ro_crate.md
Outdated
@@ -23,11 +23,14 @@ To continue staying consisten with the [ISA RO-Crate Profile](https://github.com | |||
|
|||
```mermaid | |||
flowchart TD | |||
A["File\nSoftwareSourceCode\nComputationalWorkflow"] -- "input\noutput" --> B["FormalParameter"] | |||
A -- "instrument" --> C["CreateAction"] | |||
A["File\nSoftwareSourceCode\nComputationalWorkflow\nLabProtocol"] -- "input\noutput" --> B["FormalParameter"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, this reads like a list of options for the type. Maybe annotate it as [File,SoftwareSourceCode,...]
? This holds for all multi types of course.
A -- "instrument" --> C["CreateAction"] | ||
A["File\nSoftwareSourceCode\nComputationalWorkflow\nLabProtocol"] -- "input\noutput" --> B["FormalParameter"] | ||
A -- "instrument" --> C["CreateAction\nLabProcess"] | ||
C -- "executes" --> A | ||
C -- "agent" --> D["Person or Organization"] | ||
B -- "exampleOfWork" --> E["File or Property Value"] | ||
C -- "object result" --> E |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are File
and PropertyValue
merged here? To me, it doesn't make sense for the object
/result
to be a PropertyValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the original process run crate:
Entities referenced by an action’s object or result SHOULD be of type File (an RO-Crate alias for MediaObject) for files, Dataset for directories and Collection for multi-file datasets, but MAY be a CreativeWork for other types of data (e.g. an online database); they MAY be of type PropertyValue to capture numbers/strings that are not stored as files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting, and also makes a lot of sense for some kinds of workflows. It doesn't fit with the ISA model though. However, in an ARC, CWL workflows are clearly separated from the ISA process graph. So I'm not sure how we should handle this. Do we want to allow/encourage such workflows that do not produce files? If we do not, it is ok to diverge in a derived profile.
Any opinions on that @HLWeil, @muehlhaus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense in processing steps/workflows, that are part of a wf/nested wf. But in the ARC, we recommend to annotate higher level wfs (see here. Those shouldn't return results that are not files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the use of PropertyValue
here implies some kind of inline
data, mixed into the annotation?
Not easy to judge whether we can/should/must allow this option in the profile. Maybe open an issue as a place for a postponed discussion?
profile/arc_cwl_ro_crate.md
Outdated
C -- "agent" --> D["Person or Organization"] | ||
B -- "exampleOfWork" --> E["File or Property Value"] | ||
C -- "object result" --> E | ||
D["Assay=Dataset"] -- "processSequence" --> C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, the property processSequence
of an Assay has not been added to the bioschemas specification and has been mapped to about
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this significant task!
Added some comments to the specification part specifically. Will look into the example separately.
A -- "instrument" --> C["CreateAction"] | ||
A["File\nSoftwareSourceCode\nComputationalWorkflow\nLabProtocol"] -- "input\noutput" --> B["FormalParameter"] | ||
A -- "instrument" --> C["CreateAction\nLabProcess"] | ||
C -- "executes" --> A | ||
C -- "agent" --> D["Person or Organization"] | ||
B -- "exampleOfWork" --> E["File or Property Value"] | ||
C -- "object result" --> E |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the use of PropertyValue
here implies some kind of inline
data, mixed into the annotation?
Not easy to judge whether we can/should/must allow this option in the profile. Maybe open an issue as a place for a postponed discussion?
``` | ||
The `inputs` and `outputs` of the `ComputationalWorkflow` MAY point to the `objects` and `results` of `CreateAction` via `workExample`, while the latter point to the former via `exampleOfWork`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do these fields come from?
FormalParameters (inputs/outpus) don't have such a field workExample
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is defined in the Requirements section of Workflow Run Crate:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a thing, @floWetzels? (Fields in Profile
which are not defined in Type
)
| Property | Required | Expected Type | Description | | ||
|----------|----------|---------------|-------------| | ||
| | | | | | ||
| @type | MUST | [Text](https://schema.org/Text) | MUST be of type [File](https://schema.org/MediaObject), [SoftwareSourceCode](https://schema.org/SoftwareSourceCode), [ComputationalWorkflow](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE) and [LabProtocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol)| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the MUST
multityping of File, SoftwareSourceCode and ComputationalWorkflow:
I know this stems from the workflow-run-crate-profile, but there I could only find it specified in this way in the graph.
Wouldn't this logically make more sense to be an OR
between these types instead of an AND
? Like in, either it's a tool execution or a workflow execution?
Maybe this isn't the right place here to discuss this, but just noticed it.
@caroott @HLWeil @floWetzels is there still something blocking this since the review from last month and subsequent commits? Or does it just need another review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I would say this looks stable enough. For further improvements, having this merged and then Issues opened on it seems more manageable.
Add specifications for 0.1 draft based upon Workflow RO-Crate profile and Workflow Run Crate with an extension for LabProcess. This PR references #1