Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add specifications for 0.1 draft #2

Merged
merged 23 commits into from
Aug 28, 2024
Merged

Add specifications for 0.1 draft #2

merged 23 commits into from
Aug 28, 2024

Conversation

caroott
Copy link
Member

@caroott caroott commented Jul 8, 2024

Add specifications for 0.1 draft based upon Workflow RO-Crate profile and Workflow Run Crate with an extension for LabProcess. This PR references #1

@caroott caroott requested review from kMutagene and HLWeil July 8, 2024 07:38
@caroott caroott self-assigned this Jul 8, 2024
@caroott caroott requested a review from floWetzels July 8, 2024 08:05
@kMutagene kMutagene linked an issue Jul 9, 2024 that may be closed by this pull request
@kMutagene kMutagene marked this pull request as ready for review July 9, 2024 06:51
| Property | Required | Expected Type | Description |
|----------|----------|---------------|-------------|
| | | | |
|about|SHOULD|[schema.org/PropertyValue](https://schema.org/PropertyValue)|The computational processes encoded in this workflow|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to drop this as well, like for the Run, and make the Workflow of double type ComputationalWorkflow and LabProtocol.

| Property | Required | Expected Type | Description |
|----------|----------|---------------|-------------|
| | | | |
|about|SHOULD|[schema.org/PropertyValue](https://schema.org/PropertyValue)|The computational processes encoded in this workflow|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, I would add the hasPart to allow hierarchical modelling of workflows (e.g. also model processing units within a workflow as Workflows/LabProtocols).

@@ -32,32 +32,31 @@ flowchart TD
D["Assay=Dataset"] -- "processSequence" --> C
D -- "hasPart" --> E
```
When adding the workflow run execution context, the "hasPart" field contains all files that are part of the invoced workflow. The "inputs" and "outputs" of the "ComputationalWorkflow"
MAY point to the "objects" and "results" of "CreateAction" via "workExample", while the latter point to the former via "exampleOfWork".
The "inputs" and "outputs" of the "ComputationalWorkflow" MAY point to the "objects" and "results" of "CreateAction" via "workExample", while the latter point to the former via "exampleOfWork".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to put these terms into code snippets instead of quotation marks

@@ -23,11 +23,14 @@ To continue staying consisten with the [ISA RO-Crate Profile](https://github.com

```mermaid
flowchart TD
A["File\nSoftwareSourceCode\nComputationalWorkflow"] -- "input\noutput" --> B["FormalParameter"]
A -- "instrument" --> C["CreateAction"]
A["File\nSoftwareSourceCode\nComputationalWorkflow\nLabProtocol"] -- "input\noutput" --> B["FormalParameter"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, this reads like a list of options for the type. Maybe annotate it as [File,SoftwareSourceCode,...]? This holds for all multi types of course.

A -- "instrument" --> C["CreateAction"]
A["File\nSoftwareSourceCode\nComputationalWorkflow\nLabProtocol"] -- "input\noutput" --> B["FormalParameter"]
A -- "instrument" --> C["CreateAction\nLabProcess"]
C -- "executes" --> A
C -- "agent" --> D["Person or Organization"]
B -- "exampleOfWork" --> E["File or Property Value"]
C -- "object result" --> E
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are File and PropertyValue merged here? To me, it doesn't make sense for the object/result to be a PropertyValue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the original process run crate:
Entities referenced by an action’s object or result SHOULD be of type File (an RO-Crate alias for MediaObject) for files, Dataset for directories and Collection for multi-file datasets, but MAY be a CreativeWork for other types of data (e.g. an online database); they MAY be of type PropertyValue to capture numbers/strings that are not stored as files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting, and also makes a lot of sense for some kinds of workflows. It doesn't fit with the ISA model though. However, in an ARC, CWL workflows are clearly separated from the ISA process graph. So I'm not sure how we should handle this. Do we want to allow/encourage such workflows that do not produce files? If we do not, it is ok to diverge in a derived profile.

Any opinions on that @HLWeil, @muehlhaus?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense in processing steps/workflows, that are part of a wf/nested wf. But in the ARC, we recommend to annotate higher level wfs (see here. Those shouldn't return results that are not files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the use of PropertyValue here implies some kind of inline data, mixed into the annotation?

Not easy to judge whether we can/should/must allow this option in the profile. Maybe open an issue as a place for a postponed discussion?

C -- "agent" --> D["Person or Organization"]
B -- "exampleOfWork" --> E["File or Property Value"]
C -- "object result" --> E
D["Assay=Dataset"] -- "processSequence" --> C
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, the property processSequence of an Assay has not been added to the bioschemas specification and has been mapped to about.

Copy link
Member

@HLWeil HLWeil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this significant task!

Added some comments to the specification part specifically. Will look into the example separately.

profile/arc_cwl_ro_crate.md Outdated Show resolved Hide resolved
profile/arc_cwl_ro_crate.md Outdated Show resolved Hide resolved
profile/arc_cwl_ro_crate.md Outdated Show resolved Hide resolved
profile/arc_cwl_ro_crate.md Outdated Show resolved Hide resolved
A -- "instrument" --> C["CreateAction"]
A["File\nSoftwareSourceCode\nComputationalWorkflow\nLabProtocol"] -- "input\noutput" --> B["FormalParameter"]
A -- "instrument" --> C["CreateAction\nLabProcess"]
C -- "executes" --> A
C -- "agent" --> D["Person or Organization"]
B -- "exampleOfWork" --> E["File or Property Value"]
C -- "object result" --> E
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the use of PropertyValue here implies some kind of inline data, mixed into the annotation?

Not easy to judge whether we can/should/must allow this option in the profile. Maybe open an issue as a place for a postponed discussion?

```
The `inputs` and `outputs` of the `ComputationalWorkflow` MAY point to the `objects` and `results` of `CreateAction` via `workExample`, while the latter point to the former via `exampleOfWork`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do these fields come from?

FormalParameters (inputs/outpus) don't have such a field workExample.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is defined in the Requirements section of Workflow Run Crate:
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a thing, @floWetzels? (Fields in Profile which are not defined in Type)

profile/arc_cwl_ro_crate.md Outdated Show resolved Hide resolved
| Property | Required | Expected Type | Description |
|----------|----------|---------------|-------------|
| | | | |
| @type | MUST | [Text](https://schema.org/Text) | MUST be of type [File](https://schema.org/MediaObject), [SoftwareSourceCode](https://schema.org/SoftwareSourceCode), [ComputationalWorkflow](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE) and [LabProtocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol)|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the MUST multityping of File, SoftwareSourceCode and ComputationalWorkflow:

I know this stems from the workflow-run-crate-profile, but there I could only find it specified in this way in the graph.

Wouldn't this logically make more sense to be an OR between these types instead of an AND? Like in, either it's a tool execution or a workflow execution?

Maybe this isn't the right place here to discuss this, but just noticed it.

profile/arc_cwl_ro_crate.md Show resolved Hide resolved
profile/arc_cwl_ro_crate.md Outdated Show resolved Hide resolved
@kMutagene
Copy link
Member

@caroott @HLWeil @floWetzels is there still something blocking this since the review from last month and subsequent commits? Or does it just need another review?

Copy link
Member

@HLWeil HLWeil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I would say this looks stable enough. For further improvements, having this merged and then Issues opened on it seems more manageable.

@caroott caroott merged commit e00a5af into main Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Base profile on existing workflow run crates
4 participants