Add specifications for 0.1 draft #2 (merged Aug 28, 2024)

File: `profile/arc_cwl_ro_crate.md`

# CWL RO-Crate Profile

* Version: 0.1
* Permalink:
* Authors:
  - https://orcid.org/

## Overview
The ARC CWL RO-Crate profile consists of two parts: the description of the workflow, which can also serve as a standalone workflow description,
and the workflow invocation. The workflow invocation directly references the workflow description and provides the concrete input and output parameters for the workflow.

CWL allows workflows to be annotated with [metadata](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html). This metadata often contains general information about licensing, authorship and affiliation, but is not limited to it: the steps of a workflow, or properties of its execution, can also be described in more detail. This profile aims to specify where and how the metadata contained in CWL workflow and CWL job files should be stored.
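As a rough illustration of moving CWL metadata into the crate, the sketch below copies schema.org-prefixed keys (such as `s:license`) from an already-parsed CWL document onto the workflow's RO-Crate entity. The function names and sample values are hypothetical, and the sketch assumes the CWL file binds its prefix to schema.org through `$namespaces`, as shown in the CWL user guide; parsing the YAML itself is left out.

```python
# Hypothetical sketch: lift schema.org metadata from a parsed CWL document
# onto the workflow's RO-Crate entity. Not part of any existing library.

SCHEMA_ORG_IRIS = {"https://schema.org/", "http://schema.org/"}


def schema_org_prefixes(cwl_doc: dict) -> set[str]:
    """Return the prefixes that $namespaces binds to schema.org."""
    namespaces = cwl_doc.get("$namespaces", {})
    return {prefix for prefix, iri in namespaces.items() if iri in SCHEMA_ORG_IRIS}


def lift_schema_org_metadata(cwl_doc: dict, entity: dict) -> dict:
    """Return a copy of `entity` with every `<prefix>:<term>` key added as `<term>`."""
    prefixes = schema_org_prefixes(cwl_doc)
    lifted = dict(entity)
    for key, value in cwl_doc.items():
        prefix, sep, term = key.partition(":")
        if sep and prefix in prefixes:
            lifted[term] = value
    return lifted


# Top-level keys of a CWL workflow as a YAML parser would return them
# (sample values are made up for illustration).
cwl_doc = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "$namespaces": {"s": "https://schema.org/"},
    "s:license": "https://spdx.org/licenses/MIT",
    "s:dateCreated": "2024-02-05",
}
entity = {"@id": "workflows/workflow.cwl", "name": "Column Addition"}
print(lift_schema_org_metadata(cwl_doc, entity))
```

Nested values, such as `s:author` entries with `class: s:Person`, are copied verbatim by this sketch and would still need to be turned into separate `Person` entities.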

### ARC CWL Workflow Profile

The CWL Workflow Profile extends the [Bioschemas ComputationalWorkflow Profile](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description). A computational workflow consists of an orchestrated and repeatable pattern of activity enabled by the systematic organization of resources into processes that transform materials, provide services, or process information (source: Wikipedia). An example of the original profile can be found in the [complete workflow example of the RO-Crate 1.1 specification](https://www.researchobject.org/ro-crate/specification/1.1/workflows.html#complete-workflow-example).
Computational workflows and laboratory workflows are very similar; they typically differ only in how they are executed. To describe processes in the ARC consistently, this profile follows the [ISA RO-Crate Profile](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#isa-ro-crate-profile). We therefore propose a multi-type for the workflow entity: its type is extended by [LabProtocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol). Protocols can be described using [PropertyValue](https://schema.org/PropertyValue). Workflow complexity varies: workflows that execute several tools in succession are common and require more complex annotation, which can be achieved with lists of property values.
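A minimal sketch of such a multi-typed workflow entity, built in Python. The `@type` list follows the ComputationalWorkflow requirement table of this profile; attaching one `PropertyValue` per executed tool through `about` mirrors the placeholder in the example `ro-crate-metadata.json` and is an assumption rather than a settled rule. The helper names and the `tool` values are hypothetical.

```python
import json

# Types the profile requires on the workflow entity.
WORKFLOW_TYPES = ["File", "SoftwareSourceCode", "ComputationalWorkflow", "LabProtocol"]


def make_property_value(pv_id: str, name: str, value: str) -> dict:
    """Build a minimal schema.org PropertyValue entity."""
    return {"@id": pv_id, "@type": "PropertyValue", "name": name, "value": value}


def make_workflow_entity(path: str, name: str, annotations: list[dict]) -> dict:
    """Build the multi-typed workflow entity; `about` references each annotation."""
    return {
        "@id": path,
        "@type": list(WORKFLOW_TYPES),
        "name": name,
        "about": [{"@id": pv["@id"]} for pv in annotations],
    }


# One PropertyValue per tool run in succession (hypothetical values).
annotations = [
    make_property_value("#step-1-tool", "tool", "column-sum"),
    make_property_value("#step-2-tool", "tool", "table-export"),
]
workflow = make_workflow_entity("workflows/workflow.cwl", "Column Addition", annotations)
print(json.dumps([workflow, *annotations], indent=2))
```

Keeping the annotations as separate entities referenced by `@id` (rather than inlining them) matches how the examples in this profile link entities in the flat `@graph`.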

### CWL Workflow Run Profile

The CWL Workflow Run Profile extends the [Workflow Run Crate](https://www.researchobject.org/workflow-run-crate/profiles/workflow_run_crate/). This profile describes the execution of a computational tool that orchestrates other tools, represented as a workflow executed by a Workflow Management System (WMS). The Workflow Run Crate combines [Process Run Crate](https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate/) and [Workflow RO-Crate](https://about.workflowhub.eu/Workflow-RO-Crate/); it requires a ComputationalWorkflow `mainEntity` and [CreateAction](https://schema.org/CreateAction) instances corresponding to the execution. Workflows can have multiple input and output parameters, which can optionally be defined as FormalParameter entities linked to the workflow's inputs and outputs.
To stay consistent with the [ISA RO-Crate Profile](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#isa-ro-crate-profile), we propose adding [LabProcess](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprocess) to the type of the `CreateAction` that the Workflow Run Crate inherits from the Process Run Crate. This allows inputs and outputs to be annotated with metadata describing the properties of those datasets and the processes leading from inputs to outputs.
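To make the Workflow Run Crate structure concrete, the sketch below follows `mainEntity` from a dataset to the workflow and collects the `CreateAction` entities whose `instrument` is that workflow. It assumes, as in Workflow Run Crate, that `mainEntity` sits on the root dataset `./` by default; the function names are hypothetical.

```python
def as_list(value) -> list:
    """JSON-LD allows a single node or a list of nodes; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def runs_of_main_workflow(graph: list[dict], dataset_id: str = "./") -> list[str]:
    """Return the @ids of CreateActions whose instrument is the dataset's mainEntity."""
    by_id = {entity["@id"]: entity for entity in graph}
    main_id = by_id[dataset_id]["mainEntity"]["@id"]
    if "ComputationalWorkflow" not in as_list(by_id[main_id].get("@type")):
        raise ValueError(f"{main_id} is not typed as ComputationalWorkflow")
    return [
        entity["@id"]
        for entity in graph
        if "CreateAction" in as_list(entity.get("@type"))
        and any(ref["@id"] == main_id for ref in as_list(entity.get("instrument")))
    ]


graph = [
    {"@id": "./", "@type": "Dataset", "mainEntity": {"@id": "workflows/workflow.cwl"}},
    {"@id": "workflows/workflow.cwl",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "LabProtocol"]},
    {"@id": "#wfrun-1", "@type": ["CreateAction", "LabProcess"],
     "instrument": {"@id": "workflows/workflow.cwl"}},
]
print(runs_of_main_workflow(graph))  # ['#wfrun-1']
```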

```mermaid
flowchart TD

A["File\nSoftwareSourceCode\nComputationalWorkflow\nLabProtocol"] -- "input\noutput" --> B["FormalParameter"]
A -- "instrument" --> C["CreateAction\nLabProcess"]
C -- "executes" --> A
C -- "agent" --> D["Person or Organization"]
B -- "exampleOfWork" --> E["File or PropertyValue"]
C -- "object result" --> E
F["Assay=Dataset"] -- "processSequence" --> C
F -- "hasPart" --> E
```

> **Collaborator** (on the workflow node `A`): In my opinion, this reads like a list of options for the type. Maybe annotate it as [File,SoftwareSourceCode,...]? This holds for all multi types of course.

> **Collaborator** (on the `object result` edge): Why are File and PropertyValue merged here? To me, it doesn't make sense for the object/result to be a PropertyValue.
>
> **Member Author:** This is part of the original Process Run Crate:
> "Entities referenced by an action’s object or result SHOULD be of type File (an RO-Crate alias for MediaObject) for files, Dataset for directories and Collection for multi-file datasets, but MAY be a CreativeWork for other types of data (e.g. an online database); they MAY be of type PropertyValue to capture numbers/strings that are not stored as files."
>
> **Collaborator:** That's interesting, and also makes a lot of sense for some kinds of workflows. It doesn't fit with the ISA model though. However, in an ARC, CWL workflows are clearly separated from the ISA process graph. So I'm not sure how we should handle this. Do we want to allow/encourage such workflows that do not produce files? If we do not, it is ok to diverge in a derived profile.
> Any opinions on that @HLWeil, @muehlhaus?
>
> **Member Author:** I think it makes sense in processing steps/workflows, that are part of a wf/nested wf. But in the ARC, we recommend to annotate higher level wfs (see here). Those shouldn't return results that are not files.
>
> **Member:** So the use of PropertyValue here implies some kind of inline data, mixed into the annotation?
> Not easy to judge whether we can/should/must allow this option in the profile. Maybe open an issue as a place for a postponed discussion?

> **Collaborator** (on the `Assay=Dataset` node): At the moment, the property processSequence of an Assay has not been added to the bioschemas specification and has been mapped to about.
The `input` and `output` parameters of the `ComputationalWorkflow` MAY point to the `object` and `result` entities of the `CreateAction` via `workExample`, while the latter point back to the former via `exampleOfWork`.

> **Collaborator:** I would suggest to put these terms into code snippets instead of quotation marks
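Because the `workExample`/`exampleOfWork` links are meant to be reciprocal, a consumer could verify them with a small check like the one below. It is a sketch only: the function name is hypothetical, it inspects only FormalParameter entities, and since the links are optional (MAY), an absent `workExample` is not reported.

```python
def as_list(value) -> list:
    """Normalise a JSON-LD value (single node, list or absent) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def broken_work_example_links(graph: list[dict]) -> list[str]:
    """Report FormalParameters whose workExample targets do not link back via exampleOfWork."""
    by_id = {entity["@id"]: entity for entity in graph}
    problems = []
    for entity in graph:
        if "FormalParameter" not in as_list(entity.get("@type")):
            continue
        for ref in as_list(entity.get("workExample")):
            target = by_id.get(ref["@id"])
            if target is None:
                problems.append(f"{entity['@id']}: {ref['@id']} is not in @graph")
            elif entity["@id"] not in [r["@id"] for r in as_list(target.get("exampleOfWork"))]:
                problems.append(f"{entity['@id']}: {ref['@id']} has no exampleOfWork back-link")
    return problems


graph = [
    {"@id": "intensity_table", "@type": "FormalParameter",
     "workExample": {"@id": "assays/measurement1/dataset/table.csv"}},
    {"@id": "assays/measurement1/dataset/table.csv", "@type": "File",
     "exampleOfWork": {"@id": "intensity_table"}},
    {"@id": "summed_intensities", "@type": "FormalParameter",
     "workExample": {"@id": "runs/fsResult1/result.csv"}},
    # Back-link deliberately omitted to show a reported problem.
    {"@id": "runs/fsResult1/result.csv", "@type": "File"},
]
print(broken_work_example_links(graph))
```

Indexing the flat `@graph` by `@id` first keeps each lookup constant-time, which matters little here but scales to crates with many files.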


## Requirements

### CWL Workflow Profile

The requirements of this profile are those of [Bioschemas ComputationalWorkflow Profile](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description)
with the modifications listed below.

#### ComputationalWorkflow
| Property | Required | Expected Type | Description | Cardinality | Controlled Vocabulary |
|----------|----------|---------------|-------------|-------------|-----------------------|
| @type | MUST | [Text](https://schema.org/Text) | MUST be of type [File](https://schema.org/MediaObject), [SoftwareSourceCode](https://schema.org/SoftwareSourceCode), [ComputationalWorkflow](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE) and [LabProtocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol) | MANY | Schema.org, Bioschemas |
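The multi-type requirement can be checked by comparing the entity's `@type` with the four required types. A minimal sketch (the function name is hypothetical); note that the workflow entity in the examples of this document declares only `File`, `SoftwareSourceCode` and `ComputationalWorkflow`, so this check would report `LabProtocol` as missing for them.

```python
REQUIRED_WORKFLOW_TYPES = {"File", "SoftwareSourceCode", "ComputationalWorkflow", "LabProtocol"}


def missing_types(entity: dict, required: set[str]) -> list[str]:
    """Return the required types absent from the entity's @type, sorted."""
    declared = entity.get("@type", [])
    if isinstance(declared, str):  # JSON-LD allows a single type as a plain string
        declared = [declared]
    return sorted(required - set(declared))


workflow = {
    "@id": "workflows/workflow.cwl",
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
}
print(missing_types(workflow, REQUIRED_WORKFLOW_TYPES))  # ['LabProtocol']
```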

### CWL Workflow Run Profile

The requirements of this profile are those of [Workflow Run Crate](https://www.researchobject.org/workflow-run-crate/profiles/workflow_run_crate/)
with the modifications listed below.

#### Process Run Crate

| Property | Required | Expected Type | Description | Cardinality | Controlled Vocabulary |
|----------|----------|---------------|-------------|-------------|-----------------------|
| @type | MUST | [Text](https://schema.org/Text) | MUST be of type [CreateAction](https://schema.org/CreateAction) and [LabProcess](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprocess) | MANY | Schema.org, Bioschemas |
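Analogously, a crate consumer could flag every `CreateAction` that is not also typed `LabProcess`. This is a minimal sketch with a hypothetical function name; in the run example of this document, `#wfrun-1` is typed only as `CreateAction` and would be flagged.

```python
def create_actions_without_lab_process(graph: list[dict]) -> list[str]:
    """Return the @ids of CreateAction entities that are not also typed LabProcess."""
    flagged = []
    for entity in graph:
        types = entity.get("@type", [])
        if isinstance(types, str):  # a single type may be given as a plain string
            types = [types]
        if "CreateAction" in types and "LabProcess" not in types:
            flagged.append(entity["@id"])
    return flagged


graph = [
    {"@id": "#wfrun-1", "@type": "CreateAction"},
    {"@id": "#wfrun-2", "@type": ["CreateAction", "LabProcess"]},  # hypothetical second run
    {"@id": "runs/fsResult1/result.csv", "@type": "File"},
]
print(create_actions_without_lab_process(graph))  # ['#wfrun-1']
```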



## Example ro-crate-metadata.json

_TODO: simple example and a link to a more complete example_
### CWL Workflow Profile

_TODO: add example metadata (`PropertyValue`) to the `about` property of the workflow._

```json
{ "@context": "https://w3id.org/ro/crate/1.1/context",
"@graph": [
{
"@type": "CreativeWork",
"@id": "ro-crate-metadata.json",
"conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
"about": { "@id": "./" }
},
{
"@id": "./workflows",
"@type": "Dataset",
"hasPart": [
{ "@id": "workflows/workflow.cwl" }
]
},
{
"@id": "workflows/workflow.cwl",
"@type": [ "File", "SoftwareSourceCode", "ComputationalWorkflow" ],
"conformsTo": { "@id": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE" },
"name": "Column Addition",
"programmingLanguage": [
{ "@id": "#FSharp" },
{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl" }
],
"creator": { "@id": "https://orcid.org/0000-0003-3925-6778" },
"dateCreated": "2024-02-05",
"input": [
{ "@id": "intensity_table" },
{ "@id": "file_name" }
],
"output": [
{ "@id": "summed_intensities" }
],
"about": []
},
{
"@id": "intensity_table",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "intensity_table",
"valueRequired": true,
"additionalType": "File",
"encodingFormat": { "@id": "http://edamontology.org/format_3752" }
},
{
"@id": "file_name",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "file_name"
},
{
"@id": "summed_intensities",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "summed_intensities",
"additionalType": "File",
"encodingFormat": { "@id": "http://edamontology.org/format_3475" }
},
{
"@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
"@type": "ComputerLanguage",
"name": "Common Workflow Language",
"alternateName": "CWL",
"identifier": {
"@id": "https://w3id.org/cwl/v1.2/"
},
"url": {
"@id": "https://www.commonwl.org/"
}
},
{
"@id": "#FSharp",
"@type": "ProgrammingLanguage",
"name": "F Sharp",
"alternateName": "F#",
"url": "https://dotnet.microsoft.com/en-us/languages/fsharp",
"version": "6.0"
},
{
"@id": "https://orcid.org/0000-0003-3925-6778",
"@type": "Person",
"name": "Timo Mühlhaus"
},
{
"@id": "http://edamontology.org/format_3752",
"@type": "Thing",
"name": "Comma-separated values"
},
{
"@id": "http://edamontology.org/format_3475",
"@type": "Thing",
"name": "Tab-separated values"
}
]
}
```

### CWL Workflow Run Profile

_TODO: how should the run instance be referenced to correctly fill the `mentions` field?_

```json
{ "@context": [
"https://w3id.org/ro/crate/1.1/context",
"https://w3id.org/ro/terms/workflow-run/context"
],
"@graph": [
{
"@type": "CreativeWork",
"@id": "ro-crate-metadata.json",
"conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
"about": { "@id": "./" }
},
{
"@id": "./workflows",
"@type": "Dataset",
"conformsTo": [
{"@id": "https://w3id.org/ro/wfrun/process/0.1"},
{"@id": "https://w3id.org/ro/wfrun/workflow/0.1"},
{"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
],
"hasPart": [
{ "@id": "workflows/workflow.cwl" },
{ "@id": "assays/measurement1/dataset/table.csv" },
{ "@id": "runs/fsResult1/result.csv" }
],
"mainEntity": {"@id": "workflows/workflow.cwl"}
},
{ "@id": "https://w3id.org/ro/wfrun/process/0.1",
"@type": "CreativeWork",
"name": "Process Run Crate",
"version": "0.1"
},
{ "@id": "https://w3id.org/ro/wfrun/workflow/0.1",
"@type": "CreativeWork",
"name": "Workflow Run Crate",
"version": "0.1"
},
{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0",
"@type": "CreativeWork",
"name": "Workflow RO-Crate",
"version": "1.0"
},
{
"@id": "workflows/workflow.cwl",
"@type": [ "File", "SoftwareSourceCode", "ComputationalWorkflow" ],
"conformsTo": { "@id": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE" },
"name": "Column Addition",
"programmingLanguage": [
{ "@id": "#FSharp" },
{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl" }
],
"creator": { "@id": "https://orcid.org/0000-0003-3925-6778" },
"dateCreated": "2024-02-05",
"input": [
{ "@id": "intensity_table" },
{ "@id": "file_name" }
],
"output": [
{ "@id": "summed_intensities" }
],
"about": [
]
},
{
"@id": "intensity_table",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "intensity_table",
"valueRequired": true,
"additionalType": "File",
"encodingFormat": { "@id": "http://edamontology.org/format_3752" },
"workExample": {"@id": "assays/measurement1/dataset/table.csv"}
},
{
"@id": "file_name",
"@type": "FormalParameter",
"additionalType": "Text",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "file_name",
"valueRequired": true,
"workExample": {"@id": "file_name_filled"}
},
{
"@id": "summed_intensities",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "summed_intensities",
"additionalType": "File",
"encodingFormat": { "@id": "http://edamontology.org/format_3475" },
"workExample": {"@id": "runs/fsResult1/result.csv"}
},
{
"@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
"@type": "ComputerLanguage",
"name": "Common Workflow Language",
"alternateName": "CWL",
"identifier": {
"@id": "https://w3id.org/cwl/v1.2/"
},
"url": {
"@id": "https://www.commonwl.org/"
}
},
{
"@id": "#FSharp",
"@type": "ProgrammingLanguage",
"name": "F Sharp",
"alternateName": "F#",
"url": "https://dotnet.microsoft.com/en-us/languages/fsharp",
"version": "6.0"
},
{
"@id": "https://orcid.org/0000-0003-3925-6778",
"@type": "Person",
"name": "Timo Mühlhaus"
},
{
"@id": "http://edamontology.org/format_3752",
"@type": "Thing",
"name": "Comma-separated values"
},
{
"@id": "http://edamontology.org/format_3475",
"@type": "Thing",
"name": "Tab-separated values"
},
{
"@id": "#wfrun-1",
"@type": "CreateAction",
"name": "CWL workflow run 1",
"endTime": "",
"instrument": {"@id": "workflows/workflow.cwl"},
"subjectOf": {"@id": ""},
"object": [
{"@id": "assays/measurement1/dataset/table.csv"},
{"@id": "file_name_filled"}
],
"result": [
{"@id": "runs/fsResult1/result.csv"}
]
},
{
"@id": "assays/measurement1/dataset/table.csv",
"@type": "File",
"description": "Number columns in csv format",
"encodingFormat": "text/plain",
"name": "intensity_table",
"exampleOfWork": {"@id": "intensity_table"}
},
{
"@id": "file_name_filled",
"@type": "PropertyValue",
"additionalType": "Text",
"exampleOfWork": {"@id": "file_name"},
"name": "file_name",
"value": "./result.csv"
},
{
"@id": "runs/fsResult1/result.csv",
"@type": "File",
"name": "summed_intensities",
"description": "Summed intensity columns",
"encodingFormat": "text/plain",
"exampleOfWork": {"@id": "summed_intensities"}
},
{
"@id": "cwltool",
"@type": "CreativeWork",
"encodingFormat": "text/html",
"datePublished": "",
"name": "Workflow Execution Example Workflow"
}
]
}
```