Add specifications for 0.1 draft #2

Merged
merged 23 commits into from
Aug 28, 2024
Changes from 12 commits
326 changes: 321 additions & 5 deletions profile/arc_cwl_ro_crate.md
@@ -1,23 +1,339 @@
# ISA RO-Crate Profile
# CWL RO-Crate Profile

* Version:
* Version: 0.1
* Permalink:
* Authors
* - https://orcid.org/

## Overview
The ARC CWL RO-Crate profile consists of two parts: the description of the workflow, which can also serve as a standalone workflow description,
and the workflow invocation. The workflow invocation directly references the workflow description and provides the concrete input and output parameters for the workflow.

CWL allows the use of [metadata](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html) describing the workflows. The metadata often contains general information about licensing, authorship and affiliation, but is not limited to that. It can also describe the individual steps of a workflow, or the properties of a run, in more detail. This profile aims to specify where and how the metadata contained within CWL workflow and CWL job files should be stored.

### CWL Workflow Profile

The CWL Workflow Profile extends the [Bioschemas ComputationalWorkflow Profile](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description). It adds properties to describe the workflow in more detail using the [LabProcess](https://bioschemas.org/types/LabProcess/0.1-DRAFT).
An example of the original profile can be found [here](https://www.researchobject.org/ro-crate/specification/1.1/workflows.html#complete-workflow-example).
When compared to processes in a laboratory, a workflow is highly similar to a protocol. Protocols can be described using [PropertyValue](https://schema.org/PropertyValue). Workflow complexity can vary. Workflows executing several tools in succession are common and require more complex annotation. This can be achieved by using lists of property values.
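
As a rough illustration of the "lists of property values" idea, the following Python sketch builds one `PropertyValue` entity per annotated workflow step and references them from the workflow's `about` list. The helper `annotate_workflow`, the `#step-N` identifier scheme and the step annotations are hypothetical and are not prescribed by this profile.

```python
import json


def annotate_workflow(workflow: dict, steps: dict) -> list:
    """Attach one PropertyValue per step to the workflow's "about" list.

    Returns the new PropertyValue entities so the caller can add them
    to the crate's @graph.
    """
    entities = []
    for index, (name, value) in enumerate(steps.items(), start=1):
        entities.append({
            "@id": f"#step-{index}",  # hypothetical identifier scheme
            "@type": "PropertyValue",
            "name": name,
            "value": value,
        })
    # The workflow only references the entities by @id.
    workflow["about"] = [{"@id": entity["@id"]} for entity in entities]
    return entities


workflow = {
    "@id": "workflows/workflow.cwl",
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
}
# Illustrative annotations only; real values would come from the CWL metadata.
step_entities = annotate_workflow(
    workflow,
    {"data transformation": "column addition", "output format": "tab-separated values"},
)
print(json.dumps([workflow, *step_entities], indent=2))
```

Keeping the annotations as separate entities in the graph, rather than nesting them in the workflow, follows the flattened JSON-LD style used by RO-Crate.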

### CWL Workflow Run Profile

The CWL Workflow Run Profile extends the [Workflow Run Crate](https://www.researchobject.org/workflow-run-crate/profiles/workflow_run_crate/). It adds properties to describe the parameters used in the workflow execution in more detail using the [LabProcess](https://bioschemas.org/types/LabProcess/0.1-DRAFT). Staying with the laboratory analogy, a run corresponds to performing the steps of a protocol. The steps involved in executing a protocol can be described by a LabProcess, which may also contain information about the inputs and outputs of a specific step.
```mermaid
flowchart TD

A["File\nSoftwareSourceCode\nComputationalWorkflow"] -- "input\noutput" --> B["FormalParameter"]
C["CreateAction"] -- "instrument" --> A
caroott marked this conversation as resolved.
C -- "agent" --> D["Person or Organization"]
B -- "workExample" --> E["File or PropertyValue"]
C -- "object\nresult" --> E
Collaborator

Why are File and PropertyValue merged here? To me, it doesn't make sense for the object/result to be a PropertyValue.

Member Author

This is part of the original Process Run Crate:
> Entities referenced by an action’s object or result SHOULD be of type File (an RO-Crate alias for MediaObject) for files, Dataset for directories and Collection for multi-file datasets, but MAY be a CreativeWork for other types of data (e.g. an online database); they MAY be of type PropertyValue to capture numbers/strings that are not stored as files.

Collaborator

That's interesting, and also makes a lot of sense for some kinds of workflows. It doesn't fit with the ISA model though. However, in an ARC, CWL workflows are clearly separated from the ISA process graph. So I'm not sure how we should handle this. Do we want to allow/encourage such workflows that do not produce files? If we do not, it is ok to diverge in a derived profile.

Any opinions on that @HLWeil, @muehlhaus?

Member Author

I think it makes sense in processing steps/workflows that are part of a wf/nested wf. But in the ARC, we recommend annotating higher-level wfs (see here). Those shouldn't return results that are not files.

Member

So the use of PropertyValue here implies some kind of inline data, mixed into the annotation?

Not easy to judge whether we can/should/must allow this option in the profile. Maybe open an issue as a place for a postponed discussion?

C -- "about" --> H["LabProcess"]
caroott marked this conversation as resolved.
```
When adding the workflow run execution context, the "hasPart" field contains all files that are part of the invoked workflow. The "input" and "output" entries of the "ComputationalWorkflow"
MAY point to the "object" and "result" entries of the "CreateAction" via "workExample", while the latter point back to the former via "exampleOfWork".
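
The two-way linking described above can be checked mechanically. The sketch below is one possible consistency check, not part of the profile: it reports every `FormalParameter` whose `workExample` target does not point back via `exampleOfWork`. The function names and the sample graph are illustrative assumptions; the second parameter deliberately references an identifier that does not match any entity.

```python
def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_back_links(graph):
    """Return (parameter_id, example_id) pairs where a FormalParameter points to
    an entity via workExample but that entity lacks the matching exampleOfWork."""
    by_id = {entity["@id"]: entity for entity in graph}
    problems = []
    for entity in graph:
        if "FormalParameter" not in as_list(entity.get("@type")):
            continue
        for ref in as_list(entity.get("workExample")):
            # An unknown @id resolves to an empty entity and is reported as well.
            target = by_id.get(ref["@id"], {})
            back = [r["@id"] for r in as_list(target.get("exampleOfWork"))]
            if entity["@id"] not in back:
                problems.append((entity["@id"], ref["@id"]))
    return problems


graph = [
    {"@id": "intensity_table", "@type": "FormalParameter",
     "workExample": {"@id": "assays/measurement1/dataset/table.csv"}},
    {"@id": "assays/measurement1/dataset/table.csv", "@type": "File",
     "exampleOfWork": {"@id": "intensity_table"}},
    {"@id": "file_name", "@type": "FormalParameter",
     "workExample": {"@id": "file_name_filled"}},
    # This identifier differs from the one referenced above, so the link is broken.
    {"@id": "#file_name_filled", "@type": "PropertyValue",
     "exampleOfWork": {"@id": "file_name"}},
]
print(missing_back_links(graph))
```

Since the links are only a MAY in this profile, a tool would probably treat the reported pairs as hints rather than errors.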

## Requirements

### CWL Workflow Profile

The requirements of this profile are those of the [Bioschemas ComputationalWorkflow Profile](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description)
plus the ones listed below.

#### ComputationalWorkflow
| Property | Required | Expected Type | Description |
|----------|----------|---------------|-------------|
|about|SHOULD|[schema.org/PropertyValue](https://schema.org/PropertyValue)|The computational processes encoded in this workflow|
Collaborator

I suggest to drop this as well, like for the Run, and make the Workflow of double type ComputationalWorkflow and LabProtocol.

Collaborator

Then, I would add the hasPart to allow hierarchical modelling of workflows (e.g. also model processing units within a workflow as Workflows/LabProtocols).


### CWL Workflow Run Profile

The requirements of this profile are those of the [Workflow Run Crate](https://www.researchobject.org/workflow-run-crate/profiles/workflow_run_crate/)
plus the ones listed below.

#### CreateAction

| Property | Required | Expected Type | Description |
|----------|----------|---------------|-------------|
|about|SHOULD|[bioschemas.org/LabProcess](https://bioschemas.org/types/LabProcess/0.1-DRAFT)|The computational parameters in this workflow run|
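
The two `about` requirements in the tables above are both SHOULD-level, so a checker would report them as warnings rather than failures. A minimal sketch under that assumption follows; `check_about`, the `EXPECTED_ABOUT` table and the `#lab-process-1` identifier are hypothetical names chosen for illustration.

```python
# Expected type of the "about" target for each entity type, taken from the tables above.
EXPECTED_ABOUT = {
    "ComputationalWorkflow": "PropertyValue",
    "CreateAction": "LabProcess",
}


def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_about(graph):
    """Return warning messages for entities that lack the recommended "about"."""
    by_id = {entity["@id"]: entity for entity in graph}
    messages = []
    for entity in graph:
        for entity_type, expected in EXPECTED_ABOUT.items():
            if entity_type not in as_list(entity.get("@type")):
                continue
            targets = [by_id.get(ref["@id"], {}) for ref in as_list(entity.get("about"))]
            if not any(expected in as_list(t.get("@type")) for t in targets):
                messages.append(
                    f'{entity["@id"]}: SHOULD have "about" pointing to a {expected}'
                )
    return messages


graph = [
    {"@id": "workflows/workflow.cwl",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
     "about": []},
    {"@id": "#wfrun-1", "@type": "CreateAction",
     "about": {"@id": "#lab-process-1"}},  # hypothetical LabProcess identifier
    {"@id": "#lab-process-1", "@type": "LabProcess", "name": "Column addition run"},
]
for message in check_about(graph):
    print("warning:", message)
```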


## Example ro-crate-metadata.json

_TODO: simple example and a link to a more complete example_

### CWL Workflow Profile

```json
{ "@context": "https://w3id.org/ro/crate/1.1/context",
"@graph": [
{
"@type": "CreativeWork",
"@id": "ro-crate-metadata.json",
"conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
"about": { "@id": "./" }
},
{
"@id": "./workflows",
"@type": "Dataset",
"hasPart": [
{ "@id": "workflows/workflow.cwl" }
]
},
{
"@id": "workflows/workflow.cwl",
"@type": [ "File", "SoftwareSourceCode", "ComputationalWorkflow" ],
"conformsTo": { "@id": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE" },
"name": "Column Addition",
"programmingLanguage": [
{ "@id": "#FSharp" },
{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl" }
],
"creator": { "@id": "https://orcid.org/0000-0003-3925-6778" },
"dateCreated": "2024-02-05",
"input": [
{ "@id": "intensity_table" },
{ "@id": "file_name" }
],
"output": [
{ "@id": "summed_intensities" }
],
"about": []
},
{
"@id": "intensity_table",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "intensity_table",
"valueRequired": true,
"additionalType": "File",
"format": { "@id": "http://edamontology.org/format_3752" }
},
{
"@id": "file_name",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "file_name"
},
{
"@id": "summed_intensities",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "summed_intensities",
"additionalType": "File",
"encodingFormat": { "@id": "http://edamontology.org/format_3475" }
},
{
"@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
"@type": "ComputerLanguage",
"name": "Common Workflow Language",
"alternateName": "CWL",
"identifier": {
"@id": "https://w3id.org/cwl/v1.2/"
},
"url": {
"@id": "https://www.commonwl.org/"
}
},
{
"@id": "#FSharp",
"@type": "ProgrammingLanguage",
"name": "F Sharp",
"alternateName": "F#",
"url": "https://dotnet.microsoft.com/en-us/languages/fsharp",
"version": "6.0"
},
{
"@id": "https://orcid.org/0000-0003-3925-6778",
"@type": "Person",
"name": "Timo Mühlhaus"
},
{
"@id": "http://edamontology.org/format_3752",
"@type": "Thing",
"name": "Comma-separated values"
},
{
"@id": "http://edamontology.org/format_3475",
"@type": "Thing",
"name": "Tab-separated values"
}
]
}
```
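
To show how a consumer might read such a crate, the following sketch resolves the `input` and `output` references of a workflow entity and lists each parameter with its `additionalType`. It is an illustration only: `describe_parameters` is a hypothetical helper, and the graph is a reduced copy of the entities in the example above.

```python
def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def describe_parameters(graph, workflow_id):
    """Return (direction, name, additionalType) for every formal parameter."""
    by_id = {entity["@id"]: entity for entity in graph}
    workflow = by_id[workflow_id]
    rows = []
    for direction in ("input", "output"):
        for ref in as_list(workflow.get(direction)):
            param = by_id[ref["@id"]]
            rows.append(
                (direction, param["name"], param.get("additionalType", "unspecified"))
            )
    return rows


graph = [
    {"@id": "workflows/workflow.cwl",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
     "input": [{"@id": "intensity_table"}, {"@id": "file_name"}],
     "output": [{"@id": "summed_intensities"}]},
    {"@id": "intensity_table", "@type": "FormalParameter",
     "name": "intensity_table", "additionalType": "File"},
    {"@id": "file_name", "@type": "FormalParameter", "name": "file_name"},
    {"@id": "summed_intensities", "@type": "FormalParameter",
     "name": "summed_intensities", "additionalType": "File"},
]
for row in describe_parameters(graph, "workflows/workflow.cwl"):
    print(row)
```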

### CWL Workflow Run Profile

```json
{ "@context": [
"https://w3id.org/ro/crate/1.1/context",
"https://w3id.org/ro/terms/workflow-run/context"
],
"@graph": [
{
"@type": "CreativeWork",
"@id": "ro-crate-metadata.json",
"conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
"about": { "@id": "./" }
},
{
"@id": "./workflows",
"@type": "Dataset",
"conformsTo": [
{"@id": "https://w3id.org/ro/wfrun/process/0.1"},
{"@id": "https://w3id.org/ro/wfrun/workflow/0.1"},
{"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"}
],
"hasPart": [
{ "@id": "workflows/workflow.cwl" },
{ "@id": "assays/measurement1/dataset/table.csv" },
{ "@id": "runs/fsResult1/result.csv" }
],
"mainEntity": {"@id": "workflows/workflow.cwl"},
"mentions": {"@id": "#wfrun-1"}
},
{ "@id": "https://w3id.org/ro/wfrun/process/0.1",
"@type": "CreativeWork",
"name": "Process Run Crate",
"version": "0.1"
},
{ "@id": "https://w3id.org/ro/wfrun/workflow/0.1",
"@type": "CreativeWork",
"name": "Workflow Run Crate",
"version": "0.1"
},
{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0",
"@type": "CreativeWork",
"name": "Workflow RO-Crate",
"version": "1.0"
},
{
"@id": "workflows/workflow.cwl",
"@type": [ "File", "SoftwareSourceCode", "ComputationalWorkflow" ],
"conformsTo": { "@id": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE" },
"name": "Column Addition",
"programmingLanguage": [
{ "@id": "#FSharp" },
{ "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl" }
],
"creator": { "@id": "https://orcid.org/0000-0003-3925-6778" },
"dateCreated": "2024-02-05",
"input": [
{ "@id": "intensity_table" },
{ "@id": "file_name" }
],
"output": [
{ "@id": "summed_intensities" }
],
"about": []
},
{
"@id": "intensity_table",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "intensity_table",
"valueRequired": true,
"additionalType": "File",
"format": { "@id": "http://edamontology.org/format_3752" },
"workExample": {"@id": "assays/measurement1/dataset/table.csv"}
},
{
"@id": "file_name",
"@type": "FormalParameter",
"additionalType": "Text",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "file_name",
"valueRequired": true,
"workExample": {"@id": "file_name_filled"}
},
{
"@id": "summed_intensities",
"@type": "FormalParameter",
"conformsTo": { "@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/" },
"name": "summed_intensities",
"additionalType": "File",
"encodingFormat": { "@id": "http://edamontology.org/format_3475" },
"workExample": {"@id": "runs/fsResult1/result.csv"}
},
{
"@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
"@type": "ComputerLanguage",
"name": "Common Workflow Language",
"alternateName": "CWL",
"identifier": {
"@id": "https://w3id.org/cwl/v1.2/"
},
"url": {
"@id": "https://www.commonwl.org/"
}
},
{
"@id": "#FSharp",
"@type": "ProgrammingLanguage",
"name": "F Sharp",
"alternateName": "F#",
"url": "https://dotnet.microsoft.com/en-us/languages/fsharp",
"version": "6.0"
},
{
"@id": "https://orcid.org/0000-0003-3925-6778",
"@type": "Person",
"name": "Timo Mühlhaus"
},
{
"@id": "http://edamontology.org/format_3752",
"@type": "Thing",
"name": "Comma-separated values"
},
{
"@id": "http://edamontology.org/format_3475",
"@type": "Thing",
"name": "Tab-separated values"
},
{
"@id": "#wfrun-1",
"@type": "CreateAction",
"name": "CWL workflow run 1",
"endTime": "",
"instrument": {"@id": "workflows/workflow.cwl"},
"subjectOf": {"@id": ""},
"object": [
{"@id": "assays/measurement1/dataset/table.csv"},
{"@id": "file_name_filled"}
],
"result": [
{"@id": "runs/fsResult1/result.csv"}
]
},
{
"@id": "assays/measurement1/dataset/table.csv",
"@type": "File",
"description": "Number columns in csv format",
"encodingFormat": "text/plain",
"name": "intensity_table",
"exampleOfWork": {"@id": "intensity_table"}
},
{
"@id": "file_name_filled",
"@type": "PropertyValue",
"additionalType": "Text",
"exampleOfWork": {"@id": "file_name"},
"name": "file_name",
"value": "./result.csv"
},
{
"@id": "runs/fsResult1/result.csv",
"@type": "File",
"name": "summed_intensities",
"description": "Summed intensity columns",
"encodingFormat": "text/plain",
"exampleOfWork": {"@id": "summed_intensities"}
},
{
"@id": "cwltool",
"@type": "CreativeWork",
"encodingFormat": "text/html",
"datePublished": "",
"name": "Workflow Execution Example Workflow"
}
]
}
```
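
A run described this way can be summarised by following `exampleOfWork` from each `object` and `result` of the `CreateAction` back to its formal parameter. The sketch below is one way to do that, assuming every referenced entity is present in the graph; `summarise_run` is a hypothetical helper and the graph is a reduced version of the example above.

```python
def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarise_run(graph, action_id):
    """Map parameter names to the concrete values or files used in one run."""
    by_id = {entity["@id"]: entity for entity in graph}
    action = by_id[action_id]
    summary = {"object": {}, "result": {}}
    for role in ("object", "result"):
        for ref in as_list(action.get(role)):
            entity = by_id[ref["@id"]]
            params = as_list(entity.get("exampleOfWork"))
            param_name = by_id[params[0]["@id"]]["name"] if params else None
            # A PropertyValue carries an inline value; files are identified by their path.
            summary[role][param_name] = entity.get("value", entity["@id"])
    return summary


graph = [
    {"@id": "intensity_table", "@type": "FormalParameter", "name": "intensity_table"},
    {"@id": "file_name", "@type": "FormalParameter", "name": "file_name"},
    {"@id": "summed_intensities", "@type": "FormalParameter", "name": "summed_intensities"},
    {"@id": "#wfrun-1", "@type": "CreateAction",
     "object": [{"@id": "assays/measurement1/dataset/table.csv"},
                {"@id": "file_name_filled"}],
     "result": [{"@id": "runs/fsResult1/result.csv"}]},
    {"@id": "assays/measurement1/dataset/table.csv", "@type": "File",
     "exampleOfWork": {"@id": "intensity_table"}},
    {"@id": "file_name_filled", "@type": "PropertyValue",
     "exampleOfWork": {"@id": "file_name"}, "value": "./result.csv"},
    {"@id": "runs/fsResult1/result.csv", "@type": "File",
     "exampleOfWork": {"@id": "summed_intensities"}},
]
summary = summarise_run(graph, "#wfrun-1")
print(summary)
```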