Merge pull request #30 from nfdi4plants/fragmentSelectors
Mention Data Fragment Selectors
HLWeil authored Sep 16, 2024
2 parents f9bf2fc + aad36f4 commit 949fb10
Showing 2 changed files with 19 additions and 4 deletions.
3 changes: 2 additions & 1 deletion src/pages/details/arc-data-model.md
@@ -15,7 +15,8 @@ ARC is an implementation of a FAIR Digital Object (FDO), utilizing RO-Crate with
![ARC RO Crate](/arc-website/arc-ro-crate.png)

ARC extends the basic RO-Crate concept by incorporating detailed descriptions of the processes that lead to the generation of data. This enhancement allows the data model to represent a complete process graph, encompassing experimental procedures, simulations, analyses, and the interconnections and provenance among them.
In this model, research elements are the nodes of the process graph, while the connections between them, defined as lab processes, are represented by edges. To include data entities in the process graph unambiguously, Data Fragment Selectors, as defined by the W3C, can be used.
Each process can be further specified and annotated with explanatory and descriptive metadata using lists of PropertyValues, enhancing its clarity and traceability.
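To make the node/edge structure concrete, here is a minimal sketch of a single process serialized as RO-Crate-style JSON-LD. It assumes Bioschemas `LabProcess` property names (`object`, `result`, `parameterValue`) and schema.org `PropertyValue`; all identifiers, including the `col=2` fragment, are hypothetical, and the exact ARC RO-Crate profile may differ.

```python
import json
from typing import Optional

# Sketch only: "LabProcess", "object", "result" and "parameterValue" follow the
# Bioschemas LabProcess profile as an assumption; all @id values are hypothetical.


def property_value(name: str, value, unit: Optional[str] = None) -> dict:
    """Build a schema.org PropertyValue that annotates a process."""
    pv = {"@type": "PropertyValue", "name": name, "value": value}
    if unit is not None:
        pv["unitText"] = unit
    return pv


def lab_process(process_id: str, inputs: list, outputs: list, parameters: list) -> dict:
    """One edge of the process graph: it links input nodes to output nodes."""
    return {
        "@id": process_id,
        "@type": "LabProcess",
        "object": [{"@id": i} for i in inputs],    # input nodes
        "result": [{"@id": o} for o in outputs],   # output nodes
        "parameterValue": parameters,              # descriptive metadata
    }


# A measurement whose output is one column of a shared file, addressed with a
# fragment selector so the graph points at exactly that data entity.
process = lab_process(
    "#process/measurement-1",
    inputs=["#sample/leaf-1"],
    outputs=["measurements.csv#col=2"],
    parameters=[property_value("temperature", 37, "°C")],
)
print(json.dumps(process, indent=2, ensure_ascii=False))
```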

![ARC ISA and CWL decorations](/arc-website/ARC-isa-cwl-decorations.png)

20 changes: 17 additions & 3 deletions src/pages/details/documentation-principle.md
@@ -37,7 +37,7 @@ A workflow, on the other hand, is the computational protocol detailing how the d

## Annotation Principle

The ARC annotation principle aims to document all processes—from the object of study, through measurements and analysis, to the final results—as data. Each step, whether it involves sampling, sample preparation, measurement, simulation, or analysis, is treated as a process that generates outputs. These outputs can take the form of result files or sample identifiers, which in turn serve as inputs for subsequent processes. This allows for the chaining and branching of different processes, effectively modeling real-world workflows in the lab and providing a clear, documented path leading to the final results.
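The chaining and branching described above amounts to a directed graph in which each output may serve as the input of later processes. A minimal sketch, using a hypothetical `Process` record and `trace_provenance` helper (neither is part of any ARC tooling), walks that graph backwards from a final result to recover the documented path:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """A process consumes inputs and produces outputs (sample IDs or file paths)."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def trace_provenance(target: str, processes: list[Process]) -> list[str]:
    """Return the names of all processes upstream of `target`, in execution order."""
    # Index which process produced each output entity.
    producer = {out: p for p in processes for out in p.outputs}
    order: list[str] = []
    seen: set[str] = set()

    def visit(entity: str) -> None:
        p = producer.get(entity)
        if p is None or p.name in seen:  # raw input, or already traced
            return
        seen.add(p.name)
        for inp in p.inputs:
            visit(inp)
        order.append(p.name)  # post-order: upstream processes come first

    visit(target)
    return order


processes = [
    Process("sampling", ("plant-1",), ("leaf-1",)),
    Process("extraction", ("leaf-1",), ("extract-1",)),
    Process("measurement-1", ("extract-1",), ("run-1.mzML",)),  # branch ...
    Process("measurement-2", ("extract-1",), ("run-2.mzML",)),  # ... into two runs
    Process("analysis", ("run-1.mzML", "run-2.mzML"), ("results.csv",)),
]
print(trace_provenance("results.csv", processes))
# ['sampling', 'extraction', 'measurement-1', 'measurement-2', 'analysis']
```

Post-order traversal guarantees that every process appears after the processes that produced its inputs, which is the order in which the lab work actually happened.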

Each process is annotated with descriptive metadata in the form of key-value pairs, where the key defines the type of data, and the value may optionally include a unit. For example, the key might be "temperature" with a value of "37" and a unit of "°C." To maintain consistency, avoid errors, and support FAIR data principles, keys should be selected from domain-specific terminologies or dictionaries, where each term or its ID can be referenced. If the value is not numerical, it is recommended to use a controlled term from such a dictionary.
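A key-value-unit annotation can be modeled as a small record that carries ontology references for the key, the value, and the unit. The names `OntologyTerm`, `Annotation`, and `lint` below are hypothetical; the accessions PATO:0000146 (temperature), UO:0000027 (degree Celsius), OBI:0100026 (organism), and NCBITaxon:3702 (*Arabidopsis thaliana*) are included for illustration and should be verified against the respective ontologies before use.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class OntologyTerm:
    name: str
    accession: str  # CURIE, e.g. "PATO:0000146"


@dataclass(frozen=True)
class Annotation:
    key: OntologyTerm
    value: Union[int, float, str, OntologyTerm]
    unit: Optional[OntologyTerm] = None


def lint(a: Annotation) -> list:
    """Flag non-numeric values that ignore the recommendation to use controlled terms."""
    warnings = []
    if isinstance(a.value, str):
        warnings.append(f"{a.key.name}: free-text value {a.value!r}; prefer a controlled term")
    return warnings


temperature = Annotation(
    key=OntologyTerm("temperature", "PATO:0000146"),
    value=37,
    unit=OntologyTerm("degree Celsius", "UO:0000027"),
)
organism_text = Annotation(OntologyTerm("organism", "OBI:0100026"), "A. thaliana")
organism_term = Annotation(
    OntologyTerm("organism", "OBI:0100026"),
    OntologyTerm("Arabidopsis thaliana", "NCBITaxon:3702"),
)

for a in (temperature, organism_text, organism_term):
    print(a.key.name, lint(a))
```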

@@ -47,12 +47,26 @@ Special header keys have specific meanings, such as sample name, protocol refere

![Annotation Principle](/arc-website/annotation-principle-figure-1.png)

### ISA Model Key Parametrization

Following the ISA model, parametrization keys are enclosed in square brackets and preceded by a column type that specifies what the column's content refers to, for example `Parameter [temperature]`. Common types include:

- **Parameter:** Typically used for process-related metadata.
- **Component:** Refers to an element used during the process.
- **Characteristic:** Describes the properties or characteristics of the input to a given process.

The main column is followed by additional columns that provide context for the ontology terms used, such as the term source reference and the term accession number.
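As an illustration of this layout, the hypothetical helper below builds the header cells for one annotated key. It assumes an ISA-Tab-like arrangement in which the main column is followed by an optional `Unit` column and by `Term Source REF` and `Term Accession Number` columns; the exact header spelling is an assumption and may differ in ARC tooling.

```python
COLUMN_TYPES = ("Parameter", "Component", "Characteristic")


def parametrization_columns(column_type: str, key: str, accession: str,
                            has_unit: bool = False) -> list:
    """Build the header row for one annotated key and its context columns.

    Assumed layout: main column, optional Unit column, then term source and
    accession columns referencing the key's ontology term.
    """
    if column_type not in COLUMN_TYPES:
        raise ValueError(f"unknown column type: {column_type!r}")
    headers = [f"{column_type} [{key}]"]
    if has_unit:
        headers.append("Unit")
    headers += [f"Term Source REF ({accession})", f"Term Accession Number ({accession})"]
    return headers


print(parametrization_columns("Parameter", "temperature", "PATO:0000146", has_unit=True))
# ['Parameter [temperature]', 'Unit', 'Term Source REF (PATO:0000146)', 'Term Accession Number (PATO:0000146)']
```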

### ISA Input/Output Typization

Input and Output columns annotate the entities that processes transform and create. The entity type is enclosed in square brackets. Common types include:

- **Material:** Physically existing entities.
- **Sample:** Physically existing biological samples.
- **Data:** Digital data stored in the ARC or online.
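Typed Input and Output columns are what turn each table row into an edge between two graph nodes. A minimal sketch, assuming headers of the literal form `Input [Sample]` or `Output [Data]` restricted to the three types listed above (ARC tooling may spell these headers differently); `parse_io_header` and `io_edges` are hypothetical helpers:

```python
import re

# Only the three common entity types named above; real tables may use others.
IO_HEADER = re.compile(r"^(Input|Output) \[(Material|Sample|Data)\]$")


def parse_io_header(header: str):
    """Return (direction, entity type) for an Input/Output header, else None."""
    m = IO_HEADER.match(header)
    return (m.group(1), m.group(2)) if m else None


def io_edges(headers: list, rows: list) -> list:
    """Turn each table row into a typed (input, output) pair of graph nodes."""
    io_cols = {}
    for idx, header in enumerate(headers):
        parsed = parse_io_header(header)
        if parsed:  # skip Parameter/Component/Characteristic columns
            direction, kind = parsed
            io_cols[direction] = (idx, kind)
    (i_idx, i_kind), (o_idx, o_kind) = io_cols["Input"], io_cols["Output"]
    return [((i_kind, row[i_idx]), (o_kind, row[o_idx])) for row in rows]


headers = ["Input [Sample]", "Parameter [temperature]", "Output [Data]"]
rows = [["leaf-1", "4", "run-1.mzML"], ["leaf-2", "4", "run-2.mzML"]]
for edge in io_edges(headers, rows):
    print(edge)
# (('Sample', 'leaf-1'), ('Data', 'run-1.mzML'))
# (('Sample', 'leaf-2'), ('Data', 'run-2.mzML'))
```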

Data from distinct process setups, or from parallel measurements of different input samples, is often stored together in a single data file. In such cases, referencing only the file does not represent the provenance graph unambiguously. Therefore, following established Semantic Web standards, a Data Fragment Selector can be appended to the file path to identify a specific fragment of the file.
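For CSV files, one natural fragment syntax is RFC 7111 (`row=`, `col=`, `cell=`), which the W3C Web Annotation Data Model lists as the specification a `FragmentSelector` can conform to for CSV. The sketch below assumes that choice; whether ARC prescribes exactly this syntax is not stated here, and `split_reference`, `as_fragment_selector`, and `select_fragment` are hypothetical helpers implementing only a small subset of RFC 7111.

```python
import csv
import io

RFC7111 = "http://tools.ietf.org/rfc/rfc7111"  # CSV fragment identifiers


def split_reference(reference: str):
    """Split 'file.csv#col=2' into ('file.csv', 'col=2'); selector may be None."""
    path, sep, selector = reference.partition("#")
    return path, (selector if sep else None)


def as_fragment_selector(selector: str) -> dict:
    """Express the selector as a W3C Web Annotation FragmentSelector."""
    return {"type": "FragmentSelector", "conformsTo": RFC7111, "value": selector}


def select_fragment(csv_text: str, selector: str) -> list:
    """Resolve a single row=N, col=N or cell=R,C selector (1-based).

    Simplifications: every record counts as a row (header included), and
    ranges, '*' and multiple selectors are not supported.
    """
    records = list(csv.reader(io.StringIO(csv_text)))
    scheme, _, spec = selector.partition("=")
    if scheme == "row":
        return [records[int(spec) - 1]]
    if scheme == "col":
        col = int(spec) - 1
        return [[record[col]] for record in records]
    if scheme == "cell":
        row, col = (int(x) - 1 for x in spec.split(","))
        return [[records[row][col]]]
    raise ValueError(f"unsupported selector: {selector!r}")


# Parallel measurements of two samples stored side by side in one file.
data = "time,leaf-1,leaf-2\n0,0.42,0.38\n10,0.47,0.35\n"
path, selector = split_reference("measurements.csv#col=2")
print(path, as_fragment_selector(selector))
print(select_fragment(data, selector))  # [['leaf-1'], ['0.42'], ['0.47']]
```

Here `measurements.csv#col=2` refers only to the leaf-1 measurements, so a process graph can link that fragment, rather than the whole file, to the sample it came from.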

----------

These conventions ensure a structured and consistent approach to annotating complex experimental workflows, making the data more traceable and understandable.
