add organization principles
muehlhaus committed Sep 14, 2024
1 parent 56f893c commit 4269430
Showing 6 changed files with 66 additions and 8 deletions.
Binary file modified public/documentation-principle-study.png
Binary file modified public/orga-principle-folder2process.png
Binary file added public/orga-principle-scaffold.png
2 changes: 1 addition & 1 deletion src/components/Home/ResearchGraphNavigation.astro
@@ -5,7 +5,7 @@ import { Color } from "../GraphNavigation.astro";
const circles = [
{ id: 1, cx: 8, cy: 10, r: 5, href: URLS.Internal_Home + "/details/documentation-principle", text: 'documentation principle', angle: 80 },
{ id: 2, cx: 30, cy: 40, r: 5, href: '#link2', text: 'organization principle', angle: 200 },
{ id: 2, cx: 30, cy: 40, r: 5, href: URLS.Internal_Home + "/details/organization-principle", text: 'organization principle', angle: 200 },
{ id: 3, cx: 60, cy: 35, r: 5, href: '#link3', text: 'quality control', angle: 130 },
{ id: 4, cx: 90, cy: 15, r: 5, href: '#link4', text: 'exchange & publication', angle: 10 },
{ id: 5, cx: 140, cy: 25, r: 5, href: '#link4', text: 'RDM & FAIRness', angle: 210 },
14 changes: 7 additions & 7 deletions src/pages/details/documentation-principle.md
@@ -1,13 +1,13 @@
---
layout: ../../layouts/MarkdownLayout.astro
title: 'Documentation Principle'
title: 'Documentation and Annotation'
pubDate: 2024-09-13
description: 'A short summary for ARC related tools and services.'
description: 'An introduction to the ARC documentation and annotation principles.'
author: 'Timo Mühlhaus'
image:
url: 'https://docs.astro.build/assets/rose.webp'
alt: 'The Astro logo on a dark background with a pink glow.'
tags: ["tools", "services", "community"]
tags: ["community", "organization", "project management"]
---

## Documentation Principle
@@ -29,7 +29,7 @@ When computational analysis is performed on a sample or on the data resulting fr
A workflow, on the other hand, is the computational protocol detailing how the data is processed, simulated, or analyzed on a computer without actually executing the computation. Since workflows offer significant value for reuse in other datasets, they are documented separately from runs.
![Documentation Principle](/arc-website/documentation-principle-workflow.png)

Notice: The ARC is designed to document the entire journey (process) from the object of study, through measurements and analysis (as processes), to the final results. This journey represents a process of processes, capturing each stage as part of the broader transformation from observable phenomena to conclusive outcomes. The ARC annotation principle is to add tags on these process for documentation.
> Notice: The ARC is designed to document the entire journey (process) from the object of study, through measurements and analysis (as processes) to the final results. This journey represents a process of processes, capturing each stage as part of the broader transformation from observable phenomena to conclusive outcomes. The ARC annotation principle is to add tags to these processes for documentation.
(The term "experiment" is avoided here to prevent confusion, as it can intuitively overlap with "investigation," "study," or "assay" depending on context.)

@@ -51,8 +51,8 @@ Special header keys have specific meanings, such as sample name, protocol refere

Following the ISA model, keys are enclosed in square brackets. Additional qualifiers may be used to further specify the key. Common qualifiers include:

- Parameter: Typically used for process-related metadata.
- Component: Refers to an element used during the process.
- Characteristic: Describes the properties or characteristics of the input to a given process.
- **Parameter:** Typically used for process-related metadata.
- **Component:** Refers to an element used during the process.
- **Characteristic:** Describes the properties or characteristics of the input to a given process.

These conventions ensure a structured and consistent approach to annotating complex experimental workflows, making the data more traceable and understandable.
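
The bracket convention described above can be illustrated with a small parser. This is only a sketch: it assumes headers take the form `Qualifier [key]` (for example `Parameter [temperature]`), and the example keys and the names `parse_header` and `AnnotationHeader` are hypothetical, not part of any ARC tool.

```python
import re
from typing import NamedTuple, Optional

# Qualifiers named in the text; treated as the complete set for this illustration only.
QUALIFIERS = ("Parameter", "Component", "Characteristic")

# Assumed header shape: a qualifier followed by a key in square brackets.
_HEADER_RE = re.compile(r"^\s*(?P<qualifier>[A-Za-z ]+?)\s*\[(?P<key>[^\[\]]+)\]\s*$")


class AnnotationHeader(NamedTuple):
    qualifier: str
    key: str


def parse_header(header: str) -> Optional[AnnotationHeader]:
    """Split a bracketed header into qualifier and key; return None if it does not fit."""
    match = _HEADER_RE.match(header)
    if match is None:
        return None
    qualifier = match.group("qualifier")
    if qualifier not in QUALIFIERS:
        return None
    return AnnotationHeader(qualifier, match.group("key").strip())


if __name__ == "__main__":
    # Illustrative headers; real annotation tables may use other keys.
    for text in ("Parameter [temperature]", "Characteristic [organism]", "Sample Name"):
        print(text, "->", parse_header(text))
```

Headers without brackets, such as the special header keys mentioned earlier, are deliberately left unparsed here and would need their own handling.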
58 changes: 58 additions & 0 deletions src/pages/details/organization-principle.md
@@ -0,0 +1,58 @@
---
layout: ../../layouts/MarkdownLayout.astro
title: 'Organization and structure'
pubDate: 2024-09-13
description: 'An introduction to the ARC organization principles.'
author: 'Timo Mühlhaus'
image:
url: 'https://docs.astro.build/assets/rose.webp'
alt: 'The Astro logo on a dark background with a pink glow.'
tags: ["community", "organization", "project management"]
---

### Organization Principle

The core principle of the ARC is that collected data are stored in directories, while metadata are maintained in accompanying tables that reference and describe the data. This organizational structure is closely aligned with the ISA model, while also incorporating workflows, computational processing, and analysis results.

![ARC scaffold structure](/arc-website/orga-principle-folder2process.png)

The foundational idea behind the ARC is to provide a directory scaffold that keeps research data, together with their processing and analysis, organized in a structured and annotated manner. This scaffold supplies a basic file structure for organizing research locally on a personal machine, as well as on data-producing devices such as measurement instruments or compute servers. A key feature is that an ARC can be transferred seamlessly to the cloud, specifically to a DataHUB instance (Git-based, e.g., with Git LFS) hosted by an institution or NFDI consortium.

Through Git's versioning mechanism, data can be easily backed up, synchronized across devices, and shared with collaborators. Each change is tracked and can be reverted if necessary. In addition, the DataHUB offers project management tools such as task assignments and discussion boards.

The unified structure of the ARC ensures that research can be shared with and understood by others. Several software tools are available to help create ARCs and support data analysis. While an ARC requires a specific directory structure to be recognized, researchers are free to add further files or folders as needed.

### ARC Directory Structure

An ARC represents an entire investigation. At the top level, it contains the directories **studies**, **assays**, **workflows**, and **runs**, along with an **investigation metadata table** that holds all administrative metadata.

![ARC scaffold structure](/arc-website/orga-principle-scaffold.png)

#### Study
The **studies** folder contains one or more studies, each in its own directory. Each study folder contains:
- **Study metadata file**: A table with metadata describing the study.
- **Resources folder**: Contains external data used in the study.
- **Protocols folder**: Stores protocols that describe the process from starting material (or data) to samples. These protocols should be stored in a format that can be referenced in the metadata table.

#### Assay
The **assays** folder contains one or more assays, each in its own directory. Each assay folder contains:
- **Assay metadata file**: A table with metadata describing the measurement process.
- **Dataset folder**: Holds the resulting data from the assay process, typically raw measurement files (open file formats are encouraged).
- **Protocols folder**: Contains the protocols that describe the process from samples to measurement.

#### Workflows
The **workflows** folder contains subfolders for each workflow, which may include anything from simple scripts to full programs or toolchains for simulations, processing, or analysis. These workflows should not be tied to specific input/output files, which are instead managed by the metadata in a run.
- **Workflow metadata file**: Describes the executables and computational environment needed for the workflow.

#### Runs
For each computational run, a separate folder is created to store the resulting data.
- **Run metadata file**: Contains specific parameter values for the run, including the input data used.
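
The layout described in the Study, Assay, Workflows, and Runs sections can be sketched as a short script that creates the folders. This is a minimal illustration, not an official ARC tool: the function name `create_arc_scaffold` and the example entry names are hypothetical, the subfolder names follow the wording of this page, and the metadata tables are left out because their file names and formats are defined by the specification rather than by this text.

```python
import tempfile
from pathlib import Path

# Top-level directories named in the text.
TOP_LEVEL = ("studies", "assays", "workflows", "runs")


def create_arc_scaffold(root, studies=(), assays=(), workflows=(), runs=()):
    """Create the directory scaffold and return all created directories (relative, sorted)."""
    root = Path(root)
    for name in TOP_LEVEL:
        (root / name).mkdir(parents=True, exist_ok=True)

    # Each study gets folders for external resources and protocols.
    for study in studies:
        for sub in ("resources", "protocols"):
            (root / "studies" / study / sub).mkdir(parents=True, exist_ok=True)

    # Each assay gets folders for the measured dataset and its protocols.
    for assay in assays:
        for sub in ("dataset", "protocols"):
            (root / "assays" / assay / sub).mkdir(parents=True, exist_ok=True)

    # Workflows and runs each get one folder per entry.
    for workflow in workflows:
        (root / "workflows" / workflow).mkdir(parents=True, exist_ok=True)
    for run in runs:
        (root / "runs" / run).mkdir(parents=True, exist_ok=True)

    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


if __name__ == "__main__":
    # Example names are placeholders for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        for directory in create_arc_scaffold(
            tmp, studies=["growth"], assays=["proteomics"],
            workflows=["quantify"], runs=["quantify-2024"],
        ):
            print(directory)
```

Calling the function again on an existing scaffold is harmless, since existing folders are kept as they are.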

### Flexibility and Expansion
The ARC scaffold provides a well-defined space for organizing research data but does not require every aspect to be filled. Researchers and collaborators can use this structure as needed, leaving any irrelevant sections empty. Additional folders can be created for other research elements, such as paper drafts or notes, allowing for flexible expansion.
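
As a rough illustration of this flexibility, the sketch below reports which of the four top-level directories are missing and which additional entries are present, without treating the additions as errors. It assumes that the four directories named on this page are the ones a tool would look for; the function name `check_scaffold` is hypothetical, and real ARC tooling may apply different rules.

```python
import tempfile
from pathlib import Path

# Assumption: these four top-level directories are what identifies the scaffold.
REQUIRED_DIRS = ("studies", "assays", "workflows", "runs")


def check_scaffold(root):
    """Return (missing, extras): required directories not found, and any other entries."""
    root = Path(root)
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    # Extras include user-added folders as well as files such as the investigation table.
    extras = sorted(p.name for p in root.iterdir() if p.name not in REQUIRED_DIRS)
    return missing, extras


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("studies", "assays", "notes"):  # "notes" is a user-added folder
            (Path(tmp) / name).mkdir()
        missing, extras = check_scaffold(tmp)
        print("missing:", missing)
        print("extras:", extras)
```

Empty required directories pass this check, which matches the idea that sections not relevant to a project may simply stay empty.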

For more detailed information, refer to the [ARC Scaffold Specification].

### Continuous Quality Control
ARC supports continuous quality control in the background, ensuring data integrity without interrupting work. The entire investigation can be compiled into a data publication, assigned a DOI, and referenced in journal publications.
