Commit

Update user_docs/metadata/overview.md
Co-authored-by: Karoline Mauer <[email protected]>
sbilge and mauerk authored Aug 5, 2024
1 parent 85a05ad commit fc19fd9
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion user_docs/metadata/overview.md
@@ -15,6 +15,11 @@ The German Human Genome-Phenome Archive (GHGA) provides a nation-wide resource f

This documentation serves as the description and reasoning behind the Metadata Model of GHGA, which encapsulates the metadata schema, its technical implementation, and resources to support submission of metadata. The Archive function of GHGA is envisioned to handle a wide variety of omics and research data. The GHGA metadata model aims at facilitating comprehensive submissions that maximize the amount of collected metadata without creating friction on the submitter side, enabling (reusable) submissions of different types of -omics data into GHGA. This metadata model can satisfy the heterogeneous needs of submitters while maintaining the FAIR principles, interoperability with EGA and facilitating streamlined user journeys.

The schema broadly differentiates the research and administrative aspects of the metadata model - the research metadata aims at maximising the reusability and FAIRness of the data while the administrative metadata The administrative metadata focusses on managing the resources such as creation or acquisition of the data, rights management, and disposition. The research metadata classes include Individual, Biospcimen/Sample, Experiment, Experiment methods, Analysis and Analysis methods. The administrative metadata captures Dataset, Data Access Committee (DAC) and Data Access Policy (DAP), Study/Publication. Furthermore the model also differentiates between three file types, which are different classes controlled by ranges of file types - namely (i) research data files is which defined as the digital entity resulting from the measurement or sequencing of a sample such as FASTQ, RAW, IDAT (ii) processed data files defined as files that result from an analysis, alignment or computation processing step of a research data file such as BAM, VCF and (iii) supporting files are auxiliary files that provide further structured information about an individual, experiment or analysis such as JSON, TXT.
Classes in the schema can be grouped into **Research Metadata** and **Administrative Metadata** based on the information they capture. The **Research Metadata** aims at maximising the reusability and FAIRness of the data, while the **Administrative Metadata** focuses on managing the resources, such as creation or acquisition of the data, rights management, and disposition. The Research Metadata classes include *Individual*, *Biospecimen/Sample*, *Experiment*, *Experiment Method*, *Analysis* and *Analysis Method*. The Administrative Metadata captures *Dataset*, *Data Access Policy*, *Data Access Committee*, *Publication*, and *Study*.

The model also differentiates between three file types:
- **Research Data File**: A file which results from the omics experiment, such as sequencing of a sample.
- **Process Data File**: A file that is generated as output from an analysis performed on a *Research Data File*, such as alignment or processing.
- **Supporting File**: A file that provides further information about an *Individual*, *Experiment Method* or *Analysis Method*. These could be unstructured protocols or structured information, such as Phenopackets or BioCompute Objects.
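
As an illustration only, the grouping and the three file types described above could be sketched in Python as follows. The class and file-type names come from the text; the dictionary layout, the `FileType` enum, and the `metadata_group` helper are hypothetical and do not represent the actual GHGA schema implementation.

```python
from enum import Enum

# Class names are taken from the overview text. The grouping structure and the
# helper below are illustrative assumptions, not part of the GHGA schema.
METADATA_GROUPS = {
    "research": {
        "Individual", "Biospecimen/Sample", "Experiment",
        "Experiment Method", "Analysis", "Analysis Method",
    },
    "administrative": {
        "Dataset", "Data Access Policy", "Data Access Committee",
        "Publication", "Study",
    },
}


class FileType(Enum):
    """The three file types named in the overview."""
    RESEARCH_DATA_FILE = "Research Data File"
    PROCESS_DATA_FILE = "Process Data File"
    SUPPORTING_FILE = "Supporting File"


def metadata_group(class_name: str) -> str:
    """Return the group a class belongs to (hypothetical helper)."""
    for group, classes in METADATA_GROUPS.items():
        if class_name in classes:
            return group
    raise KeyError(f"unknown metadata class: {class_name}")
```

For example, `metadata_group("Dataset")` falls under the administrative group, while `metadata_group("Analysis")` falls under the research group.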

Furthermore, we provide data submitters with a Submission Spreadsheet so that they can easily deposit their data within GHGA.