Add metadata and ontologies informations #104

Merged
merged 19 commits into from
Feb 16, 2024
7 changes: 5 additions & 2 deletions docs/_Getting-Started/03-contributors.md
@@ -11,5 +11,8 @@ docs_css: markdown
4. Konrad U. Förstner (ORCID ID: [0000-0002-1481-2996](http://orcid.org/0000-0002-1481-2996))
5. Paul M. J. Klemm (ORCID ID: [0000-0002-3609-5713](https://orcid.org/0000-0002-3609-5713))
6. Uta Parmaksiz (ORCID ID: [0000-0002-0087-5056](https://orcid.org/0000-0002-0087-5056))
7. Frank Förster (ORCID ID: [0000-0003-4166-5423](https://orcid.org/0000-0003-4166-5423))
8. \<Enter your name here>
7. Charlie Pauvert (ORCID ID: [0000-0001-9832-2507](https://orcid.org/0000-0001-9832-2507))
8. Maja Magel (ORCID ID: [0009-0004-2517-0791](https://orcid.org/0009-0004-2517-0791))
9. Martin Bole (ORCID ID: [0009-0004-9189-8852](https://orcid.org/0009-0004-9189-8852))
10. Frank Förster (ORCID ID: [0000-0003-4166-5423](https://orcid.org/0000-0003-4166-5423))
11. \<Enter your name here>
41 changes: 39 additions & 2 deletions docs/_Research-Data-Management/03-md.md
@@ -4,14 +4,51 @@ category: Research-Data-Management
layout: default
docs_css: markdown
redirect_from: /Research-Data-Management
empty: true
hide: true
empty: false
hide: false
---

# Metadata

## Metadata is data about data

Before we delve into the metadata standards used by the microbiology community, let us explain what metadata is.

In general, metadata provides information about other data without being the content of that data. Instead, it describes the data in ways that help you understand or use the data you are working with. In the simplest terms, metadata is **data** about data.

For more details on the distinction between different types of metadata, we refer you to the FAIR Cookbook recipe [FAIR and the notion of metadata](https://w3id.org/faircookbook/FCB068).

## When should you collect your metadata

As is usually the case in science, your research (and the wider microbiological community) benefits greatly from rigorous and timely planning of your experiments, including metadata collection. For help with planning your experiments, we refer you to another subsection of this Knowledge Base: [**Data Management Plans (DMPs)**](./08-dmp.md).

Metadata collection should be planned, but it can also be overwhelming. How much metadata is enough to describe your data? Is it crucial to note down the pipette you used, or should you have noted down the location of the sampling site? Will you still understand in a year what you wrote in your notebook/ELN? Will other researchers make sense of the (meta)data you collected? Will other researchers be able to replicate your research if you did not note down your in-house DNA extraction protocol?

These and other considerations should be thought through before you start your experimental procedures. Some metadata can even be collected and documented before the experiments begin, provided you already know how you will collect, process, sequence (if sequencing is part of the analysis), and analyze your samples.

## Metadata collection example

We will look at an example of microbiological environmental metadata in which we gather samples from a forest environment, specifically the plant rhizosphere, and perform amplicon and metagenomic sequencing. We will not cover all omics types and all biological/environmental metadata on this page. Instead, we encourage readers to consult our [MetadataStandards](https://github.com/NFDI4Microbiota/MetadataStandards) resource repository.

Our proposal for a sampling campaign to analyze plant-rhizosphere microbiomes was accepted. In the proposal, we outlined the purpose and goal of our campaign. Since our funding is public, our funding agency requires us to submit the data we generate and gather to a public repository (e.g., ENA, NCBI, DDBJ). To find out where data can be deposited, we refer our readers to the [Data Repositories](./22-data-repositories.md) section of this Knowledge Base. There we see that our nucleic acid sequences can be deposited in ENA.

We go straight to [**ENA's Sample Checklist browser**](https://www.ebi.ac.uk/ena/browser/checklists) and look for the checklist that best corresponds to our sampling campaign. After some scrolling, we find the [GSC MIxS plant associated; Checklist: ERC000020](https://www.ebi.ac.uk/ena/browser/view/ERC000020), which lists **Mandatory** metadata fields that need to be filled out for data submission, as well as **Optional** fields, each with its **Field Format** and **Field restriction**. The metadata fields cover technical metadata (e.g., sequencing method, sample volume or weight for DNA extraction, nucleic acid extraction, library size, etc.) as well as biological and environmental metadata (e.g., broad-scale environmental context, local environmental context, environmental medium, geographic location (latitude) and geographic location (longitude), host metadata, sample collection metadata, etc.).
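
To make the idea of mandatory fields more concrete, here is a minimal Python sketch of a completeness check that could be run on a sample record before submission. It is only an illustration: the `MANDATORY_FIELDS` set is a hypothetical subset built from the field names mentioned above, the sample values are invented, and `missing_mandatory_fields` is a helper introduced here, not part of any ENA tool or API. The authoritative field list is the checklist itself.

```python
# Hypothetical subset of mandatory fields, taken from the field names
# mentioned in the text; the real list is defined by the ENA checklist.
MANDATORY_FIELDS = {
    "broad-scale environmental context",
    "local environmental context",
    "environmental medium",
    "geographic location (latitude)",
    "geographic location (longitude)",
}


def missing_mandatory_fields(sample: dict) -> list:
    """Return the mandatory fields that are absent or empty, sorted by name."""
    return sorted(
        field
        for field in MANDATORY_FIELDS
        if not str(sample.get(field, "")).strip()
    )


# Invented example record for one rhizosphere sample.
sample = {
    "broad-scale environmental context": "forest biome",
    "local environmental context": "rhizosphere",
    "environmental medium": "soil",
    "geographic location (latitude)": "48.15",
    "geographic location (longitude)": "",  # forgotten in the field notes
}

print(missing_mandatory_fields(sample))
# ['geographic location (longitude)']
```

Running such a check while still in the field, rather than at submission time, makes it easier to recover values that would otherwise be lost.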

Alternatively, we can hop over to the [MetadataStandards/Plant-associated microbiome biological-environmental metadata](https://github.com/NFDI4Microbiota/MetadataStandards/blob/main/Biological_Environmental/PlantAssoc_BioEnv_Metadata.md) page, where we can find a similar (but stripped-down) checklist with some filled-out examples for biological and environmental metadata. For the technical metadata corresponding to this example sampling campaign, we refer the reader to the [Amplicon sequencing](https://github.com/NFDI4Microbiota/MetadataStandards/blob/main/Technical/Amplicon_Technical_Metadata.md) and [Metagenome sequencing](https://github.com/NFDI4Microbiota/MetadataStandards/blob/main/Technical/Metagenome_Technical_Metadata.md) sections of the [Technical MetadataStandards](https://github.com/NFDI4Microbiota/MetadataStandards/tree/main/Technical) part of the repository.

By now, we should have a rough idea of which biological/environmental metadata we can collect before sampling, during sampling, and during the processing of samples.


# Metadata standards

Once a community agrees to a set of relevant metadata for their field, they can devise metadata standards.
A metadata standard is usually defined for a given type of data and by different stakeholders (e.g., user communities, data repositories).
For every metadata field that is part of a metadata standard, one can expect a human-readable description of the field paired with a machine-readable persistent identifier, an indication of how strictly the field is required by the standard, and the number of values expected for the field (its cardinality).
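
The four elements of a field definition described above can be sketched as a small Python data structure. This is an illustrative assumption about how one might model a field, not the schema of any particular standard: the class `MetadataField`, its attribute names, the requirement labels, and the `example.org` identifier are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataField:
    """One field of a metadata standard (illustrative structure only)."""

    name: str
    description: str               # human-readable description
    identifier: str                # machine-readable persistent identifier
    requirement: str               # e.g. "mandatory", "recommended", "optional"
    min_values: int = 1            # cardinality: fewest values allowed
    max_values: Optional[int] = 1  # cardinality: most values allowed, None = unbounded

    def accepts(self, values: list) -> bool:
        """Check whether the number of supplied values fits the cardinality."""
        if len(values) < self.min_values:
            return False
        return self.max_values is None or len(values) <= self.max_values


# Hypothetical field; the identifier is a placeholder, not a real term IRI.
collection_date = MetadataField(
    name="collection date",
    description="Date on which the sample was collected",
    identifier="https://example.org/terms/collection-date",
    requirement="mandatory",
)

print(collection_date.accepts(["2023-06-01"]))  # True: exactly one value
print(collection_date.accepts([]))              # False: the value is missing
```

Keeping the cardinality next to the description and identifier means the same definition can serve both as documentation for people and as input for automated checks.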

[More than a thousand standards are listed by the organisation `FAIRsharing.org`](https://fairsharing.org/search?fairsharingRegistry=Standard), which can be overwhelming.
At NFDI4Microbiota, we compiled a [list of widely used metadata standards in the field of microbiome research](https://github.com/NFDI4Microbiota/MetadataStandards) that you can browse and use for the different types of data collected during your investigations.


# Metadata management

# Metadata quality control