Merge pull request #269 from thoelken/main
Corrected headings level and capitalization in some articles
thoelken authored Sep 13, 2024
2 parents f4602b7 + b84d4c5 commit 3d0dd30
Showing 25 changed files with 424 additions and 179 deletions.
240 changes: 240 additions & 0 deletions docs/_Getting-Started/01-privacy-policy-english-translation.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/_Getting-Started/02-contributing.md
@@ -1,5 +1,5 @@
---
title: Contributing to the NFDI4Microbiota Knowledge Base
title: How to Contribute
category: Getting-Started
layout: default
docs_css: markdown
2 changes: 1 addition & 1 deletion docs/_How-We-Operate/01-governance-workflows.md
@@ -1,5 +1,5 @@
---
title: Governance workflows
title: Governance Workflows
category: How-We-Operate
layout: default
docs_css: markdown
@@ -1,4 +1,9 @@
# Privacy policy
---
title: Privacy Policy
category: How-We-Operate
layout: default
docs_css: markdown
---

## DISCLAIMER: The following policy is an automated translation of the German text. Please refer to the German original for a legally binding document.

10 changes: 3 additions & 7 deletions docs/_RDM-Collect/13-data-qc.md
@@ -12,7 +12,7 @@ Legend:
* END = no solution, this problem is unsolvable


# RNA-seq
## RNA-seq
1. high peak at low bp in the electropherogram (intensity mV per Size bp)
- **source**: documentation (PDF)
- **possible reason(s)**: contamination e.g. adapter dimers (adapter+adapter, no DNA)
@@ -149,7 +149,7 @@ Legend:
- **possible reason(s)**: humans are bad with ratios (0.01 = almost 0 and 100 is just large but not the largest bar ever)
- **solution/measure**: use any log transformation (e.g. log10: 0.01 => -2, 100 => +2)
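
The log transformation suggested above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `log10_ratio` and the example ratios are hypothetical and not part of the original checklist.

```python
import math


def log10_ratio(ratio: float) -> float:
    """Return the base-10 logarithm of a strictly positive ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be strictly positive")
    return math.log10(ratio)


# Hypothetical fold changes: after the transform, down- and up-regulation
# by the same factor lie symmetrically around zero.
ratios = [0.01, 0.1, 1, 10, 100]
transformed = [log10_ratio(r) for r in ratios]
```

Plotting `transformed` instead of `ratios` gives bars of comparable length for 0.01 and 100, which is the point of the recommendation.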

# Single cell
## Single cell

### Quality check
1. peak at left/right side in gene or reads per cell histogram or log10-cumulative-number of reads per cell id
@@ -191,9 +191,5 @@ Legend:
- **possible reason(s)**: some gene names can be interpreted as dates when using Excel for data handling <https://doi.org/10.1126/science.aah4573>
- **solution/measure**: never use Excel, or at least make sure that the cell type is not "AUTO"
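
One way to catch this problem after the fact is to scan a gene column for values that look like dates. The sketch below assumes a CSV file with a column named `gene`. The column name, the function name and the pattern are hypothetical, and the pattern only covers the common `2-Sep` style of mangled symbol.

```python
import csv
import io
import re

# Matches values such as "1-Mar" or "2-Sep", the typical result of a
# spreadsheet converting symbols like MARCH1 or SEPT2 into dates.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)


def find_date_like_symbols(handle, column="gene"):
    """Return all values in `column` that look like spreadsheet dates."""
    reader = csv.DictReader(handle)  # csv keeps every field as a string
    return [row[column] for row in reader if DATE_LIKE.match(row[column].strip())]


# Small in-memory example instead of a real file.
example = io.StringIO("gene,count\nTP53,120\n2-Sep,15\n1-Mar,7\n")
suspicious = find_date_like_symbols(example)
```

Any hit in `suspicious` indicates that the table passed through a spreadsheet with automatic type conversion and should be regenerated from the original source.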

# Get Help
## Get Help
If you have any further questions about the management and analysis of your microbial research data, please contact us: [[email protected]](mailto:[email protected]) (by emailing us you agree to the privacy policy on our website: [Contact](https://nfdi4microbiota.de/contact-form/))

# Further resources

# References
18 changes: 9 additions & 9 deletions docs/_RDM-Plan/01-dmp.md
@@ -5,12 +5,12 @@ layout: default
docs_css: markdown
---

# Introduction
## Introduction
A Data Management Plan (DMP) is a formal and living document that defines responsibilities and provides guidance. It describes data and data management during the project and measures for archiving and making data and research results available, usable, and understandable after the project has ended.

DMPs are required in [DFG funding proposals since 2022](https://www.dfg.de/en/research_funding/announcements_proposals/2022/info_wissenschaft_22_25/index.html) and in [EU Funding Programs 2021-2027](https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/common/guidance/aga_en.pdf). For funders, DMPs serve as a reporting tool to hold grantees accountable for conducting good and open science, with regular updates or in case of changes. For researchers and other stakeholders, DMPs are meant to be a living document that accompanies them from proposal writing or project start to the sharing of their data and results.

# Content of DMPs
## Content of DMPs
DMPs typically include the following information:
* Administrative project-specific information (including a description of the research project)
* Roles, responsibilities and obligations
@@ -27,7 +27,7 @@ DMPs typically include the following information:

To find a TDR, see the [Data Repositories page of the Knowledge Base]({% link _RDM-Share/22-data-repositories.md %}).

# DMP templates and examples
## DMP templates and examples

**Templates**
* [NFDI4Microbiota's template](https://doi.org/10.5281/zenodo.13628589)
@@ -38,7 +38,7 @@ To find a TDR, see the [Data Repositories page of the Knowledge Base]({% link _R
* [DD-DeCaF Bioinformatics Services for Data-Driven Design of Cell Factories and Communities](https://phaidra.univie.ac.at/o:1139495)
* [METASTAVA](https://doi.org/10.5281/zenodo.5841166)

# Benefits of a DMP
## Benefits of a DMP
If implemented correctly, a DMP can [benefit all stakeholders](https://doi.org/10.1371/journal.pcbi.1006750) in a research project, despite the initial cost of creating the DMP itself.

A DMP can **save time and stress** for yourself and others by planning ahead. DMPs define roles, responsibilities, and efforts regarding the data and its management. Writing a DMP puts you in touch with IT staff and your institution's data protection officer at an early stage. It also helps ensure data quality and lets you trace your processing steps easily, making your analysis and results reproducible, and it allows you to manage access rights and prevent security breaches. Finally, by writing your DMP, you may identify gaps and vulnerabilities in your current data management strategy at an early stage and outline solutions to fill them.
@@ -47,7 +47,7 @@ A DMP can also facilitate and **harmonize the coordination and shared use of dat

DMPs offer **other benefits**, such as enabling verification and control: researchers are accountable for how their data are managed during their research project. They also help to identify - and potentially minimize - time and money costs that need to be included in the proposal, such as for Research Data Management (RDM) activities. They also help to comply with Good Research Practice (GRP), support research integrity, and ensure that ethical and legal requirements are met. DMPs also help to meet institutional and funder requirements: funding agencies increasingly require information on the management of research data, and a DMP allows you to structure and formalize this information. Last but not least, DMPs facilitate data reuse, thereby increasing data citation and advancing scientific progress.

# Writing a DMP
## Writing a DMP

**Who is involved in the creation of the DMP?** Entities involved in the creation of a DMP are researchers, RDM staff (check your institution's [research data policy](https://www.forschungsdaten.org/index.php/Forschungsdaten-Policies) and ask for [local support](https://www.forschungsdaten.org/index.php/FDM-Kontakte)) and central infrastructure (e.g. computer center, library).

@@ -57,7 +57,7 @@ DMPs offer **other benefits**, such as enabling verification and control: resear

**DMP quality check:** A good DMP is well structured and distinguishes between actions to be taken during and after the project. It is a living document that needs to be updated regularly and is for the use of all project stakeholders. It should be started as early as possible, be as concise as possible, as long as necessary, and contain sufficient detail without being redundant. Ideally, the DMP will be published with the research data at the end of the project.

# DMP tools
## DMP tools
Although a DMP can be written as a plain text document, using a more dynamic and machine-readable format unlocks its full potential.

* **[Research Data Management Organizer](https://rdmorganiser.github.io/) (RDMO)** is an open-source web application that has been widely adopted by institutes and consortia in Germany. RDMO supports the structured and collaborative planning and implementation of RDM and also enables the textual output of a DMP.
@@ -69,7 +69,7 @@ RDMO organizes individual DMPs around predefined templates that reflect the requ

* **[DMPonline](https://dmponline.dcc.ac.uk/)** was developed by the [Digital Curation Centre](https://www.dcc.ac.uk/) (DCC) for the UK funding context but has also been used elsewhere. It is an open-source, web-based tool for researchers. It enables the creation, review, and sharing of DMPs that meet institutional and funder requirements.

# Further resources
## Further resources
* Cessda - [Data Management Expert Guide](https://dmeg.cessda.eu/Data-Management-Expert-Guide)
* [Content of a Data Management Plan](https://doi.org/10.18154/RWTH-2019-10064)
* [Data Management Plan — the Turing Way - Data Management Plan](https://the-turing-way.netlify.app/reproducible-research/rdm/rdm-dmp.html)
Expand All @@ -90,8 +90,8 @@ RDMO organizes individual DMPs around predefined templates that reflect the requ
* [SM Wizard](https://smw.ds-wizard.org/)
* [Writing and using a software management plan](https://www.software.ac.uk/guide/writing-and-using-software-management-plan)

# Get Help
## Get Help
If you have any further questions about the management and analysis of your microbial research data, please contact us: [[email protected]](mailto:[email protected]) (by emailing us you agree to the privacy policy on our website: [Contact](https://nfdi4microbiota.de/contact-form/))

# References
## References
{% bibliography --cited_in_order %}
20 changes: 10 additions & 10 deletions docs/_RDM-Preserve/24-aruna-object-storage.md
@@ -5,10 +5,10 @@ layout: default
docs_css: markdown
---

# Abstract
## Abstract
Aruna Object Storage (AOS) is a modern distributed storage platform designed to meet the increasing demand for effective data management and storage of scientific data. It is the central storage of the [Research Data Commons (RDC)](23-research-data-commons.html) cloud layer and the data foundation for the upper layers. It is a cloud-native, scalable system with an API and an S3-compatible interface. It allows resources to be organized into Objects, Datasets, Collections and Projects. Additionally, it provides an event-driven architecture that enables automation and data validation and improves the accessibility and reproducibility of scientific results. AOS is open-source and available at [https://aruna-storage.org](https://aruna-storage.org).

# Factsheet
## Factsheet
* ![Aruna Object Storage Logo]({{ '/assets/img/aruna_dark_font.png' | relative_url }} "Aruna Object Storage Logo"){:width="20%"}
* Status: V2.x BETA, V1.x deprecated
* Current Version: V2.0.x beta
@@ -18,7 +18,7 @@ Aruna Object Storage (AOS) is a modern distributed storage platform designed to

![AOS inside RDC]({{ '/assets/img/rdc_aruna.png' | relative_url }} "AOS inside RDC"){:width="70%"}

# Overview
## Overview
AOS is a fast, secure and geo-redundant data storage system. It offers sophisticated metadata management according to the FAIR principles. It forms the foundation for the RDC's mediation and semantic layers and handles all stored data objects securely and data-agnostically.

AOS key features are:
@@ -33,21 +33,21 @@ Storing data in localized, domain-specific data silos has limited use for collab

![Aruna Object Storage Concept]({{ '/assets/img/concept_aruna.png' | relative_url }} "Aruna Object Storage Concept"){:width="40%"}

# Getting started
## Getting started
AOS is located at [https://aruna-storage.org](https://aruna-storage.org). Users can log in there. Currently, the AAI of the GWDG is used for this purpose, which requires a user account at the GWDG, the DFN or at LifeScience AAI. Nevertheless, additional identity providers are possible. Thus, login via an SSO of NFDI4Biodiversity (and other NFDIs) will be supported when the service is established. After the AOS account has been activated, the user can create a project. Further users can then be activated for this project to enable data exchange and joint processing. The project can then be filled with data either via the API or via the S3 interface.

![Aruna Object Storage Start Page]({{ '/assets/img/aruna-startpage-2023-7-28_8-24-10.png' | relative_url }} "Aruna Object Storage Start Page"){:width="60%"}

# User Guide
## User Guide
AOS is primarily intended as the data backend for the RDC, so very few end users will use AOS directly. Data import, verification, transformation and processing are generally done via the services in the mediation layer, which also ensures the consistency of the data. Users and services can be informed about changes to individual data objects or even entire projects via the AOS notification service and can thus react to these changes.

# Developer Guide
## Developer Guide
The current documentation for using AOS is linked from the AOS home page at [https://aruna-storage.org](https://aruna-storage.org). This contains a complete description of the API. AOS consists of five main components: AOS Server, AOS Proxy, AOS API (and its S3 interface), AOS CLI and AOS Notification System. Of these components, the AOS team installs and maintains the servers and associated databases. AOS proxies can then be installed at various locations, which then communicate with the servers in each case. The actual data traffic from and to the storage backend then takes place via the AOS proxies. The interaction between a client and the proxies/servers takes place via the AOS API. To reduce the entry barrier, there is a command line interface called AOS CLI, which encapsulates API calls. Moreover, an S3 interface was implemented, since many software packages already support data storage via S3 as industry standard. Finally, the AOS notification system will soon be released to allow immediate response to changes in the AOS. This can be, for example, a data verification that is automatically initiated when a data upload is complete.

## AOS infrastructure
### AOS infrastructure
The main component of AOS is a distributed database system. It synchronizes all data between several computers at different locations, and this redundancy provides fail-safety. This database is backed up regularly. The actual data can also be synchronized across multiple sites to provide redundancy; in any case, all data are also stored at one location in a redundant system. Because data cannot be overwritten (new versions are created instead) and data are stored redundantly at multiple levels, no backup of the data itself is currently performed. Implementing one at a later date is under discussion.

## AOS data structure
### AOS data structure
In version 1.x, AOS organizes data into Projects, Collections, Object Groups, and Objects. Starting with version 2.x, the data structure is more flexible: data are organized into Projects, Collections, Datasets, and Objects with a more flexible relation model.

|![Aruna Object Storage Structure V1]({{ '/assets/img/aruna-1-structure.png' | relative_url }} "Aruna Object Storage Structure V1"){:width="50%"} |
@@ -58,9 +58,9 @@ AOS organizes data in Version 1.x into Projects, Collections, Object Groups, and
|-|
| UML diagram of the Aruna Object Storage data structure starting in version 2.0. All resources form a directed acyclic graph of belongs-to relationships (blue) with Projects as roots and Objects as leaves. Resources can also describe horizontal version relationships (orange), data/metadata relationships (yellow) or even custom user-defined relationships (green). |

# Get Help
## Get Help
If you have any further questions about the management and analysis of your microbial research data, please contact us: [[email protected]](mailto:[email protected]) (by emailing us you agree to the privacy policy on our website: [Contact](https://nfdi4microbiota.de/contact-form/))

# References
## References
* Documentation and Aruna start page: [https://aruna-storage.org](https://aruna-storage.org)
* Source code: [https://github.com/ArunaStorage](https://github.com/ArunaStorage)
19 changes: 10 additions & 9 deletions docs/_RDM-Preserve/25-digital-preservation.md
@@ -4,10 +4,11 @@ category: RDM-Preserve
layout: default
docs_css: markdown
---
# Definition

## Definition
Digital preservation means taking certain measures to ensure that digital material can be found and can be accessed in the long term ("long-term accessibility of data"). It aims to preserve information in a way that is understandable and reusable for a specific community and to prove its authenticity.

# Digital preservation for researchers
## Digital preservation for researchers
The sustainable handling of data by researchers naturally facilitates the long-term accessibility of data. Best practice methods are:
* Cleaning data / data structures - see also: [Data Organisation](https://knowledgebase.nfdi4microbiota.de/RDM-Process/14-data-organization.html)
* Validating data - see also: [Data Quality Control](https://knowledgebase.nfdi4microbiota.de/RDM-Collect/13-data-qc.html)
@@ -18,7 +19,7 @@ The sustainable handling of data by researchers naturally facilitates the long-t
* Storing files on 2 different media types
* Keeping at least 1 copy off site.

## Data selection
### Data selection
To make a well-founded decision on data selection, we recommend reading the how-to guide of the Edinburgh Digital Curation Centre {% cite dcc_five_2014 %}. The suggested steps are:
* **Step 1:** Identify purposes that the data could fulfill: consider the purpose or ‘reuse case’ of your data, including reuse outside your research group.
* **Step 2:** Identify data that **must** be kept: consider legal or policy compliance risks, as well as funder requirements.
@@ -27,7 +28,7 @@ To decide well-founded on data selection we recommend reading the how-to guide o
* **Step 5:** Complete the data appraisal, i.e. list what data must, should or could be kept to fulfill which potential reuse purposes. Summarize any actions needed to prepare the data for deposit - or justification for not keeping it.


## Recommended file formats for preservation
### Recommended file formats for preservation
Making your research available in recommended file formats, in addition to the original software format, strongly supports the reusability and long-term accessibility of your data.
Attributes of those file formats are:
* Open rather than proprietary (examples of [open file formats](https://en.wikipedia.org/wiki/List_of_open_file_formats))
@@ -40,18 +41,18 @@ Attributes of those file formats are:

For biomaterial data, recommended formats are CSV, TXT and XML.

# Digital preservation for repository operators
## Digital preservation for repository operators

Specific preservation measures depend on the digital objects, needs of the user community, and various other conditions. Repositories usually contain publications as files, making file format identification and validation relevant.

## Bitstream preservation
### Bitstream preservation
Preservation on the bitstream level is the basis for digital preservation. It covers e. g.
* Checking checksums of transferred files upon receiving them (or generating file checksums) and conducting regular fixity checks
* Redundant storage of data
* Generating backups (e. g. offline backups of the underlying repository database)
* Strategies for updating storage media (according to e. g. server lifetime)
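
The checksum and fixity checks listed above can be illustrated with Python's standard library. This is a minimal sketch, not a description of how any particular repository implements them: the function names `sha256_of` and `fixity_ok`, the choice of SHA-256 and the sample file are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, expected_hex: str) -> bool:
    """Return True if the file still matches the recorded checksum."""
    return sha256_of(path) == expected_hex.lower()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"ACGT\n")
    recorded = sha256_of(sample)           # checksum stored at ingest
    unchanged = fixity_ok(sample, recorded)

    sample.write_bytes(b"ACGA\n")          # simulate silent corruption
    changed_detected = not fixity_ok(sample, recorded)
```

Running `fixity_ok` on a schedule against the checksums recorded at ingest is one simple way to implement regular fixity checks.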

## Preservation beyond bitstream
### Preservation beyond bitstream
Preserving file content, i.e. being able to open and render it correctly in software, is part of logical {% cite lindlar_2020_3672773 %} or technical preservation, also called digital curation. Semantic preservation is concerned with e. g. semantic drift impacting metadata.
* Obtaining sufficient rights allowing e. g. format migrations, file repairs and re-use over the long-term like re-publication in other infrastructures
* File format identification, based on format-specific bit patterns, e. g. via [DROID](https://coptr.digipres.org/index.php/DROID) during the publication process
@@ -69,10 +70,10 @@ Preservation of file content, being able to open and render it correctly in a so

Many digital preservation criteria applying to repositories are also present in the certification criteria of the CoreTrustSeal and the nestor seal {% cite coretrustseal_standards_and_certificatio_2022_7051012 harmsen_henk_explanatory_2013 %}.
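
File format identification based on format-specific bit patterns, as mentioned in the list above, can be sketched as follows. The signature table is a deliberately tiny, hypothetical one for illustration. It does not reproduce how DROID works, which relies on a much larger curated signature registry.

```python
# Hypothetical mini signature table: format name -> leading bytes ("magic number").
SIGNATURES = {
    "PDF": b"%PDF-",
    "PNG": b"\x89PNG\r\n\x1a\n",
    "GZIP": b"\x1f\x8b",
    "ZIP": b"PK\x03\x04",
}


def identify_format(data: bytes) -> str:
    """Guess a file format from the first bytes of its content."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


pdf_guess = identify_format(b"%PDF-1.7\n...")
text_guess = identify_format(b"gene,count\n")
```

Identifying a format by content rather than by file extension matters for preservation because extensions can be missing or wrong, while the byte signature travels with the file.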

# Get Help
## Get Help
If you have any further questions about the management and analysis of your microbial research data, please contact us: [[email protected]](mailto:[email protected]) (by emailing us you agree to the privacy policy on our website: [Contact](https://nfdi4microbiota.de/contact-form/))

# References
## References
{% bibliography --cited_in_order %}

