---
marp: true
theme: marp-theme_dataplant-ceplas-ccby
paginate: true
license: "[CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/)"
title: Data Storage and Versioning
author:
---
| | Backup | Archive |
|---|---|---|
| Storage type | Short- to mid-term | Long-term |
| Purpose | Disaster recovery | Long-term storage, compliance |
| Reason | Duplication | Migration |
| Usage | Work in progress | Cold, unused data |
| Changes | Short-term updates | No updates |
| Trend | Cyclic, replacement | Growing |
| Latency | Low, higher cost | High, lower cost |
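
The cyclic, replace-the-oldest pattern of backups can be illustrated with a minimal Python sketch. `rotate_backup`, the `backup_NNNN` folder names and the `keep=3` default are hypothetical choices for illustration, not a recommended backup policy.

```python
import shutil
from pathlib import Path


def rotate_backup(source: Path, backup_root: Path, keep: int = 3) -> Path:
    """Copy `source` into a new numbered backup folder, keeping only the `keep` newest.

    Hypothetical helper: the `backup_NNNN` naming and the `keep` default are
    illustrative assumptions, not a recommended backup policy.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_root.mkdir(parents=True, exist_ok=True)
    # Zero-padded numbers sort chronologically by name (up to 9999 backups)
    existing = sorted(p for p in backup_root.glob("backup_*") if p.is_dir())
    next_index = int(existing[-1].name.split("_")[1]) + 1 if existing else 1
    target = backup_root / f"backup_{next_index:04d}"
    # Duplication: copy the whole working directory
    shutil.copytree(source, target)
    # Cyclic replacement: delete the oldest copies beyond `keep`
    copies = sorted(p for p in backup_root.glob("backup_*") if p.is_dir())
    for old in copies[:-keep]:
        shutil.rmtree(old)
    return target
```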

---

It’s good practice to document:
- What was changed?
- Who is responsible?
- When did it happen?
- Why was it changed?
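
Here is a minimal sketch of recording these four answers as one changelog line. `ChangeRecord`, `append_to_changelog` and the `date | who | what | why` layout are hypothetical. They only show that every entry answers the same four questions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangeRecord:
    """One changelog entry answering what, who, when and why (illustrative structure)."""
    what: str
    who: str
    when: date
    why: str

    def to_line(self) -> str:
        # ISO dates (YYYY-MM-DD) sort chronologically as plain text
        return f"{self.when.isoformat()} | {self.who} | {self.what} | {self.why}"


def append_to_changelog(path: str, record: ChangeRecord) -> None:
    """Append one entry to a plain-text changelog (hypothetical file layout)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(record.to_line() + "\n")
```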

---

- By file name (`_v1`, `_v2`)
- Cloud services
  - Dropbox, iCloud, Google Drive
- Distributed version control system
  - e.g. Git
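
Versioning by file name can be partly automated. The sketch below assumes a hypothetical `<name>_v<N><extension>` convention. It picks the next unused number in the same folder, so an older version is never overwritten. For example, `report_v1.docx` becomes `report_v3.docx` if `report_v2.docx` already exists.

```python
import re
from pathlib import Path


def next_version_path(path: Path) -> Path:
    """Return the next free `_vN` file name, e.g. report_v2.docx -> report_v3.docx.

    Assumes the hypothetical naming convention `<stem>_v<N><suffix>`;
    a file without a version suffix is treated as version 0.
    """
    match = re.fullmatch(r"(.*)_v(\d+)", path.stem)
    base, version = (match.group(1), int(match.group(2))) if match else (path.stem, 0)
    # Look at existing siblings so we never overwrite an older version
    pattern = re.compile(rf"{re.escape(base)}_v(\d+){re.escape(path.suffix)}")
    existing = [
        int(m.group(1))
        for p in path.parent.glob("*")
        if (m := pattern.fullmatch(p.name))
    ]
    newest = max([version, *existing])
    return path.with_name(f"{base}_v{newest + 1}{path.suffix}")
```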

---

- Paper manuscript (.docx)
- Single-cell RNA-seq reads (.fastq.gz)
- Spreadsheet with photometer measurements (.xlsx)
- Calendar invitation (.ical)
- Photo of SDS-PAGE (.jpeg)
- Excel workbook with calculations (.xlsx)
- Presentation for a conference (.pdf)
- Data analysis script (.py)
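
The example files suit version control differently partly because of their format. Git can store any file, but by default it shows line-by-line differences only for plain text. Office files (.docx, .xlsx) are zip containers, and .gz, .jpeg and .pdf files are binary or compressed. A rough, extension-based heuristic follows; the suffix set is an assumption and far from complete.

```python
from pathlib import PurePath

# Assumed, non-exhaustive set of plain-text extensions; Git diffs these line
# by line, while zip-based Office files, images, PDFs and .gz archives appear
# as binary changes by default.
TEXT_SUFFIXES = {".py", ".ical", ".ics", ".csv", ".tsv", ".txt", ".md"}


def diffable_as_text(filename: str) -> bool:
    """Guess from the last extension only whether Git can show a line-by-line diff."""
    return PurePath(filename).suffix.lower() in TEXT_SUFFIXES
```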

---

✓ Documents
✓ Small data
✓ Presentations
✗ Code
✗ Data analytical projects
✗ Big (“raw”) data

---

∼ Documents
✓ Small data
∼ Presentations
✓✓ Code
✓✓ Data analytical projects
∼ Big (“raw”) data

---

- Save time
- Avoid doing repetitive tasks “by hand”
- Reuse scripts, analyses, pipelines
- Reproduce results

(... as long as it works)
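
Here is a minimal sketch of reuse in practice. The same analysis function runs on every file in a folder, in a fixed order, so rerunning it on the same data gives the same result. The function names and the CSV layout (a header row, numeric columns only) are hypothetical.

```python
import csv
from pathlib import Path
from statistics import mean
from typing import Dict


def column_means(csv_path: Path) -> Dict[str, float]:
    """Mean of every column in one CSV file (assumes a header row, numeric values, >= 1 data row)."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    return {col: mean(float(row[col]) for row in rows) for col in rows[0]}


def summarize_folder(folder: Path) -> Dict[str, Dict[str, float]]:
    """Apply the same analysis to every CSV file, in a fixed (sorted) order."""
    return {p.name: column_means(p) for p in sorted(folder.glob("*.csv"))}
```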

---

- Version control system
- Git “repository” = a central data package (directory)
- Allows you to track changes to any file in the repository
  - What was changed?
  - When was it changed?
  - By whom was it changed?
  - Why was it changed?
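
Git detects *what* changed by content. Every file version is stored as a blob object whose ID is a hash of its bytes. Author, date and message (who, when, why) are recorded in the commit. The sketch reproduces the blob ID that `git hash-object` reports for unfiltered content. It assumes the default SHA-1 object format; repositories using SHA-256 produce different IDs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git gives this content as a blob (SHA-1 object format).

    Git hashes a header ("blob", a space, the size in bytes, a NUL byte)
    followed by the raw content, so identical content always gets the same
    ID and any change produces a different one.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()
```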

---

- A well-documented cloud environment
- Active syncing
  - Not automatically synced
- Non-automated version control
  - You control which changes to track and what to sync
- Time machine to go back to older versions
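
The following toy, in-memory model illustrates "non-automated" and "time machine"; it is far simpler than Git's real storage. Nothing is recorded until you explicitly commit, and only the files you pass in are tracked. Any earlier snapshot can be restored. `ToyRepository` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Snapshot:
    """One recorded state: file contents plus who/when/why metadata."""
    files: Dict[str, str]
    author: str
    message: str
    timestamp: datetime


@dataclass
class ToyRepository:
    """Toy, in-memory model of explicit (non-automated) version control."""
    history: List[Snapshot] = field(default_factory=list)

    def commit(self, files: Dict[str, str], author: str, message: str) -> int:
        """Record a snapshot of only the given files and return its version number."""
        if not message:
            raise ValueError("a commit needs a message explaining why")
        # Copy the dict so later edits to the working files do not alter history
        snapshot = Snapshot(dict(files), author, message, datetime.now(timezone.utc))
        self.history.append(snapshot)
        return len(self.history) - 1

    def checkout(self, version: int) -> Dict[str, str]:
        """Return a copy of the files as they were at `version`."""
        return dict(self.history[version].files)

    def log(self) -> List[str]:
        """One line per version: number, message (why) and author (who)."""
        return [f"{i}: {s.message} ({s.author})" for i, s in enumerate(self.history)]
```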

Simplifies concurrent work & merging changes
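
Merging combines two lines of work that started from a common version (the "base"). Git's real merge works on diffs and handles inserted and deleted lines. The sketch below shows only the core three-way rule, assuming that all three versions have the same number of lines. A line changed on only one side is taken automatically; a line both sides changed differently is reported as a conflict.

```python
from typing import List, Tuple


def merge_lines(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Naive line-by-line three-way merge.

    Simplifying assumption: all three versions have the same number of lines
    (only edits, no insertions or deletions). Returns the merged lines and the
    indices of conflicting lines, where our version is kept as a placeholder.
    """
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles equal-length versions")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # both sides agree (unchanged or changed identically)
            merged.append(o)
        elif o == b:    # only they changed this line
            merged.append(t)
        elif t == b:    # only we changed this line
            merged.append(o)
        else:           # both changed it differently: resolve by hand
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts
```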

---

- Online service to host our projects
- Share code with other developers
- Others can download our projects, work on them and contribute to them
- They can upload their changes and merge them with the main project

---

Slides presented here include contributions by:
- Dominik Brilhaus ([GitHub](https://github.com/brilator), [ORCID](https://orcid.org/0000-0001-9021-3197))
- Hajira Jabeen ([GitHub](https://github.com/HajiraJabeen), [ORCID](https://orcid.org/0000-0003-1476-2121))