This repo is used to consolidate various metadata related to a large portion of the Segrè lab's microbial strains. The majority of this work originated from the HFSP collaboration Interactions Among Marine Microbes (IAMM).
The result is a master metadata file, which should be used as the primary source of information on any of the strains covered.
The master metadata file contains the following columns:

ds_strain_id
: The unique strain identifier across projects. It stands for "Daniel Segrè lab strain identifier". This ID must be used for all work related to these strains, for consistency.

phylum, class, ..., strain_designation
: Names of the strain at various taxonomic levels.

source, source_catalog_number
: Where we got the strain from.

id_...
: Project-specific identifiers. See below for specific labels. This value, together with the value of column ds_strain_id, can serve to map between different projects. In addition, the presence of a value in these columns indicates that the strain is being used in a particular project.

RES_..., LIB_..., ZOC_...
: Project-specific metadata, included for convenience. See below for specific labels.
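As a minimal sketch of how the id_... columns can be used, the Python snippet below parses a small tab-separated table, derives the projects each strain appears in (a non-empty id_ value means the strain is used in that project), and builds a lookup from one project's identifier to another's. The column names id_res and id_for, the sample rows, and the function names are hypothetical and only illustrate the idea; the actual labels are those in the master metadata file.

```python
import csv
import io

# Hypothetical excerpt of the metadata table; the column names id_res and
# id_for and all values are made up for illustration.
SAMPLE_TSV = (
    "ds_strain_id\tid_res\tid_for\n"
    "DS001\tR-17\tF-03\n"
    "DS002\t\tF-08\n"
    "DS003\tR-21\t\n"
)


def read_rows(text):
    """Parse tab-separated text into a list of dicts, one per strain."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def projects_of(row):
    """Return the id_ columns holding a value, i.e. the projects using the strain."""
    return sorted(
        col for col, val in row.items() if col.startswith("id_") and val.strip()
    )


def map_ids(rows, src, dst):
    """Map identifiers in column src to those in column dst via shared rows."""
    return {r[src]: r[dst] for r in rows if r[src].strip() and r[dst].strip()}


rows = read_rows(SAMPLE_TSV)
membership = {r["ds_strain_id"]: projects_of(r) for r in rows}
res_to_for = map_ids(rows, "id_res", "id_for")
print(membership)
print(res_to_for)
```

Because every row carries ds_strain_id, the same lookup works for any pair of id_ columns without a separate mapping file.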
The following projects have dealt with some of the strains:
re-sequencing/RES
: Re-sequencing of bacteria retrieved from various sources to investigate genome variation.

marine_library/LIB
: The Segrè lab library of marine microbial strains covering a wide range of (potential interaction) traits, based on the in silico analysis by Zoccarato et al. (ZOC). This is a physical strain library, which is stored in our -80 °C freezer.

forchielli22/FOR
: Phenotyping of marine bacteria on single carbon sources. DOI: https://doi.org/10.1128/msystems.00070-22

zoccarato22/ZOC
: An in silico study across 473 genomes to identify genome functional clusters (GFCs) grouping strains with similar traits (potentially involved in microbial interactions). DOI: https://doi.org/10.1038/s42003-022-03184-4
- Used metadata_merge.R to merge and unify strain metadata of the RES and FOR projects.
  - This used the following metadata files as input:
    - metafile.csv: initial metafile of the RES project
    - forchielli2022-....xlsx: FOR metadata
  - This created master_metadata_file.tsv
- The metadata from projects FOR and RES in file master_metadata_file.tsv were manually combined, curated and integrated, resulting in 20231026-master_metadata_file-curated.xlsx.
- Used consolidation_euler.R to integrate the curated metadata with the metadata of the LIB and ZOC projects:
  - This used the following metadata files as input:
    - 20231026-master_metadata_file-curated.xlsx
    - 20240122-strain_lib.tsv: LIB metafile of the marine strain library
      - created initially by Konrad, 2022-11-01 using 20221101-metadata_merge.R, curated by hand 2023-12-13 and 2024-01-22
    - zoccarato2022-....xlsx: ZOC metadata
    - 20231205-map-metadata_zoccarrato22.tsv: file used for manual mapping to ZOC metadata after using heuristics (mapping by Species, Reference-file, Source ID)
  - This creates a dated version of IAMM_metadata_file.tsv and the Euler diagram below.
- IAMM_metadata_file.tsv was initially copied/renamed from its dated version produced by the above process and then put under version control. Further changes therefore need to be carefully curated and committed.
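
The merge steps above can be illustrated with a small sketch. The repository's own scripts are written in R and are not reproduced here; the Python below only assumes the general idea: a full outer join of two metadata tables on a shared key, followed by a count of how many strains fall into each combination of projects (the numbers an Euler diagram would display). The table contents, the choice of ds_strain_id as join key, the id_res and id_for column names, and the function names are all hypothetical.

```python
from collections import Counter


def outer_merge(left, right, key):
    """Full outer join of two lists of dicts on `key`; missing fields become ''."""
    # Collect all column names in first-seen order.
    columns = list(dict.fromkeys(c for row in left + right for c in row))
    merged = {}
    for row in left + right:
        target = merged.setdefault(row[key], dict.fromkeys(columns, ""))
        # Keep the first non-empty value seen for each column.
        for col, val in row.items():
            if val and not target[col]:
                target[col] = val
    return list(merged.values())


def region_counts(rows, id_columns):
    """Count strains per combination of projects (presence of an id value)."""
    return Counter(tuple(c for c in id_columns if row[c]) for row in rows)


# Hypothetical per-project tables, already keyed by a shared strain identifier.
res = [
    {"ds_strain_id": "DS001", "id_res": "R-17"},
    {"ds_strain_id": "DS003", "id_res": "R-21"},
]
forc = [
    {"ds_strain_id": "DS001", "id_for": "F-03"},
    {"ds_strain_id": "DS002", "id_for": "F-08"},
]

merged = outer_merge(res, forc, "ds_strain_id")
counts = region_counts(merged, ["id_res", "id_for"])
print(merged)
print(counts)
```

A full outer join keeps strains that occur in only one project, which is what allows the id_ columns to double as project-membership flags afterwards. In practice the inputs rarely share a clean key, which is presumably why the manual curation and mapping steps listed above were needed.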