Repository accompanying the paper "PhenoGMM: Gaussian mixture modelling of microbial cytometry data enables efficient predictions of biodiversity".

CMET-UGent/PhenoGMM

 
 


PhenoGMM: Gaussian mixture modelling of microbial cytometry data enables efficient predictions of biodiversity

This repository accompanies the manuscript "PhenoGMM: Gaussian mixture modelling of microbial cytometry data enables efficient predictions of biodiversity" by P. Rubbens, R. Props, F.-M. Kerckhof, N. Boon and W. Waegeman. bioRxiv ID: 641464.

Abstract

Motivation: Microbial flow cytometry allows rapid characterization of microbial community diversity and dynamics. Recent research has demonstrated a strong connection between cytometric diversity and taxonomic diversity based on 16S rRNA gene amplicon sequencing data. This creates the opportunity to integrate both types of data to study and predict microbial community diversity in an automated and efficient way. However, microbial flow cytometry data pose a number of unique challenges that need to be addressed.

Results: The results of our work are threefold: i) We expand current microbial cytometry fingerprinting approaches by using a model-based fingerprinting approach based upon Gaussian Mixture Models, which we called PhenoGMM. ii) We show that microbial diversity can be rapidly estimated by PhenoGMM. In combination with a supervised machine learning model, diversity estimations based on 16S rRNA gene amplicon sequencing data can be predicted. iii) We evaluate our method extensively by using multiple datasets from different ecosystems and compare its predictive power with a generic binning fingerprinting approach that is commonly used in microbial flow cytometry. These results confirm the strong connection between the genetic make-up of a microbial community and its phenotypic properties as measured by flow cytometry.

Availability: All code and data supporting this manuscript are freely available in this repository. Raw flow cytometry data are additionally available via FlowRepository, and raw sequences via the NCBI Sequence Read Archive. The functionality of PhenoGMM has been incorporated in the R package PhenoFlow, which we recommend for R users.
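As a rough illustration of the diversity estimation mentioned in the abstract, the sketch below computes Hill diversity numbers from a cytometric fingerprint, that is, from the number of cells assigned to each mixture component in a sample. It is not taken from this repository: the function name `hill_number`, the choice of Hill numbers as the diversity measure, and the example counts are all assumptions made for illustration.

```python
import math


def hill_number(counts, q):
    """Hill diversity of order q for a vector of non-negative counts.

    Hypothetical helper, not part of the repository code.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Relative abundances of the non-empty mixture components.
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Made-up fingerprints: cells assigned to each of 4 mixture components.
even_sample = [250, 250, 250, 250]
skewed_sample = [970, 10, 10, 10]

for name, counts in [("even", even_sample), ("skewed", skewed_sample)]:
    d0, d1, d2 = (hill_number(counts, q) for q in (0, 1, 2))
    print(f"{name}: D0={d0:.2f} D1={d1:.2f} D2={d2:.2f}")
```

For the evenly distributed sample all three orders equal the number of components, while for the skewed sample the higher orders drop towards 1 because they weight dominant components more heavily.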

Structure of repository:

Examples of workflow:

Two worked examples are provided as Jupyter notebooks, one for the in silico data study and one for the study of natural communities:

Results:

The presented results are an average of multiple runs (5 or 10) on multiple datasets:

  • Flow cytometry data can be found in the folder Data. See also the FlowRepository, with IDs FR-FCM-ZZSH, FR-FCM-ZZNA, FR-FCM-ZY9J and FR-FCM-ZYZN.
  • OTU-tables can be found there as well, for the cooling water and for the lake systems.
  • Code to reproduce the figures (Python scripts starting with plot) and all results that are the basis of the figures can be found here.

