BASiCS: Bayesian Analysis of Single-Cell Sequencing data

@catavallejos catavallejos released this 06 Jul 09:55

Introduction

Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where:

  1. Cell-specific normalization constants are estimated as part of the model parameters,
  2. Technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate, and
  3. The total variability of the expression counts is decomposed into technical and biological components.

BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalized by means of tail posterior probabilities associated with high (or low) biological cell-to-cell variance contributions, quantities that applied users can easily interpret.
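
The detection criterion can be sketched as a Monte Carlo calculation. Given MCMC draws of a gene's biological variance contribution, the tail posterior probability is estimated by the fraction of draws that exceed a variance threshold, and the gene is flagged as highly variable when that probability exceeds an evidence threshold. The following is a minimal Python sketch and not the BASiCS implementation: the function names, the toy draws and both threshold values are hypothetical and chosen only for illustration.

```python
def tail_posterior_probability(draws, threshold):
    """Monte Carlo estimate of P(parameter > threshold | data) from MCMC draws."""
    draws = list(draws)
    if not draws:
        raise ValueError("need at least one posterior draw")
    # Fraction of posterior draws that fall above the threshold.
    return sum(d > threshold for d in draws) / len(draws)


def flag_highly_variable(sigma_chains, variance_threshold, evidence_threshold):
    """Map each gene to (tail probability, flagged) given its chain of draws."""
    results = {}
    for gene, draws in sigma_chains.items():
        prob = tail_posterior_probability(draws, variance_threshold)
        results[gene] = (prob, prob > evidence_threshold)
    return results


# Toy posterior draws of the biological variance contribution (made-up numbers).
sigma_chains = {
    "gene_a": [0.82, 0.91, 0.77, 0.88, 0.95],
    "gene_b": [0.40, 0.85, 0.30, 0.55, 0.20],
}
hvg = flag_highly_variable(
    sigma_chains,
    variance_threshold=0.79,   # hypothetical threshold on the variance contribution
    evidence_threshold=0.7,    # hypothetical threshold on the tail probability
)
```

Lowly variable genes could be handled in the same way by counting draws that fall below a threshold instead of above it.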


This release

This release is a slightly updated version of the original release. The major changes are:

  1. We modified the sampler to use Dirichlet proposals for the mRNA content size factors $\phi_j$. This does not affect the posterior inference (numerical results are virtually the same as in the previous implementation), but the algorithm becomes more efficient because the chains for $\phi_j$ mix better.
  2. We added the functions BASiCS_DenoisedCounts and BASiCS_DenoisedRates, which may be helpful for downstream analyses that are not included in this implementation.
  • BASiCS_DenoisedCounts provides a denoised version of the expression counts. For each gene $i$ and cell $j$ this function returns $$ x^*_{ij} = \frac{ x_{ij} } {\hat{\phi}_j \hat{\nu}_j}, $$ where $x_{ij}$ is the observed expression count of gene $i$ in cell $j$, $\hat{\phi}_j$ denotes the posterior median of $\phi_j$ and $\hat{\nu}_j$ is the posterior median of $\nu_j$.
  • BASiCS_DenoisedRates estimates normalised and denoised expression rates underlying the expression of all genes across cells. For each gene $i$ and cell $j$ this function returns $$ \Lambda_{ij} = \hat{\mu}_i \hat{\rho}_{ij}, $$ where $\hat{\mu}_i$ represents the posterior median of $\mu_i$ and $\hat{\rho}_{ij}$ is given by the posterior mean of $\rho_{ij}$ (a Monte Carlo estimate based on the MCMC sample of all model parameters).
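
The arithmetic behind the two formulas above can be illustrated with a small standalone Python sketch. It is not the code of BASiCS_DenoisedCounts or BASiCS_DenoisedRates and does not use the package's data structures. The chain layout (a list of MCMC draws, each draw holding one value per cell or per gene), the helper names and the toy numbers are all assumptions made for the example.

```python
from statistics import mean, median


def column_summary(chain, summary):
    """Apply `summary` to each column of a chain stored as a list of draws."""
    return [summary(column) for column in zip(*chain)]


def denoised_counts(counts, phi_chain, nu_chain):
    """x*_ij = x_ij / (phi_hat_j * nu_hat_j), with posterior medians per cell."""
    phi_hat = column_summary(phi_chain, median)
    nu_hat = column_summary(nu_chain, median)
    return [
        [x / (p * v) for x, p, v in zip(row, phi_hat, nu_hat)]
        for row in counts
    ]


def denoised_rates(mu_chain, rho_chain):
    """Lambda_ij = mu_hat_i * rho_hat_ij (median of mu_i, mean of rho_ij)."""
    mu_hat = column_summary(mu_chain, median)
    n_genes = len(mu_hat)
    n_cells = len(rho_chain[0][0])
    # Posterior mean of rho_ij, averaged over draws for every gene/cell pair.
    rho_hat = [
        [mean(draw[i][j] for draw in rho_chain) for j in range(n_cells)]
        for i in range(n_genes)
    ]
    return [[mu_hat[i] * rho_hat[i][j] for j in range(n_cells)] for i in range(n_genes)]


# Toy example: 2 genes (rows), 2 cells (columns), 3 MCMC draws (made-up numbers).
counts = [[10, 20], [4, 8]]
phi_chain = [[0.25, 1.5], [0.5, 2.0], [1.0, 2.5]]   # draws of phi_j, one value per cell
nu_chain = [[1.0, 0.5], [2.0, 0.25], [0.5, 1.0]]    # draws of nu_j, one value per cell
mu_chain = [[10.0, 4.0], [12.0, 5.0], [11.0, 3.0]]  # draws of mu_i, one value per gene
rho_chain = [                                        # draws of rho_ij, genes x cells
    [[1.0, 1.0], [1.5, 1.0]],
    [[1.5, 2.0], [1.0, 1.0]],
    [[0.5, 3.0], [0.5, 1.0]],
]

x_star = denoised_counts(counts, phi_chain, nu_chain)
rates = denoised_rates(mu_chain, rho_chain)
```

In this toy data the second cell has counts twice as large as the first, but its scaling $\hat{\phi}_j \hat{\nu}_j$ is also twice as large, so the denoised counts agree across the two cells.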

More details in

Catalina A. Vallejos, John C. Marioni and Sylvia Richardson (2015)
BASiCS: Bayesian Analysis of Single-Cell Sequencing Data
PLOS Computational Biology
http://dx.doi.org/10.1371/journal.pcbi.1004333