Cluster-free pattern detection in labelled scRNA-seq data

fnadalin/scPropensity

scPropensity

In single-cell sequencing datasets, cells are often annotated with metadata (e.g., batch, sample ID, condition). scPropensity is a global measure of the relationship between metadata assignment and molecular similarity across cells.

scPropensity was applied to the analysis of cancer clones (see [Nadalin et al.]): we asked whether two clones have a more or less similar transcriptional profile than expected by chance. Using scPropensity we computed a clone-clone transcriptional similarity, which we used to classify clones into distinct transcriptional groups (lineages) and evaluate their molecular heterogeneity.

Description

scPropensity is inspired by a concept in structural bioinformatics called statistical potential, which is used to evaluate the likelihood of a protein complex model via a pseudo-energy function computed from a database of experimental protein structures.

In particular, pair propensity scores are derived from amino acid pairings at the protein-protein interface and are defined as p(x,y) = F(x,y)/G(x,y), where F(x,y) is the observed frequency of pair (x,y) and G(x,y) is the expected frequency of pair (x,y). Depending on the value of p(x,y), x and y are more (> 1), less (< 1) or equally (= 1) likely to be in contact with each other than expected by chance.

Here, x and y are cell labels. A cell-cell similarity measure is derived from the assay (gene expression, chromatin accessibility state, etc.) and is used to build a k-nn graph, where nodes are cells and a directed edge connects cell i to cell j if and only if j is one of the k closest cells to i according to this measure.

F(x,y) is defined as the number of edges (i,j) in the k-nn graph such that i is labelled with x and j is labelled with y; G(x,y) is the expected number of edges labelled with (x,y) given the neighbourhood size k and the number of cells labelled with x and y, respectively (see [Nadalin et al.] for details). Therefore, p(x,y) tells whether cells labelled with x tend to be more (> 1), less (< 1) or equally (= 1) similar to the cells labelled with y than expected by chance.
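The definitions above can be illustrated with a minimal base-R sketch on toy data. This is not the package's implementation: the Euclidean distance, the toy embedding `emb`, and all variable names are assumptions made for illustration. In particular, the expected count `G_exp` here uses a simple random-labelling null (labels shuffled over a fixed graph with out-degree k), which may differ from the definition given in [Nadalin et al.].

```r
# Toy data: 6 cells in a 2-dimensional embedding, two labels (hypothetical)
emb <- rbind(
  c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1),   # cells labelled "a"
  c(5.0, 5.0), c(5.1, 5.0), c(5.0, 5.1)    # cells labelled "b"
)
labels <- c("a", "a", "a", "b", "b", "b")
N <- nrow(emb)
k <- 2

# Directed k-nn graph: edge (i, j) if j is among the k closest cells to i
d <- as.matrix(dist(emb))   # Euclidean distance (an assumption)
diag(d) <- Inf              # a cell is not its own neighbour
edges <- do.call(rbind, lapply(seq_len(N), function(i) {
  data.frame(from = i, to = order(d[i, ])[seq_len(k)])
}))

# F(x,y): observed number of edges going from an x-cell to a y-cell
lev <- sort(unique(labels))
F_obs <- matrix(0, length(lev), length(lev), dimnames = list(lev, lev))
for (e in seq_len(nrow(edges))) {
  x <- labels[edges$from[e]]
  y <- labels[edges$to[e]]
  F_obs[x, y] <- F_obs[x, y] + 1
}

# G(x,y): expected number of (x,y) edges if labels were assigned at random
# (assumed null model, not necessarily the one used by scPropensity)
n <- as.numeric(table(factor(labels, levels = lev)))
G_exp <- k * outer(n, n) / (N - 1)
diag(G_exp) <- k * n * (n - 1) / (N - 1)
dimnames(G_exp) <- list(lev, lev)

# Pair propensity p(x,y) = F(x,y) / G(x,y)
p <- F_obs / G_exp
print(p)
```

In this toy case every neighbour of an "a" cell is another "a" cell, so p("a","a") = 6 / 2.4 = 2.5 (more similar than expected by chance) and p("a","b") = 0 (less similar than expected by chance).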

Requirements

  • R v4.0.3
  • Seurat v4.0.5

Instructions

scPropensity is implemented in R. It takes as input a Seurat object and the name of a metadata field. To compute the pair propensity scores on object.Rds with respect to sample.name, run:

scPropensity(object.file = "object.Rds", slot = "sample.name", outdir = "dir")

The above function builds a k-nn graph, computes the pair propensities of the labels in object@meta.data$sample.name, and creates a folder dir containing the output files. The output includes an n x n matrix M, where n is the number of distinct values in sample.name and M[x,y] is the log pair propensity of (x,y), as well as the L2-normalised version of M.
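As a rough illustration of how the two output matrices relate, the sketch below takes a small, made-up propensity matrix `p`, computes its log, and applies an L2 normalisation. The natural logarithm and the row-wise normalisation are both assumptions: the text above does not state the log base or whether normalisation is applied per row or over the whole matrix, and the package may do either differently.

```r
# Hypothetical 2 x 2 pair propensity matrix (values invented for illustration)
p <- matrix(c(2.0, 0.5,
              0.5, 2.0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("a", "b"), c("a", "b")))

# Log pair propensity: positive where p > 1, negative where p < 1, zero where p = 1
M <- log(p)   # natural log (an assumption)

# Row-wise L2 normalisation (an assumption): each row is scaled to unit Euclidean norm
row_norm <- sqrt(rowSums(M^2))
M_l2 <- M / row_norm   # recycling divides every entry of row i by row_norm[i]

print(M)
print(M_l2)
```

Note that in R, `log(0)` is `-Inf`, so a pair with no observed edges would need separate handling before normalisation; how scPropensity treats that case is not described here.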

Citing

If you find this software useful, please cite:

Nadalin et al. Multi-omic lineage tracing predicts the transcriptional, epigenetic and genetic determinants of cancer evolution. Nature Communications. https://doi.org/10.1101/2023.06.28.546923
