In single-cell sequencing datasets, cells are often annotated with metadata (e.g., batch, sample ID, condition). scPropensity is a global measure of the relationship between metadata assignment and molecular similarity across cells.
scPropensity was applied to the analysis of cancer clones (see [Nadalin et al.]): we asked whether two clones have a more or less similar transcriptional profile than expected by chance. Using scPropensity we computed a clone-clone transcriptional similarity, which we used to classify clones into distinct transcriptional groups (lineages) and evaluate their molecular heterogeneity.
scPropensity is inspired by a concept from structural bioinformatics called statistical potential, which is used to evaluate the likelihood of a protein complex model via a pseudo-energy function computed from a database of experimental protein structures.
In particular, pair propensity scores are derived from amino acid pairings at the protein-protein interface and are defined as p(x,y) = F(x,y)/G(x,y), where F(x,y) is the observed frequency of pair (x,y) and G(x,y) is the expected frequency of pair (x,y). Depending on the value of p(x,y), x and y are more (> 1), less (< 1) or equally (= 1) likely to be in contact with each other than expected by chance.
Here, x and y are cell labels. A cell-cell similarity measure is derived from the assay (gene expression, chromatin accessibility state, etc.) and is used to build a k-nn graph, where nodes are cells and a directed edge connects cell i to cell j if and only if j is one of the k closest cells to i according to this measure.
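The graph construction described above can be sketched in base R as follows. This is a minimal illustration, not the code used by scPropensity: the helper `knn_indices`, the toy embedding `emb` and the choice of Euclidean distance are all assumptions made for the example.

```r
# Toy data: 6 cells (rows) x 2 features; in practice this could be,
# for example, a low-dimensional embedding of the assay (assumption).
emb <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
rownames(emb) <- paste0("cell", 1:6)

# Hypothetical helper: returns an N x k matrix whose row i holds the indices
# of the k nearest neighbours of cell i (Euclidean distance, self excluded).
knn_indices <- function(emb, k) {
  d <- as.matrix(dist(emb))   # pairwise Euclidean distances
  diag(d) <- Inf              # a cell is never its own neighbour
  do.call(rbind, lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)]))
}

nn <- knn_indices(emb, k = 2)

# Directed edge list: one row per edge (i, j), where j is among
# the k nearest neighbours of i.
edges <- data.frame(
  i = rep(seq_len(nrow(nn)), times = ncol(nn)),
  j = as.vector(nn)
)
```

Every cell has exactly k outgoing edges, so the graph has N * k directed edges in total. The relation is not symmetric: j can be a neighbour of i without i being a neighbour of j.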
F(x,y) is defined as the number of edges (i,j) in the k-nn graph such that i is labelled with x and j is labelled with y; G(x,y) is the expected number of edges labelled with (x,y) given the neighbourhood size k and the number of cells labelled with x and y, respectively (see [Nadalin et al.] for details). Therefore, p(x,y) tells whether cells labelled with x tend to be more (> 1), less (< 1) or equally (= 1) similar to the cells labelled with y than expected by chance.
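As a rough illustration of these definitions, the sketch below computes F, G and p on a toy directed k-nn graph. The expected count used here is an assumption made for the example (neighbours drawn uniformly among the other N - 1 cells); the exact definition of G(x,y) is given in [Nadalin et al.] and may differ. The names `F_obs`, `G_exp`, `labels` and `edges` are illustrative.

```r
# Toy directed k-nn graph on 6 cells with k = 2 (each cell has 2 outgoing edges).
k <- 2
labels <- c("A", "A", "A", "B", "B", "B")
edges <- data.frame(
  i = rep(1:6, each = k),
  j = c(2, 3,  1, 3,  1, 4,  5, 6,  4, 6,  4, 5)
)

lv <- sort(unique(labels))
N <- length(labels)
n_vec <- as.numeric(table(factor(labels, levels = lv)))  # cells per label

# F(x,y): number of edges (i, j) with i labelled x and j labelled y.
F_obs <- unclass(table(
  from = factor(labels[edges$i], levels = lv),
  to   = factor(labels[edges$j], levels = lv)
))

# G(x,y) under the ASSUMED null model: each of the k neighbours of a cell is
# drawn uniformly from the other N - 1 cells, so
#   G(x,y) = k * n_x * n_y / (N - 1)        for x != y
#   G(x,x) = k * n_x * (n_x - 1) / (N - 1)
G_exp <- k * (outer(n_vec, n_vec) - diag(n_vec, nrow = length(n_vec))) / (N - 1)
dimnames(G_exp) <- list(from = lv, to = lv)

# Pair propensity: > 1 more similar than chance, < 1 less, = 1 as expected.
p <- F_obs / G_exp
```

In this toy graph, p is above 1 for the pairs (A,A) and (B,B) and below 1 for (A,B) and (B,A), reflecting that cells mostly have neighbours with their own label. Because edges are directed, p(x,y) and p(y,x) need not be equal.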
- R v4.0.3
- Seurat v4.0.5
scPropensity is implemented in R. It takes as input a Seurat object and a metadata field ID.
To compute the pair propensity score on `object.Rds` with respect to `sample.name`, run:
scPropensity(object.file = "object.Rds", slot = "sample.name", outdir = "dir")
The above function builds a k-nn graph, computes the pair propensities of the labels in `object@meta.data$sample.name`, and creates a folder `dir` containing the output files. The output includes an n x n matrix M, where n is the number of distinct values in `sample.name` and M[x,y] is the log pair propensity of (x,y), as well as the L2-normalised version of M.
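To help interpret the output, the sketch below applies a log transform and an L2 normalisation to a toy propensity matrix. It is only an illustration under stated assumptions: the natural logarithm and row-wise normalisation are choices made for this example (the base and the normalisation axis used by scPropensity are not specified here), and the sample names `s1` and `s2` are hypothetical.

```r
# Toy pair propensity matrix for two hypothetical samples.
p <- matrix(c(2.0, 0.5,
              0.5, 1.0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), c("s1", "s2")))

# Log pair propensity (natural log assumed):
#   > 0 more similar than chance, < 0 less similar, 0 as expected by chance.
M <- log(p)

# Row-wise L2 normalisation (assumed axis): divide each row by its Euclidean norm.
# Element [i, j] is divided by row_norms[i] because R recycles down columns.
row_norms <- sqrt(rowSums(M^2))
M_l2 <- M / row_norms
```

After the log transform, the neutral value moves from 1 to 0, so the sign of M[x,y] directly indicates enrichment or depletion. Note that a propensity of exactly 0 yields `-Inf` under the log, and a row of all zeros in M cannot be normalised this way; how scPropensity handles these cases is not covered by this sketch.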
If you find this software useful, please cite:
Nadalin et al. Multi-omic lineage tracing predicts the transcriptional, epigenetic and genetic determinants of cancer evolution. Nature Communications. https://doi.org/10.1101/2023.06.28.546923