This is an R package for estimating the number of clusters in uni and multi-modal single-cell data. CLUEY uses cell-type identity markers to guide the clustering process and performs recursive clusters to ensure that sub-populations are captured.
CLUEY requires both keras and tensorflow, please have both installed. You can follow the instructions provided at this link.
CLUEY can be installed using the following command:
library(devtools)
install_github("SydneyBioX/CLUEY")
You can generate your own knowledge base using the
generateKnowledgeBase
function like below:
knowledgeBase <- generateKnowledgeBase(exprsMat=logcounts(sce), celltypes=sce$cellType)
In this example, we will upload an example knowledge base generated from
the Mouse Cell Atlas
(FACS) and cluster
an example query dataset which was subsampled from Zilionis et
al. using the runCLUEY
function.
library(CLUEY)
library(scater)
library(ggplot2)
library(gridExtra)
set.seed(3435)
# Load example knowledge base
data(mcaFACS)
# Load example query data
data(exampleData)
# Run CLUEY
# If your logcounts matrix is in dgCMatrix format, then you'll need to convert it to a matrix using `as.matrix()`
clustering_results <- runCLUEY(exprsMatRNA=as.matrix(logcounts(exampleData)), knowledgeBase=mcaFACS, kLimit=10)
#> 25/25 - 0s - 148ms/epoch - 6ms/step
#> 13/13 - 0s - 94ms/epoch - 7ms/step
#> 25/25 - 0s - 108ms/epoch - 4ms/step
#> 13/13 - 0s - 77ms/epoch - 6ms/step
We can now view the results of the clustering performed by CLUEY. CLUEY predicts there to be 5 clusters in the data.
set.seed(3435)
# View the optimal number of clusters predicted by CLUEY
clustering_results$optimal_K
#> [1] 5
# We can store the results in the metadata of our SingleCellExperiment object.
colData(exampleData) <- cbind(colData(exampleData), clustering_results$predictions)
# Run UMAP to visualise clusters
exampleData <- runPCA(exampleData)
exampleData <- runUMAP(exampleData)
umap <- data.frame(reducedDim(exampleData, "UMAP"))
umap$cluster <- as.factor(exampleData$cluster)
umap$correlation <- exampleData$correlation
ggplot(umap, aes(x=UMAP1, y=UMAP2, color=cluster)) + geom_point() + theme_classic()