---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# CLUEY

<!-- badges: start -->
<!-- badges: end -->

CLUEY is an R package for estimating the number of clusters in unimodal and multimodal single-cell data. It uses cell-type identity markers to guide the clustering process and performs recursive clustering to ensure that sub-populations are captured.

| ![](man/figures/CLUEY_figure.png) |
|:--------------------------------------------------:|

## Dependencies

CLUEY requires both the keras and tensorflow R packages, so please install both before using it. You can follow the [TensorFlow for R installation instructions](https://tensorflow.rstudio.com/install/).
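
Before installing CLUEY, you can check from R whether the keras and tensorflow packages are already available. The chunk below is a minimal sketch using base R's `requireNamespace()`; it only reports what is installed and does not install anything.

```{r, warning=FALSE, error=FALSE, message=FALSE, eval=FALSE}
# R packages that CLUEY requires
deps <- c("keras", "tensorflow")
# TRUE if the package's namespace can be loaded, FALSE otherwise;
# nothing is installed by this check
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
print(installed)
```

If either entry is `FALSE`, install the missing package with `install.packages()` and then run `keras::install_keras()`, which sets up TensorFlow and Keras in a Python environment that R can use; the linked instructions describe the full set-up.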

## Installation

CLUEY can be installed from GitHub using the following command:
```{r, warning=FALSE, error=FALSE, message=FALSE, eval=FALSE}
library(devtools)
install_github("SydneyBioX/CLUEY")
```
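
## Generating a knowledge base

You can generate your own knowledge base from an annotated reference dataset using the `generateKnowledgeBase` function. In the example below, `sce` is a SingleCellExperiment object with log-normalised counts and a `cellType` column giving each cell's label: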
```{r, warning=FALSE, error=FALSE, message=FALSE, eval=FALSE}
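library(CLUEY)
library(SingleCellExperiment)
# Build a knowledge base from the reference expression matrix and its cell-type labels
knowledgeBase <- generateKnowledgeBase(exprsMat = logcounts(sce), celltypes = sce$cellType)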
```
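
`generateKnowledgeBase` is given a genes × cells expression matrix and one cell-type label per cell, in column order. The chunk below sketches a hypothetical helper, `checkKnowledgeBaseInputs()` (not part of CLUEY), that checks these inputs line up before you build the knowledge base. The warning about missing gene names is an assumption, based on the idea that reference genes must be matched against the query genes.

```{r, warning=FALSE, error=FALSE, message=FALSE, eval=FALSE}
# Hypothetical helper (not part of CLUEY): sanity-check the inputs
# before calling generateKnowledgeBase(exprsMat = ..., celltypes = ...)
checkKnowledgeBaseInputs <- function(exprsMat, celltypes) {
  # Expect a dense or sparse matrix with genes in rows and cells in columns
  if (!(is.matrix(exprsMat) || methods::is(exprsMat, "Matrix"))) {
    stop("exprsMat must be a matrix (genes x cells)")
  }
  # One label per cell, in the same order as the columns of exprsMat
  if (length(celltypes) != ncol(exprsMat)) {
    stop("celltypes must have one entry per column (cell) of exprsMat")
  }
  if (anyNA(celltypes)) {
    stop("celltypes contains missing (NA) labels")
  }
  # Assumption: gene names are needed to match reference and query genes
  if (is.null(rownames(exprsMat))) {
    warning("exprsMat has no gene names in its row names")
  }
  # Number of cells per cell type, for a quick look at the reference
  table(celltypes)
}
```

For the reference object used above, `checkKnowledgeBaseInputs(logcounts(sce), sce$cellType)` either stops with a message or returns the cell counts per cell type, which is also a quick way to spot rare cell types in the reference.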

## Cluster data

In this example, we load an example knowledge base generated from the [Mouse Cell Atlas (FACS)](https://www.nature.com/articles/s41586-020-2496-1) and use the `runCLUEY` function to cluster an example query dataset subsampled from [Zilionis et al.](https://pubmed.ncbi.nlm.nih.gov/30979687/).
```{r, warning=FALSE, error=FALSE, message=FALSE}
library(CLUEY)
library(scater)
library(ggplot2)
library(gridExtra)
set.seed(3435)
# Load example knowledge base
data(mcaFACS)
# Load example query data
data(exampleData)
# Run CLUEY
# If your logcounts matrix is in dgCMatrix format, then you'll need to convert it to a matrix using `as.matrix()`
clustering_results <- runCLUEY(exprsMatRNA=as.matrix(logcounts(exampleData)), knowledgeBase=mcaFACS, kLimit=10)
```
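
Converting a sparse `dgCMatrix` with `as.matrix()` stores every gene-by-cell entry as an 8-byte double, so the dense copy needs roughly genes × cells × 8 bytes of memory; for example, 20,000 genes × 10,000 cells comes to about 1.6 GB. The sketch below estimates that size before converting, using the Matrix package and a random toy matrix in place of `logcounts(exampleData)`. `denseSizeGB()` is a hypothetical helper, not a CLUEY function.

```{r, warning=FALSE, error=FALSE, message=FALSE, eval=FALSE}
library(Matrix)

# Hypothetical helper: memory (in GB) for a dense double-precision copy of mat.
# as.numeric() avoids integer overflow when nrow * ncol is very large
denseSizeGB <- function(mat) {
  as.numeric(nrow(mat)) * as.numeric(ncol(mat)) * 8 / 1e9
}

# Toy sparse matrix (2,000 genes x 500 cells, random values) standing in
# for logcounts(exampleData)
set.seed(3435)
sparseMat <- rsparsematrix(nrow = 2000, ncol = 500, density = 0.05)
denseSizeGB(sparseMat)  # 2000 * 500 * 8 / 1e9 = 0.008

# Convert to a base R matrix only when the input is sparse
exprsMatRNA <- if (methods::is(sparseMat, "sparseMatrix")) as.matrix(sparseMat) else sparseMat
```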

## Viewing results

We can now view the clustering results. For this example dataset, CLUEY predicts 5 clusters.
```{r, warning=FALSE, error=FALSE, message=FALSE}
set.seed(3435)
# View the optimal number of clusters predicted by CLUEY
clustering_results$optimal_K
# Store the per-cell predictions in the colData of the SingleCellExperiment object
colData(exampleData) <- cbind(colData(exampleData), clustering_results$predictions)
# Run UMAP to visualise clusters
exampleData <- runPCA(exampleData)
exampleData <- runUMAP(exampleData)
umap <- data.frame(reducedDim(exampleData, "UMAP"))
umap$cluster <- as.factor(exampleData$cluster)
umap$correlation <- exampleData$correlation
ggplot(umap, aes(x=UMAP1, y=UMAP2, color=cluster)) + geom_point() + theme_classic()
```
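
The chunk above adds a `correlation` column to `umap`, and the earlier chunk loads gridExtra, but neither is used in the plot. One way to use them is to show the clusters and the `correlation` values side by side on the same UMAP coordinates. This is a sketch: `plotClustersAndCorrelation()` is a hypothetical helper, and this README does not describe what CLUEY's `correlation` value measures, so read the right-hand panel with that in mind.

```{r, warning=FALSE, error=FALSE, message=FALSE, eval=FALSE}
library(ggplot2)
library(gridExtra)

# Hypothetical helper: two UMAP panels sharing the same coordinates,
# coloured by CLUEY cluster (left) and by the `correlation` column (right)
plotClustersAndCorrelation <- function(umap) {
  pCluster <- ggplot(umap, aes(x = UMAP1, y = UMAP2, color = cluster)) +
    geom_point() + theme_classic()
  pCorrelation <- ggplot(umap, aes(x = UMAP1, y = UMAP2, color = correlation)) +
    geom_point() + theme_classic()
  # grid.arrange() draws both panels and invisibly returns the combined layout
  grid.arrange(pCluster, pCorrelation, ncol = 2)
}
```

With the `umap` data frame built above, `plotClustersAndCorrelation(umap)` draws the two panels.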