# Case Study: Head and Neck Squamous Cell Carcinoma (Ferguson et al., 2022)
## Load libraries
```{r load libraries, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({
library(cytomapper)
library(dplyr)
library(ggplot2)
library(simpleSeg)
library(FuseSOM)
library(ggpubr)
library(scater)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(Statial)
library(tidySingleCellExperiment)
library(SpatialExperiment)
library(SpatialDatasets)
})
```
```{r, eval=FALSE}
library(cytomapper)
library(dplyr)
library(ggplot2)
library(simpleSeg)
library(FuseSOM)
library(ggpubr)
library(scater)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(Statial)
library(tidySingleCellExperiment)
library(SpatialExperiment)
library(SpatialDatasets)
```
## Global parameters
It is convenient to set the number of cores for running code in parallel. Please choose a number that is appropriate for your resources. Set the `use_mc` flag to `TRUE` if you would like to use parallel processing for the rest of the vignette. A minimum of 2 cores is suggested since running this workflow is rather computationally intensive.
```{r set parameters}
use_mc <- TRUE
if (use_mc) {
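# detectCores() may return NA; na.rm = TRUE then falls back to a single core.
nCores <- max(floor(parallel::detectCores() / 2), 1, na.rm = TRUE)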
} else {
nCores <- 1
}
BPPARAM <- simpleSeg:::generateBPParam(nCores)
theme_set(theme_classic())
```
## Context
In the following we will re-analyse IMC data from [(Ferguson et al., 2022)](https://doi.org/10.1158/1078-0432.CCR-22-1332) profiling the spatial landscape of head and neck cutaneous squamous cell carcinomas (HNcSCC), the second most common type of skin cancer. The majority of HNcSCC can be treated with surgery and good local control, but a subset of large tumours infiltrate subcutaneous tissue and are considered high risk for local recurrence and metastases. A key conclusion of this manuscript is that spatial information about cells and the immune environment can be used to predict primary tumour progression or metastases in patients. We will use our spicy workflow to reach a similar conclusion.
The R code for this analysis is available on GitHub at <https://github.com/SydneyBioX/spicyWorkflow>.
## Read in images
The images for this dataset are accessed through `SpatialDatasets::Ferguson_Images()`, which returns the path to a zip archive that we unzip into a temporary directory. Here we use `loadImages()` from the `cytomapper` package to load all the TIFF images into a `CytoImageList` object and store the images as h5 files on disk in a temporary directory using the `h5FilesPath = HDF5Array::getHDF5DumpDir()` parameter.
We will also assign the metadata columns of the `CytoImageList` object using the `mcols()` function.
```{r load images}
pathToImages <- SpatialDatasets::Ferguson_Images()
tmp <- tempfile()
unzip(pathToImages, exdir = tmp)
# Store images in a CytoImageList on_disk as h5 files to save memory.
images <- cytomapper::loadImages(
tmp,
single_channel = TRUE,
on_disk = TRUE,
h5FilesPath = HDF5Array::getHDF5DumpDir(),
BPPARAM = BPPARAM
)
mcols(images) <- S4Vectors::DataFrame(imageID = names(images))
```
### Clean channel names
As we're reading the image channels directly from the file names of the TIFF images, these channel names will often need to be cleaned for ease of downstream processing.
The channel names can be accessed from the `CytoImageList` object using the `channelNames()` function.
```{r}
cn <- channelNames(images) # Read in channel names
head(cn)
cn <- sub(".*_", "", cn) # Remove preceding letters
cn <- sub(".ome", "", cn, fixed = TRUE) # Remove the literal ".ome" suffix
head(cn)
channelNames(images) <- cn # Reassign channel names
```
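To make the two substitutions above concrete, here is a minimal sketch on a few hypothetical channel names. The raw names are invented for illustration and are not taken from the dataset, so the patterns may need adjusting for your own file names.

```r
# Hypothetical raw channel names, made up for illustration only.
raw_names <- c("139La_139La_panCK.ome", "152Sm_152Sm_CD45.ome", "113In_113In_HH3.ome")

# ".*_" is greedy, so everything up to and including the last underscore is removed.
clean_names <- sub(".*_", "", raw_names)

# fixed = TRUE treats ".ome" as a literal string rather than a regular expression,
# so the "." cannot match an arbitrary character.
clean_names <- sub(".ome", "", clean_names, fixed = TRUE)

clean_names
```

It is worth checking the result with `head()` before reassigning, since a marker name that itself contains an underscore would be truncated by the first pattern.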
### Clean image names
Similarly, the image names will be taken from the folder name containing the individual TIFF images for each channel. These will often also need to be cleaned.
```{r}
head(names(images))
nam <- sapply(strsplit(names(images), "_"), `[`, 3)
head(nam)
names(images) <- nam # Reassigning image names
mcols(images)[["imageID"]] <- nam # Reassigning image names
```
## SimpleSeg: Segment the cells in the images
Our simpleSeg R package, available at <https://github.com/SydneyBioX/simpleSeg>, provides a series of functions to generate simple segmentation masks of images. These functions leverage the functionality of the [EBImage](https://bioconductor.org/packages/release/bioc/vignettes/EBImage/inst/doc/EBImage-introduction.html) package on Bioconductor. For more flexibility when performing your segmentation in R we recommend learning to use the EBImage package. A key strength of the simpleSeg package is that we have coded multiple ways to perform some simple segmentation operations, as well as incorporating multiple automatic procedures to optimise some key parameters when these aren't specified.
### Run simpleSeg
If your images are stored in a `list` or `CytoImageList` they can be segmented with a simple call to `simpleSeg()`. To summarise, `simpleSeg()` is an R implementation of a simple segmentation technique which traces out the nuclei using the channel specified by `nucleus`, then dilates around the traced nuclei by an amount specified by `discSize`. The nucleus can be traced out using either one specified channel, or by using the principal components of all channels most correlated to the specified nuclear channel by setting `pca = TRUE`.
In the particular example below, we have asked `simpleSeg` to do the following:

- By setting `nucleus = c("HH3")`, we've asked simpleSeg to trace out the nuclei signal in the images using the HH3 channel.
- By setting `pca = TRUE`, simpleSeg segments out the nuclei mask using a principal component analysis of all channels, using the principal components most aligned with the nuclei channel, in this case HH3.
- By setting `cellBody = "dilate"`, simpleSeg uses a dilation strategy of segmentation, expanding out from the nucleus by a specified `discSize`.
- By setting `discSize = 3`, simpleSeg dilates out from the nucleus by 3 pixels.
- By setting `sizeSelection = 20`, simpleSeg ensures that only cells with a size greater than 20 pixels will be used.
- By setting `transform = "sqrt"`, simpleSeg square root transforms each of the channels prior to segmentation.
- By setting `tissue = c("panCK", "CD45", "HH3")`, we specify a tissue mask which simpleSeg uses to filter out all background noise outside the tissue mask. This is important as these are tumour cores, and hence circular, so we want to ignore background noise which occurs outside of the tumour core.

There are many other parameters that can be specified in simpleSeg (`smooth`, `watershed`, `tolerance`, and `ext`), and we encourage the user to select the best parameters which suit their biological context.
```{r}
masks <- simpleSeg(images,
nucleus = c("HH3"),
pca = TRUE,
cellBody = "dilate",
discSize = 3,
sizeSelection = 20,
transform = "sqrt",
tissue = c("panCK", "CD45", "HH3"),
cores = nCores
)
```
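To give some intuition for the dilation step, the sketch below grows a binary nucleus mask outwards by a disc of a given radius using base R only. This is a conceptual illustration under our own assumptions: `dilate_disc()` is a hypothetical helper, not a `simpleSeg` or `EBImage` function, and the way `simpleSeg()` maps `discSize` onto its structuring element may differ.

```r
# Hypothetical helper: dilate a logical mask by a disc of the given radius (in pixels).
dilate_disc <- function(mask, radius) {
  nr <- nrow(mask)
  nc <- ncol(mask)
  out <- matrix(FALSE, nr, nc)

  # All pixel offsets that fall inside a disc of the requested radius.
  offs <- expand.grid(dr = -radius:radius, dc = -radius:radius)
  offs <- offs[offs$dr^2 + offs$dc^2 <= radius^2, ]

  # Stamp the disc onto the output around every foreground pixel.
  seeds <- which(mask, arr.ind = TRUE)
  for (i in seq_len(nrow(seeds))) {
    rr <- seeds[i, 1] + offs$dr
    cc <- seeds[i, 2] + offs$dc
    keep <- rr >= 1 & rr <= nr & cc >= 1 & cc <= nc
    out[cbind(rr[keep], cc[keep])] <- TRUE
  }
  out
}

# Toy "nucleus": a single pixel in the centre of a 9 x 9 image.
toy_mask <- matrix(FALSE, 9, 9)
toy_mask[5, 5] <- TRUE
toy_dilated <- dilate_disc(toy_mask, radius = 3)
sum(toy_dilated) # number of pixels covered after dilation
```

A larger radius captures more of the cytoplasm and membrane signal but increases the chance that neighbouring cells bleed into one another, which is the trade-off to keep in mind when tuning `discSize`.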
### Visualise separation
The `display` and `colorLabels` functions in `EBImage` make it very easy to examine the performance of the cell segmentation. The great thing about `display` is that if used in an interactive session it is very easy to zoom in and out of the image.
```{r visualise segmentation}
EBImage::display(colorLabels(masks[[1]]))
```
### Visualise outlines
The `plotPixels` function in `cytomapper` makes it easy to overlay the mask on top of the nucleus intensity marker to see how well our segmentation process has performed. Here we can see that the segmentation appears to be performing reasonably.
If you see over- or under-segmentation of your images, `discSize` is a key parameter in `simpleSeg()` for optimising the size of the dilation disc after segmenting out the nuclei.
```{r}
plotPixels(image = images["F3"],
mask = masks["F3"],
img_id = "imageID",
colour_by = c("HH3"),
display = "single",
colour = list(HH3 = c("black","blue")),
legend = NULL,
bcg = list(
HH3 = c(1, 1, 2)
))
```
If you wish to visualise multiple markers instead of just the HH3 marker and see how the segmentation mask performs, this can also be done. Here, we can see that our segmentation mask has done a good job of capturing the CD31 signal, but perhaps not such a good job of capturing the FXIIIA signal, which often lies outside of our dilated nuclear mask. This could suggest that we might need to increase the `discSize` of our dilation.
```{r}
plotPixels(image = images["F3"],
mask = masks["F3"],
img_id = "imageID",
colour_by = c("HH3", "CD31", "FX111A"),
display = "single",
colour = list(HH3 = c("black","blue"),
CD31 = c("black", "red"),
FX111A = c("black", "green") ),
legend = NULL,
bcg = list(
HH3 = c(1, 1, 2),
CD31 = c(0, 1, 2),
FX111A = c(0, 1, 1.5)
))
```
## Summarise cell features.
In order to characterise the phenotypes of each of the segmented cells, `measureObjects()` from `cytomapper` will calculate the average intensity of each channel within each cell as well as a few morphological features. By default, the `measureObjects()` function will return a `SingleCellExperiment` object, where the channel intensities are stored in the `counts` assay and the spatial location of each cell is stored in `colData` in the `m.cx` and `m.cy` columns.
However, you can also specify `measureObjects()` to return a `SpatialExperiment` object by specifying `return_as = "spe"`. As a `SpatialExperiment` object, the spatial location of each cell is stored in the `spatialCoords` slot, as `m.cx` and `m.cy`, which simplifies plotting. In this demonstration, we will return a `SpatialExperiment` object.
```{r}
# Summarise the expression of each marker in each cell
cells <- cytomapper::measureObjects(masks,
images,
img_id = "imageID",
return_as = "spe",
BPPARAM = BPPARAM)
spatialCoordsNames(cells) <- c("x", "y")
```
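Conceptually, the per-cell summary is a grouped mean of pixel intensities, where the grouping variable is the cell label stored in the mask. The following base R sketch shows this on a toy 3 x 3 image; the matrices are invented for illustration, and `measureObjects()` additionally computes morphological features and cell centroids that are not reproduced here.

```r
# Toy segmentation mask: 0 is background, 1 and 2 are cell labels.
toy_mask <- matrix(c(1, 1, 0,
                     1, 0, 2,
                     0, 2, 2), nrow = 3, byrow = TRUE)

# Toy single-channel intensities for the same pixels.
toy_img <- matrix(c(2,  4,  9,
                    6,  9, 10,
                    9, 20, 30), nrow = 3, byrow = TRUE)

# Average the intensity of the pixels belonging to each cell, ignoring background.
in_cell <- toy_mask > 0
mean_intensity <- tapply(toy_img[in_cell], toy_mask[in_cell], mean)
mean_intensity
```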
## Load the clinical data
To associate features in our images with disease progression, we need to read in information which links image identifiers to their progression status. We will do this here, making sure that the `imageID` values match.
### Read the clinical data
```{r}
clinical <- read.csv(
system.file(
"extdata/clinicalData_TMA1_2021_AF.csv",
package = "spicyWorkflow"
)
)
rownames(clinical) <- clinical$imageID
clinical <- clinical[names(images), ]
```
### Put the clinical data into the colData of the SpatialExperiment
```{r}
colData(cells) <- cbind(colData(cells), clinical[cells$imageID, ])
```
```{r, eval=FALSE}
save(cells, file = "spe_Ferguson_2022.rda")
```
In case you already have your SPE object, you may only be interested in our downstream workflow. For the sake of convenience, we've provided the capability to directly load in the SpatialExperiment (SPE) object that we've generated.
```{r, eval=FALSE}
load(system.file("extdata/cells.rda", package = "spicyWorkflow"))
```
## Normalise the data
We should check to see if the marker intensities of each cell require some form of transformation or normalisation. The reason we do this is two-fold:\
1) The intensities of images are often highly skewed, preventing any meaningful downstream analysis.\
2) The intensities across different images are often different, meaning that what is considered "positive" can be different across images.
By transforming and normalising the data, we aim to reduce these two effects. Here we extract the intensities from the `counts` assay. Looking at CD3 which should be expressed in the majority of the T cells, the intensities are clearly very skewed, and it is difficult to see what is considered a CD3- cell, and what is a CD3+ cell.
```{r, fig.width=5, fig.height=5}
# Plot densities of CD3 for each image.
cells |>
join_features(features = rownames(cells), shape = "wide", assay = "counts") |>
ggplot(aes(x = CD3, colour = imageID)) +
geom_density() +
theme(legend.position = "none")
```
### Dimension reduction and visualisation
As our data is stored in a `SpatialExperiment` we can also use `scater` to perform dimension reduction and visualise our data in a lower dimension to look for batch effects in our images. We can see that before normalisation, our UMAP shows a clear batch effect between images.
```{r}
# Usually we specify a subset of the original markers which are informative to separating out distinct cell types for the UMAP and clustering.
ct_markers <- c("podoplanin", "CD13", "CD31",
"panCK", "CD3", "CD4", "CD8a",
"CD20", "CD68", "CD16", "CD14", "HLADR", "CD66a")
# ct_markers <- c("podoplanin", "CD13", "CD31",
# "panCK", "CD3", "CD4", "CD8a",
# "CD20", "CD68", "CD14", "CD16",
# "CD66a")
set.seed(51773)
# Perform dimension reduction using UMAP.
cells <- scater::runUMAP(
cells,
subset_row = ct_markers,
exprs_values = "counts"
)
# Select a subset of images to plot.
someImages <- unique(cells$imageID)[c(1, 5, 10, 20, 30, 40)]
# UMAP by imageID.
scater::plotReducedDim(
cells[, cells$imageID %in% someImages],
dimred = "UMAP",
colour_by = "imageID"
)
```
We can transform and normalise our data using the `normalizeCells` function. In the `normalizeCells()` function, we specify the following parameters:

- `transformation` is an optional argument which specifies the function to be applied to the data. We do not apply an arcsinh transformation here, as we already apply a square root transform in the `simpleSeg()` function.
- `method = c("trim99", "mean", "PC1")` is an optional argument which specifies the normalisation method/s to be performed. Here, we: 1) trim the 99th percentile, 2) divide by the mean, and 3) remove the 1st principal component.
- `assayIn = "counts"` is a required argument which specifies the name of the assay that the intensity data are taken from. In our context, this is called `counts`.

This modified data is then stored in the `norm` assay by default. We can see that this normalised data appears more bimodal, not perfectly, but likely to a sufficient degree for clustering, as we can at least observe a clear CD3+ peak at 1.00, and a CD3- peak at around 0.3.
```{r, fig.width=5, fig.height=5}
# Leave out the nuclei markers from our normalisation process.
useMarkers <- rownames(cells)[!rownames(cells) %in% c("DNA1", "DNA2", "HH3")]
# Transform and normalise the marker expression of each cell type.
cells <- normalizeCells(cells,
markers = useMarkers,
transformation = NULL,
method = c("trim99", "mean", "PC1"),
assayIn = "counts",
cores = nCores
)
# Plot densities of CD3 for each image
cells |>
join_features(features = rownames(cells), shape = "wide", assay = "norm") |>
ggplot(aes(x = CD3, colour = imageID)) +
geom_density() +
theme(legend.position = "none")
```
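As a rough illustration of the first two normalisation steps, the sketch below caps each image's intensities at its 99th percentile and then divides by the image mean, using simulated data and base R. We assume here that trimming means capping values at the percentile; `normalise_image()` is a hypothetical helper, the exact behaviour of `normalizeCells()` may differ, and the principal component removal step is not shown.

```r
set.seed(51773)

# Simulated CD3 intensities for two images on very different scales (toy data).
toy <- data.frame(
  imageID = rep(c("A", "B"), each = 200),
  CD3 = c(rexp(200, rate = 1), rexp(200, rate = 0.2))
)

# Hypothetical helper: cap at the 99th percentile, then scale by the mean.
normalise_image <- function(x) {
  cap <- quantile(x, 0.99)
  x <- pmin(x, cap)
  x / mean(x)
}

# Apply the helper within each image.
toy$CD3_norm <- ave(toy$CD3, toy$imageID, FUN = normalise_image)

# After scaling, every image has a mean intensity of 1.
tapply(toy$CD3_norm, toy$imageID, mean)
```

Capping first limits the influence of a handful of extremely bright cells on the mean, so the subsequent scaling puts the bulk of each image's distribution on a comparable footing.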
We can also appreciate through the UMAP a reduction of the batch effect we initially saw.
```{r}
set.seed(51773)
# Perform dimension reduction using UMAP.
cells <- scater::runUMAP(
cells,
subset_row = ct_markers,
exprs_values = "norm",
name = "normUMAP"
)
someImages <- unique(cells$imageID)[c(1, 5, 10, 20, 30, 40)]
# UMAP by imageID.
scater::plotReducedDim(
cells[, cells$imageID %in% someImages],
dimred = "normUMAP",
colour_by = "imageID"
)
```
## FuseSOM: Cluster cells into cell types
We can appreciate from the UMAP that there is a division of clusters, most likely representing different cell types. We next aim to empirically distinguish each cluster using our FuseSOM package for clustering.
Our FuseSOM R package can be found on Bioconductor at <https://www.bioconductor.org/packages/release/bioc/html/FuseSOM.html>, and provides a pipeline for the clustering of highly multiplexed in situ imaging cytometry assays. This pipeline uses the Self Organising Map architecture coupled with Multiview hierarchical clustering and provides functions for the estimation of the number of clusters.
Here we cluster using the `runFuseSOM` function. We specify the number of clusters to identify to be `numClusters = 10`. We also specify a set of cell-type specific markers to use, as we want our clusters to be distinct based on cell type markers, rather than markers which might pick up a transitioning cell state.
### Perform the clustering
```{r FuseSOM}
# Set seed.
set.seed(51773)
# Generate SOM and cluster cells into 10 groups
cells <- runFuseSOM(
cells,
markers = ct_markers,
assay = "norm",
numClusters = 10
)
```
We can also observe how reasonable our choice of `k = 10` was, using the `estimateNumCluster()` and `optiPlot()` functions. Here we examine the Gap method, but others such as Silhouette and Within Cluster Distance are also available. We can see that there are elbow points in the gap statistic at `k = 7`, `k = 10`, and `k = 11`. We've specified `k = 10`, striking a good balance between the number of clusters and the gap statistic.
```{r}
cells <- estimateNumCluster(cells, kSeq = 2:30)
optiPlot(cells, method = "gap")
```
### Attempt to interpret the phenotype of each cluster
We can begin the process of understanding what each of these cell clusters is by using the `plotGroupedHeatmap` function from `scater`. At the least, here we can see we capture all the major immune populations that we expect to see, including the CD4 and CD8 T cells, the CD20+ B cells, the CD68+ myeloid populations, the CD66a+ granulocytes, the podoplanin+ epithelial cells, and the panCK+ tumour cells.
```{r}
# Visualise marker expression in each cluster.
scater::plotGroupedHeatmap(
cells,
features = ct_markers,
group = "clusters",
exprs_values = "norm",
center = TRUE,
scale = TRUE,
zlim = c(-3, 3),
cluster_rows = FALSE,
block = "clusters"
)
```
Given domain-specific knowledge of the tumour-immune landscape, we can go ahead and annotate these clusters as cell types given their expression profiles.
```{r}
cells <- cells |>
mutate(cellType = case_when(
clusters == "cluster_1" ~ "GC", # Granulocytes
clusters == "cluster_2" ~ "MC", # Myeloid cells
clusters == "cluster_3" ~ "SC", # Squamous cells
clusters == "cluster_4" ~ "EP", # Epithelial cells
clusters == "cluster_5" ~ "SC", # Squamous cells
clusters == "cluster_6" ~ "TC_CD4", # CD4 T cells
clusters == "cluster_7" ~ "BC", # B cells
clusters == "cluster_8" ~ "EC", # Endothelial cells
clusters == "cluster_9" ~ "TC_CD8", # CD8 T cells
clusters == "cluster_10" ~ "DC" # Dendritic cells
))
```
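An equivalent way to express this annotation is a named lookup vector, which keeps the cluster-to-label mapping in one place and makes it easy to edit when the clustering changes. The sketch below applies the same mapping as above to a toy vector of cluster labels; `cluster_labels` and `toy_clusters` are names introduced here for illustration.

```r
# Same cluster-to-cell-type mapping as in the case_when() call above.
cluster_labels <- c(
  cluster_1 = "GC", cluster_2 = "MC", cluster_3 = "SC", cluster_4 = "EP",
  cluster_5 = "SC", cluster_6 = "TC_CD4", cluster_7 = "BC", cluster_8 = "EC",
  cluster_9 = "TC_CD8", cluster_10 = "DC"
)

# Toy cluster assignments for four cells.
toy_clusters <- c("cluster_3", "cluster_5", "cluster_10", "cluster_1")

# Index the lookup vector by name to translate clusters into cell types.
toy_cell_types <- unname(cluster_labels[toy_clusters])
toy_cell_types
```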
We might also be interested in how these cell types are distributed on the images themselves. Here we examine the distribution of clusters on image F3, noting the healthy epithelial and endothelial structures surrounded by tumour cells.
```{r}
reducedDim(cells, "spatialCoords") <- spatialCoords(cells)
cells |>
filter(imageID == "F3") |>
plotReducedDim("spatialCoords", colour_by = "cellType")
```
### Check cell type frequencies
We always find it useful to check the number of cells in each cluster. Here we can see that cluster 10 contains many cells (most likely tumour cells, given the high expression of panCK and inconsistent expression of other markers) and cluster 4 contains very few cells.
```{r}
# Check cell type frequencies.
cells$cellType |>
table() |>
sort()
```
We can also use the UMAP we computed earlier to visualise our data in a lower dimension and see how well our annotated cell types cluster out.
```{r}
# UMAP by cell type
scater::plotReducedDim(
cells[, cells$imageID %in% someImages],
dimred = "normUMAP",
colour_by = "cellType"
)
```
### Testing for association between the proportion of each cell type and progressor status
We recommend using a package such as `diffcyt` for testing for changes in abundance of cell types. However, the `colTest` function allows us to quickly test for associations between the proportions of the cell types and progression status using either Wilcoxon rank sum tests or t-tests. Here we see a p-value less than 0.05, but this does not equate to a small FDR.
```{r}
# Perform simple Student's t-tests on the columns of the proportion matrix.
testProp <- colTest(cells,
condition = "group",
feature = "cellType",
type = "ttest")
head(testProp)
```
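To see why a p-value below 0.05 need not correspond to a small FDR, consider the Benjamini-Hochberg adjustment applied to a handful of made-up p-values with base R's `p.adjust()`. The p-values below are hypothetical and are not the output of `colTest()`.

```r
# Hypothetical raw p-values for five cell types (illustration only).
toy_p <- c(0.01, 0.04, 0.20, 0.50, 0.80)

# Benjamini-Hochberg adjusted p-values, i.e. FDR estimates.
toy_fdr <- p.adjust(toy_p, method = "BH")

data.frame(p = toy_p, fdr = toy_fdr)
```

In this toy example the second test has a raw p-value of 0.04 but an adjusted value of 0.1, so it would no longer pass a 5% FDR threshold once the number of tests is taken into account.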
Let's examine one of these clusters using the `getProp()` function from `spicyR`, which conveniently transforms our proportions into a feature matrix of images by cell type, enabling convenient downstream classification or analysis. We can then visualise how different the proportions are with a boxplot.
```{r}
prop <- getProp(cells, feature = "cellType")
prop[1:5, 1:5]
```
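The structure of this feature matrix can be reproduced on toy data with base R: count the cells of each type per image, then divide each row by its total. The sketch below is only meant to show the shape of the output; the toy table is invented and `getProp()` itself should be used on real objects.

```r
# Toy cell-level table: one row per cell.
toy_cells <- data.frame(
  imageID  = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  cellType = c("SC", "SC", "TC_CD8", "BC", "SC", "TC_CD8", "TC_CD8", "TC_CD8", "BC")
)

# Count cells per image and cell type, then convert each row to proportions.
toy_counts <- table(toy_cells$imageID, toy_cells$cellType)
toy_prop <- as.data.frame.matrix(prop.table(toy_counts, margin = 1))
toy_prop
```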
It appears that the CD8 T cells are the most differentially abundant cell type across our progressors and non-progressors. A boxplot visualisation of CD8 T cell proportion clearly shows that progressors have a lower proportion of CD8 T cells in the tumour core.
```{r}
clusterToUse <- rownames(testProp)[1]
prop |>
select(all_of(clusterToUse)) |>
tibble::rownames_to_column("imageID") |>
left_join(clinical, by = "imageID") |>
ggplot(aes(x = group, y = .data[[clusterToUse]], fill = group)) +
geom_boxplot()
```
**NB**: If you have already clustered and annotated your cells, you may only be interested in our downstream analysis capabilities, looking into identifying localisation (spicyR), cell regions (lisaClust), and cell-cell interactions (SpatioMark & Kontextual). Therefore, for the sake of convenience, we've provided capability to directly load in the SpatialExperiment (SPE) object that we've generated up to this point, complete with clusters and normalised intensities.
```{r, eval=FALSE}
load(system.file("extdata/computed_cells.rda", package = "spicyWorkflow"))
```
## spicyR: Test spatial relationships
Our spicyR package is available on Bioconductor at <https://www.bioconductor.org/packages/devel/bioc/html/spicyR.html> and provides a series of functions to aid in the analysis of both immunofluorescence and imaging mass cytometry data as well as other assays that can deeply phenotype individual cells and their spatial location. Here we use the `spicy()` function to test for changes in the spatial relationships between pair-wise combinations of cells.
Put simply, spicyR uses the L-function to determine localisation or dispersion between cell types. The L-function is a summary measure of "closeness" between points, with greater values suggesting increased localisation, and lower values suggesting dispersion.
Here, we quantify spatial relationships using a combination of 10 radii from 10 to 100 by specifying `Rs = 1:10*10` and mildly account for some global tissue structure using `sigma = 50`. Further information on how to optimise these parameters can be found in the [vignette](https://bioconductor.org/packages/release/bioc/vignettes/spicyR/inst/doc/spicyR.html) and the spicyR [paper](https://doi.org/10.1093/bioinformatics/btac268).
```{r}
spicyTest <- spicy(cells,
condition = "group",
cellTypeCol = "cellType",
imageIDCol = "imageID",
Rs = 1:10*10,
sigma = 50,
BPPARAM = BPPARAM)
topPairs(spicyTest, n = 10)
```
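For intuition about what the L-function measures, the sketch below computes a naive cross-type L-function, $L(r) = \sqrt{K(r)/\pi}$, for two toy point patterns in base R. This is a simplified illustration under our own assumptions: `cross_L()` is a hypothetical helper with no edge correction and no adjustment for tissue inhomogeneity, so it is not the statistic that `spicy()` computes.

```r
# Hypothetical helper: naive cross-type L-function at a single radius r.
# K(r) = area * (number of from-to pairs within r) / (n_from * n_to); L(r) = sqrt(K(r) / pi).
cross_L <- function(xy_from, xy_to, r, area) {
  d <- sqrt(outer(xy_from[, 1], xy_to[, 1], "-")^2 +
            outer(xy_from[, 2], xy_to[, 2], "-")^2)
  K <- area * sum(d <= r) / (nrow(xy_from) * nrow(xy_to))
  sqrt(K / pi)
}

# Toy "from" cells on the diagonal of a 100 x 100 window.
xy_from <- cbind(x = c(10, 30, 50, 70, 90), y = c(10, 30, 50, 70, 90))

# "to" cells sitting right next to each "from" cell (localised).
xy_near <- cbind(x = xy_from[, "x"] + 2, y = xy_from[, "y"])

# "to" cells placed far away from every "from" cell (dispersed).
xy_far <- cbind(x = c(10, 30, 50, 70, 90), y = c(50, 70, 90, 10, 30))

L_near <- cross_L(xy_from, xy_near, r = 10, area = 100 * 100)
L_far  <- cross_L(xy_from, xy_far,  r = 10, area = 100 * 100)
c(localised = L_near, dispersed = L_far)
```

Under complete spatial randomness $L(r)$ is approximately $r$, so a value well above the radius points to localisation and a value well below it points to dispersion.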
We can visualise these tests using `signifPlot`, where we observe that cell type pairs appear to become less attractive (or avoid each other more) in the progressor samples.
```{r}
# Visualise which relationships are changing the most.
signifPlot(
spicyTest,
breaks = c(-1.5, 1.5, 0.5)
)
```
`spicyR` also has functionality for plotting out individual pairwise relationships. We can first look into whether the `SC` tumour cell type localises with the `GC` granulocyte cell type, and whether this localisation differs between progressors and non-progressors.
```{r}
spicyBoxPlot(spicyTest,
from = "SC",
to = "GC")
```
Alternatively, we can look at the most differentially localised relationship between progressors and non-progressors by specifying `rank = 1`.
```{r}
spicyBoxPlot(spicyTest,
rank = 1)
```
## lisaClust: Find cellular neighbourhoods
Our [lisaClust](https://www.bioconductor.org/packages/devel/bioc/html/lisaClust.html) package provides a series of functions to identify and visualise regions of tissue where spatial associations between cell types are similar. This package can be used to provide a high-level summary of cell-type co-localisation in multiplexed imaging data that has been segmented at a single-cell resolution. Here we use the `lisaClust` function to cluster cells into 4 regions with distinct spatial ordering.
```{r}
set.seed(51773)
# Cluster cells into spatial regions with similar composition.
cells <- lisaClust(
cells,
k = 4,
sigma = 50,
cellType = "cellType",
BPPARAM = BPPARAM
)
```
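The idea of grouping cells by the composition of their surroundings can be illustrated with a much simpler stand-in. In the sketch below we describe each toy cell by the proportion of each cell type among its neighbours within a fixed radius, and then group cells with hierarchical clustering. This is an assumption-laden simplification: `lisaClust()` works with local indicators of spatial association rather than raw neighbour proportions, and the toy coordinates, radius and clustering method are all choices made here for illustration.

```r
# Toy tissue: a left block of alternating A/B cells and a right block of C cells.
toy <- data.frame(
  x = c(1:10, 101:110),
  y = 0,
  cellType = c(rep(c("A", "B"), 5), rep("C", 10))
)

# Neighbours are other cells within 2.5 units.
d <- as.matrix(dist(toy[, c("x", "y")]))
near <- d <= 2.5 & d > 0

# For every cell, count its neighbours of each type and convert to proportions.
types <- sort(unique(toy$cellType))
counts <- sapply(types, function(ct) {
  rowSums(near[, toy$cellType == ct, drop = FALSE])
})
props <- counts / rowSums(counts)

# Group cells with similar neighbourhood composition into two "regions".
toy$region <- cutree(hclust(dist(props)), k = 2)
table(toy$region, toy$cellType)
```

Cells in the mixed A/B block end up in one region and the C cells in the other, even though A and B are different cell types, which is the sense in which a region summarises local composition rather than cell identity.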
### Region - cell type enrichment heatmap
We can try to interpret which spatial orderings the regions are quantifying using the `regionMap` function. This plots the frequency of each cell type in a region relative to what you would expect by chance. We can see here that our regions have neatly separated according to biological milieu, with region 1 and 2 representing our immune cell regions, region 3 representing our tumour cells, and region 4 representing our healthy epithelial and endothelial cells.
```{r, fig.height=5, fig.width=5}
# Visualise the enrichment of each cell type in each region
regionMap(cells, cellType = "cellType", limit = c(0.2, 2))
```
### Visualise regions
By default, these identified regions are stored in the `region` column in the `colData` of our object. We can quickly examine the spatial arrangement of these regions using `ggplot` on image F3, where we can see the same division of immune, healthy, and tumour tissue that we identified in our `regionMap`.
```{r, message=FALSE, warning=FALSE}
cells |>
filter(imageID == "F3") |>
plotReducedDim("spatialCoords", colour_by = "region")
```
While much slower, we have also implemented a function for overlaying the region information as a hatching pattern so that the information can be viewed simultaneously with the cell type calls.
```{r}
# Use hatching to visualise regions and cell types.
hatchingPlot(
cells,
useImages = "F3",
cellType = "cellType",
nbp = 300
)
```
### Test for association with progression
Similar to cell type proportions, we can quickly use the `colTest` function to test for associations between the proportions of cells in each region and progression status by specifying `feature = "region"`.
```{r}
# Test if the proportion of each region is associated
# with progression status.
testRegion <- colTest(
cells,
feature = "region",
condition = "group",
type = "ttest"
)
testRegion
```
## Statial: Identify changes in cell state.
Our [Statial](https://www.bioconductor.org/packages/release/bioc/html/Statial.html) package provides a suite of functions (Kontextual) for robust quantification of cell type localisation which are invariant to changes in tissue structure. In addition, we provide a suite of functions (SpatioMark) for uncovering continuous changes in marker expression associated with varying levels of localisation.
### SpatioMark: Continuous changes in marker expression associated with varying levels of localisation.
The first step in analysing these changes is to calculate the spatial proximity (`getDistances`) of each cell to every cell type. These values will then be stored in the `reducedDims` slot of the `SpatialExperiment` object under the name `distances`. SpatioMark also provides functionality to look into proximal cell abundance using the `getAbundance()` function, which is further explored within the `Statial` package vignette.
```{r}
cells$m.cx <- spatialCoords(cells)[,"x"]
cells$m.cy <- spatialCoords(cells)[,"y"]
cells <- getDistances(cells,
maxDist = 200,
nCores = nCores,
cellType = "cellType",
spatialCoords = c("m.cx", "m.cy")
)
```
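The quantity being computed can be sketched on toy data as the distance from each cell to the nearest cell of each type. The base R example below is a simplified illustration with invented coordinates; it ignores the `maxDist` cut-off and the per-image handling that `getDistances()` takes care of.

```r
# Toy cells from a single image.
toy <- data.frame(
  x = c(0, 3, 0, 10),
  y = c(0, 4, 1, 10),
  cellType = c("SC", "EP", "SC", "EP")
)

# Pairwise distances; a cell is never its own nearest neighbour.
d <- as.matrix(dist(toy[, c("x", "y")]))
diag(d) <- Inf

# For every cell (rows), the distance to the closest cell of each type (columns).
nearest <- sapply(sort(unique(toy$cellType)), function(ct) {
  apply(d[, toy$cellType == ct, drop = FALSE], 1, min)
})
nearest
```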
We can then visualise an example image, specified with `image = "F3"`, and a particular marker interaction with cell type localisation. To visualise these changes, we specify two cell types with the `from` and `to` parameters, and a marker with the `marker` parameter (cell-cell-marker interactions). Here, we examine the changes in the marker podoplanin in `SC` tumour cells as their localisation to `EP` epithelial cells increases or decreases, where we can observe that podoplanin decreases in tumour cells as their distance to the central cluster of epithelial cells increases.
```{r}
p <- plotStateChanges(
cells = cells,
cellType = "cellType",
spatialCoords = c("m.cx", "m.cy"),
type = "distances",
image = "F3",
from = "SC",
to = "EP",
marker = "podoplanin",
size = 1,
shape = 19,
interactive = FALSE,
plotModelFit = FALSE,
method = "lm"
)
p
```
SpatioMark aims to holistically uncover all such significant relationships by looking at all interactions across all images. The `calcStateChanges` function provided by Statial can be expanded for this exact purpose - by not specifying cell types, a marker, or an image, `calcStateChanges` will examine the most significant correlations between distance and marker expression across the entire dataset.
```{r}
state_dist <- calcStateChanges(
cells = cells,
cellType = "cellType",
type = "distances",
assay = 2,
nCores = nCores,
minCells = 100
)
head(state_dist[state_dist$imageID == "F3",], n = 10)
```
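For a single image, cell type pair and marker, the relationship being tested can be thought of as a regression of marker expression on distance. The sketch below fits such a model with `lm()` on toy data and extracts the coefficient and t-statistic; the data are invented, and we assume a plain linear model here, which may not match every detail of how `calcStateChanges()` fits its models.

```r
# Toy data: podoplanin expression in tumour cells at increasing distance
# from the nearest epithelial cell, with a small deterministic wobble.
toy_state <- data.frame(distance = 1:10)
toy_state$podoplanin <- 5 - 0.3 * toy_state$distance + rep(c(0.1, -0.1), 5)

# Linear model of marker expression against distance.
fit <- lm(podoplanin ~ distance, data = toy_state)

# The slope and its t-statistic summarise the direction and strength of the change.
state_summary <- summary(fit)$coefficients["distance", c("Estimate", "t value")]
state_summary
```

A negative slope means the marker decreases as cells sit further from the other cell type, which is the pattern described for podoplanin above.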
The results from our SpatioMark outputs can be converted from a `data.frame` to a `matrix`, using the `prepMatrix()` function. Note, the choice of extracting either the t-statistic or the coefficient of the linear regression can be specified using the `column = "tval"` parameter, with the coefficient being the default extracted parameter. We can see that with SpatioMark, we get some features which are significant after adjusting for FDR.
```{r}
# Preparing outcome vector
outcome <- cells$group[!duplicated(cells$imageID)]
names(outcome) <- cells$imageID[!duplicated(cells$imageID)]
# Preparing features for Statial
distMat <- prepMatrix(state_dist)
distMat <- distMat[names(outcome), ]
# Remove some very small values
distMat <- distMat[, colMeans(abs(distMat) > 0.0001) > .8]
survivalResults <- colTest(distMat, outcome, type = "ttest")
head(survivalResults)
```
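The two steps above (reshaping per-image results into an images-by-features matrix, then testing each column against the outcome) can be sketched in base R. This is only an illustration on made-up numbers, not the `prepMatrix()` or `colTest()` implementation; the `toy_*` objects, the feature names and the output column names are hypothetical.

```{r}
# Hypothetical long-format results: one row per image and feature.
toy_long <- data.frame(
  imageID = rep(c("A1", "A2", "A3", "B1", "B2", "B3"), each = 2),
  feature = rep(c("feature_1", "feature_2"), times = 6),
  coef    = c(-0.9, 0.1, -0.8, 0.3, -1.1, 0.2,
               0.2, 0.2,  0.1, 0.1,  0.3, 0.3)
)

# Reshape to a matrix with one row per image and one column per feature.
toy_mat <- tapply(toy_long$coef, list(toy_long$imageID, toy_long$feature), mean)

# Hypothetical two-level outcome, named by image and aligned to the matrix rows.
toy_outcome <- c(A1 = "P", A2 = "P", A3 = "P", B1 = "NP", B2 = "NP", B3 = "NP")
toy_group <- toy_outcome[rownames(toy_mat)]

# Welch t-test per feature, followed by Benjamini-Hochberg (FDR) adjustment.
toy_p <- apply(toy_mat, 2, function(x) {
  t.test(x[toy_group == "P"], x[toy_group == "NP"])$p.value
})
toy_res <- data.frame(pval = toy_p, adjPval = p.adjust(toy_p, method = "fdr"))
toy_res[order(toy_res$pval), ]
```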
### Kontextual: Robust quantification of cell type localisation which is invariant to changes in tissue structure
`Kontextual` is a method to evaluate the localisation relationship between two cell types in an image. `Kontextual` builds on the L-function by contextualising the relationship between two cell types in reference to the typical spatial behaviour of a $3^{rd}$ cell type/population. By taking this approach, `Kontextual` is invariant to changes in the window of the image as well as tissue structures which may be present.
The definitions of cell types and cell states are somewhat ambiguous. Cell types imply well-defined groups of cells that serve different roles from one another, whereas cell states imply that cells are dynamic entities which cannot be discretised and thus exist on a continuum. For the purposes of using `Kontextual`, we treat cell states as identified clusters of cells, where larger clusters represent a "parent" cell population and finer sub-clusters represent a "child" cell population. For example, a CD4 T cell may be considered a child of a larger parent population of immune cells. `Kontextual` thus aims to see how a child population of cells deviates from the spatial behaviour of its parent population, and how that influences the localisation between the child cell state and another cell state.
#### Cell type hierarchy
A key input for Kontextual is an annotation of cell type hierarchies. We will need these to organise all the cells present into cell state populations or clusters, e.g. all the different B cell types are put in a vector called bcells.
Here, we use the Bioconductor package [treekoR](http://www.bioconductor.org/packages/release/bioc/html/treekoR.html) to define these hierarchies in a data-driven way.
```{r}
# Build a data-driven cluster tree of the cell types
fergusonTree <- treekoR::getClusterTree(t(assay(cells, "norm")),
  cells$cellType,
  hierarchy_method = "hopach"
)
# Parent populations, each a vector of child cell types
parent1 <- c("TC_CD8", "TC_CD4", "DC")
parent2 <- c("BC", "GC")
parent3 <- c(parent1, parent2)
parent4 <- c("MC", "EP", "SC")
parent5 <- c(parent4, "EC")
all <- c(parent1, parent2, parent3, parent4, parent5)
# Data frame of pairwise cell relationships and their parents
treeDf <- Statial::parentCombinations(all, parent1, parent2, parent3, parent4, parent5)
# Plot the cluster tree
fergusonTree$clust_tree |> plot()
```
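To make the idea of a parent-child table concrete, here is a rough base-R sketch that pairs every child of a parent with every other cell type. It is an illustration under our own assumptions, not the `parentCombinations()` implementation; the `toy_*` objects, the grouping names `tcells` and `bcells`, and the column names are hypothetical, and the pairs the package actually returns may differ.

```{r}
# Hypothetical parent populations, each a vector of child cell types.
toy_parents <- list(
  tcells = c("TC_CD8", "TC_CD4"),
  bcells = c("BC", "GC")
)
toy_all <- unique(unlist(toy_parents, use.names = FALSE))

# For each parent, pair each of its children ("to") with every other
# cell type ("from"), and record which parent the child belongs to.
toy_pairs <- do.call(rbind, lapply(names(toy_parents), function(p) {
  grid <- expand.grid(
    from = toy_all,
    to = toy_parents[[p]],
    stringsAsFactors = FALSE
  )
  grid$parent_name <- p
  grid[grid$from != grid$to, ]
}))
toy_pairs
```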
`Kontextual` accepts a `SingleCellExperiment` object, a single image, or a list of images from a `SingleCellExperiment` object, which is passed to the `cells` argument. Here, we use the `treeDf` data frame created with `parentCombinations()`, supplied through the `parentDf` parameter, so that `Kontextual` is calculated on all pairwise combinations for every cluster. The argument `r` specifies the radius over which the cell relationship is evaluated. `Kontextual` supports parallel processing; the number of cores can be specified using the `cores` argument. `Kontextual` can take a single value or multiple values for each argument and will test all combinations of the arguments specified.
We can calculate all pairwise relationships across all images for a single radius.
```{r}
kontext <- Kontextual(
  cells = cells,
  cellType = "cellType",
  spatialCoords = c("m.cx", "m.cy"),
  parentDf = treeDf,
  r = 50,
  cores = nCores
)
```
Again, we can use `colTest()` to quickly test for associations between the Kontextual values and progression status using either Wilcoxon rank sum tests or t-tests. Similar to SpatioMark, we can choose which values `prepMatrix()` extracts: specifying `column = "original"` extracts the original L-function values instead of the Kontextual values.
```{r}
# Converting Kontextual result into data matrix
kontextMat <- prepMatrix(kontext)
# Replace NAs with 0
kontextMat[is.na(kontextMat)] <- 0
survivalResults <- spicyR::colTest(kontextMat, outcome, type = "ttest")
head(survivalResults)
```
## ClassifyR: Classification
Our ClassifyR package, <https://github.com/SydneyBioX/ClassifyR>, formalises a convenient framework for evaluating classification in R. We provide functionality to easily include four key modelling stages (data transformation, feature selection, classifier training and prediction) in a cross-validation loop. Here we use the `crossValidate` function to perform 50 repeats of 5-fold cross-validation to evaluate the performance of a random forest applied to five quantifications of our IMC data: 1) cell type proportions, 2) cell type localisation from `spicyR`, 3) region proportions from `lisaClust`, 4) cell type localisation in reference to a parent cell type from `Kontextual`, and 5) cell changes in response to proximal changes from `SpatioMark`.
```{r}
# Create list to store data.frames
data <- list()
# Add proportions of each cell type in each image
data[["Proportions"]] <- getProp(cells, "cellType")
# Add pair-wise associations
spicyMat <- bind(spicyTest)
spicyMat[is.na(spicyMat)] <- 0
spicyMat <- spicyMat |>
  select(!condition) |>
  tibble::column_to_rownames("imageID")
data[["SpicyR"]] <- spicyMat
# Add proportions of each region in each image
# to the list of dataframes.
data[["LisaClust"]] <- getProp(cells, "region")
# Add SpatioMark features
data[["SpatioMark"]] <- distMat
# Add Kontextual features
data[["Kontextual"]] <- kontextMat
```
```{r}
# Set seed
set.seed(51773)
# Perform cross-validation of a random forest model
# with 50 repeats of 5-fold cross-validation.
cv <- crossValidate(
  measurements = data,
  outcome = outcome,
  classifier = "randomForest",
  nFolds = 5,
  nRepeats = 50,
  nCores = nCores
)
```
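For readers unfamiliar with repeated k-fold cross-validation, the sample splitting can be sketched in base R as follows. This is a simplified illustration and not how `crossValidate` is implemented: it ignores any stratification or other details ClassifyR may apply, and the `toy_*` objects are hypothetical.

```{r}
# Hypothetical sizes: 20 samples, 5 folds, 3 repeats.
toy_n <- 20
toy_k <- 5
toy_reps <- 3

# For each repeat, randomly assign every sample to one of the k folds.
toy_folds <- lapply(seq_len(toy_reps), function(i) {
  sample(rep(seq_len(toy_k), length.out = toy_n))
})

# In repeat 1, fold 1 is held out for testing and the rest is used for training.
toy_test <- which(toy_folds[[1]] == 1)
toy_train <- which(toy_folds[[1]] != 1)

# Number of samples per fold in repeat 1.
table(toy_folds[[1]])
```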
### Visualise cross-validated prediction performance
Here we use the `performancePlot` function to assess the AUC from each repeat of the 5-fold cross-validation. We see that the lisaClust regions appear to capture information which is predictive of progression status of the patients.
```{r}
# Calculate AUC for each cross-validation repeat and plot.
performancePlot(
  cv,
  metric = "AUC",
  characteristicsList = list(x = "Assay Name"),
  orderingList = list("Assay Name" = c("Proportions", "SpicyR", "LisaClust", "Kontextual", "SpatioMark"))
)
```
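As a reminder of what the AUC measures, it is the proportion of positive-negative sample pairs in which the positive sample receives the higher score. A minimal base-R sketch using the rank-sum formulation is shown here; `toy_auc` and the toy data are hypothetical and are not part of ClassifyR.

```{r}
# AUC from ranks: equivalent to the fraction of positive-negative pairs
# that are ordered correctly (ties count as half through average ranks).
toy_auc <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical predicted scores and true classes for six samples.
toy_score <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)
toy_truth <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9.
toy_auc(toy_score, toy_truth)
```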
We can also visualise which features were good at classifying which patients using the `samplesMetricMap()` function from `ClassifyR`.
```{r}
samplesMetricMap(cv)
```
## Summary
Here we have used a pipeline of our spatial analysis R packages to demonstrate an easy way to segment, cluster, normalise, quantify and classify high dimensional in situ cytometry data all within R.
## sessionInfo
```{r}
sessionInfo()
```