Merge branch 'main' into allyhawkins/project-downloads
allyhawkins authored Mar 1, 2024
2 parents e0ea5d9 + 713551a commit 31a0d7a
Showing 1 changed file with 13 additions and 23 deletions.
36 changes: 13 additions & 23 deletions content/03.results.md
@@ -91,9 +91,7 @@ Any cells removed after filtering empty droplets based on the unfiltered RNA cou
The workflow uses `DropletUtils::cleanTagCounts()` to calculate QC statistics for the ADT counts, and these statistics are stored alongside the ADT by cell counts matrix in the filtered `SingleCellExperiment` object.
The `SingleCellExperiment` object containing the filtered RNA and ADT counts matrices and the associated ADT QC statistics is saved to an `.rds` file with the suffix `_filtered.rds`.

The ADT by cell counts matrix is normalized by first determining the ambient profile and then using that profile to calculate median size factors with `scuttle::computeMedianFactors()` [@doi:10.18129/B9.bioc.scuttle; @url:https://bioconductor.org/books/3.16/OSCA.advanced/integrating-with-protein-abundance.html#cite-seq-median-norm].
We skip normalization for cells with low-quality ADT expression, as indicated by `DropletUtils::cleanTagCounts()`.
Although `scpca-nf` normalizes ADT counts, the workflow does not perform any dimensionality reduction of ADT data; only the RNA counts data is used as input for dimensionality reduction.
The normalized ADT data is saved as an `altExp` within the processed `SingleCellExperiment` containing the normalized RNA data and is output to an `.rds` file with the suffix `_processed.rds`.
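The normalization described above can be illustrated with a minimal NumPy sketch of the median-ratio idea behind `scuttle::computeMedianFactors()`. In this sketch, each cell's size factor is the median, across ADTs, of its counts divided by an ambient reference profile. This is not the `scuttle` implementation. The function names, the choice to ignore ADTs with a zero ambient value, centering the factors to a mean of 1 over the normalized cells, and the log2(x + 1) transform are all assumptions made for illustration, and the counts are made up.

```python
import numpy as np


def median_size_factors(counts, ambient, keep):
    """Median-ratio size factors against an ambient reference (sketch).

    counts:  ADTs x cells matrix of raw counts
    ambient: per-ADT ambient profile used as the reference
    keep:    boolean mask per cell; False marks low-quality cells to skip
    """
    counts = np.asarray(counts, dtype=float)
    ambient = np.asarray(ambient, dtype=float)
    keep = np.asarray(keep, dtype=bool)

    # Assumption: ADTs with zero ambient signal are ignored to avoid dividing by 0.
    usable = ambient > 0
    ratios = counts[usable, :] / ambient[usable][:, None]
    raw = np.median(ratios, axis=0)  # one median ratio per cell

    # Only cells that pass QC and have a positive factor are normalized;
    # every other cell keeps a NaN size factor.
    ok = keep & (raw > 0)
    size_factors = np.full(raw.shape, np.nan)
    # Assumption: center so the normalized cells have a mean size factor of 1.
    size_factors[ok] = raw[ok] / raw[ok].mean()
    return size_factors


def log_normalize(counts, size_factors):
    """log2(count / size factor + 1); NaN size factors yield NaN columns."""
    counts = np.asarray(counts, dtype=float)
    return np.log2(counts / size_factors[None, :] + 1)


# Made-up example: 3 ADTs x 3 cells; the third cell is flagged as low quality.
counts = [[10, 20, 5],
          [4, 8, 2],
          [6, 12, 0]]
sf = median_size_factors(counts, ambient=[2, 1, 3], keep=[True, True, False])
norm = log_normalize(counts, sf)
```

Cells flagged as low quality keep a `NaN` size factor, so their normalized values stay `NaN` instead of being computed. This mirrors the choice to skip normalization for those cells.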
@@ -108,38 +106,30 @@ The top 4 ADTs with the most variable expression are also identified and visuali

### Multiplexed libraries

To process multiplexed libraries, the HTO FASTQ files are input to `scpca-nf` and quantified using `salmon alevin` and `alevin-fry` (Supplemental Figure 2C).
Along with the FASTQ files, `scpca-nf` requires two TSV files to process multiplexed data: one listing each HTO name and its associated barcode, which is used to build an HTO-specific index for quantifying HTO expression with `alevin-fry`, and a second indicating which HTO was used for each sample in the multiplexed library.
The unfiltered HTO by cell counts matrix output from `alevin-fry` is saved as an alternative experiment (`altExp`) within the main `SingleCellExperiment` containing the unfiltered RNA counts.
This `SingleCellExperiment` object containing both RNA and HTO counts is output from the workflow to an `.rds` file with the suffix `_unfiltered.rds`.
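Together, the two TSV files act as a small lookup: one maps each HTO to its barcode, and the other maps each sample to its HTO. The sketch below joins them and checks that every sample's HTO has a barcode. The column names (`hto_id`, `barcode`, `sample_id`), sample names, and barcode sequences are hypothetical. They are not the `scpca-nf` file specification.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def link_samples_to_barcodes(barcode_tsv, sample_tsv):
    """Return {sample: (hto, barcode)} from the two (hypothetical) TSV layouts."""
    # File 1 (hypothetical columns): HTO name -> barcode, used to build the HTO index.
    hto_barcodes = {row["hto_id"]: row["barcode"] for row in read_tsv(barcode_tsv)}
    # File 2 (hypothetical columns): which HTO labeled which sample.
    sample_htos = {row["sample_id"]: row["hto_id"] for row in read_tsv(sample_tsv)}

    missing = sorted({h for h in sample_htos.values() if h not in hto_barcodes})
    if missing:
        raise ValueError(f"HTOs listed for samples but missing a barcode: {missing}")
    # Assumption: each sample in a library is labeled by a distinct HTO.
    if len(set(sample_htos.values())) != len(sample_htos):
        raise ValueError("each sample in a library should be labeled by a distinct HTO")
    return {sample: (hto, hto_barcodes[hto]) for sample, hto in sample_htos.items()}


# Made-up inputs; real files may use different column names.
barcode_tsv = "hto_id\tbarcode\nHTO1\tACGTACGTACGTACG\nHTO2\tTTGCAGTCAGGATCA\n"
sample_tsv = "sample_id\thto_id\nsample_A\tHTO1\nsample_B\tHTO2\n"
mapping = link_samples_to_barcodes(barcode_tsv, sample_tsv)
```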

As with ADT data, `scpca-nf` does not filter any cells based on HTO expression.
Instead, any cells removed when `DropletUtils::emptyDropsCellRanger()` filters empty droplets from the unfiltered RNA counts matrix are also removed from the HTO counts matrix, and the result is saved to an `.rds` file with the `_filtered.rds` suffix.
`scpca-nf` does not perform any additional filtering or processing of the HTO by cell counts matrix, so the same filtered matrix is saved to the processed `.rds` file with the `_processed.rds` suffix.
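The HTO matrix is never filtered on its own. Producing the filtered HTO matrix therefore amounts to keeping the HTO columns whose cell barcodes survived empty-droplet filtering of the RNA matrix. The minimal NumPy sketch below uses made-up barcodes and counts, and the list of retained barcodes is a hypothetical stand-in for the output of `DropletUtils::emptyDropsCellRanger()`.

```python
import numpy as np


def subset_to_cells(matrix, barcodes, keep_barcodes):
    """Keep the columns of a features x cells matrix whose cell barcodes
    were retained by RNA-based empty-droplet filtering, in that order."""
    column_of = {bc: i for i, bc in enumerate(barcodes)}
    cols = [column_of[bc] for bc in keep_barcodes]  # KeyError if a barcode is unknown
    return np.asarray(matrix)[:, cols]


# Made-up data: 2 HTOs x 4 droplets.
hto_counts = np.array([[5, 0, 40, 1],
                       [0, 2, 3, 35]])
droplets = ["AAAC", "AAAG", "AAAT", "AACA"]
# Hypothetical stand-in for the barcodes kept after RNA-based filtering.
rna_retained = ["AAAT", "AACA"]
hto_filtered = subset_to_cells(hto_counts, droplets, rna_retained)
```

In Bioconductor, column-subsetting a `SingleCellExperiment` also subsets its `altExp`s. Storing HTO counts as an `altExp` therefore keeps them aligned with the filtered RNA cells.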

Although `scpca-nf` quantifies the HTO data and includes the HTO by cell counts matrix in all output objects, it does not split multiplexed libraries into separate objects for each sample.
Instead, `scpca-nf` runs several demultiplexing methods: `DropletUtils::hashedDrops()` [@doi:10.18129/B9.bioc.DropletUtils], `Seurat::HTODemux()` [@doi:10.1186/s13059-018-1603-1], and, when bulk RNA-seq data is available, genetic demultiplexing.
For genetic demultiplexing, `scpca-nf` follows the method described in Weber et al. [@doi:10.1093/gigascience/giab062], which uses bulk RNA-seq data from each sample as a reference for the genotypes expected among the cells of the multiplexed single-cell library.
The results from all available demultiplexing methods are saved in the filtered and processed `SingleCellExperiment` objects.
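The sketch below illustrates what hashing-based demultiplexing does. It assigns each cell to the sample whose HTO is most abundant, but only when that HTO clearly dominates the runner-up. This is a deliberately simplified rule, not the algorithm used by `DropletUtils::hashedDrops()` or `Seurat::HTODemux()`. The +1 pseudocount, the `min_fold` threshold of 2, and the `"unassigned"` label are assumptions.

```python
import numpy as np


def assign_by_top_hto(hto_counts, hto_names, hto_to_sample, min_fold=2.0):
    """Simplified hashing-based sample calls (sketch, not hashedDrops or HTODemux).

    A cell is assigned to the sample of its most abundant HTO when that HTO's
    count is at least `min_fold` times the runner-up's (after a +1 pseudocount);
    otherwise the cell is left "unassigned". Needs at least two HTOs.
    """
    counts = np.asarray(hto_counts, dtype=float) + 1.0  # pseudocount avoids division by 0
    order = np.argsort(counts, axis=0)  # per cell, HTO rows sorted by count ascending
    top, second = order[-1], order[-2]
    cells = np.arange(counts.shape[1])
    fold = counts[top, cells] / counts[second, cells]

    calls = []
    for cell in cells:
        if fold[cell] >= min_fold:
            calls.append(hto_to_sample[hto_names[top[cell]]])
        else:
            calls.append("unassigned")
    return calls


# Made-up data: 2 HTOs x 3 cells; the third cell has no clear winner.
hto_counts = [[40, 1, 10],
              [3, 35, 9]]
hto_to_sample = {"HTO1": "sample_A", "HTO2": "sample_B"}
calls = assign_by_top_hto(hto_counts, ["HTO1", "HTO2"], hto_to_sample)
```

Using a ratio between the top two HTOs, instead of an absolute count cutoff, keeps the rule insensitive to how deeply each cell's HTOs were sequenced.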

If a library has associated HTO data, an additional section is included in the QC report output by `scpca-nf`.
This section summarizes HTO-specific library statistics, such as how many cells express each HTO.
No additional plots are produced, but a table summarizing the results from all available demultiplexing methods is included.
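The HTO section of the QC report summarizes two kinds of counts: the number of cells expressing each HTO and a per-method tally of sample calls. The sketch below shows one way to compute both, under several assumptions made for illustration:

- A cell counts as expressing an HTO when it has at least one count for it.
- The method labels are arbitrary dictionary keys.
- The calls are made up rather than produced by those tools.

Genetic demultiplexing is left out, as it would be for a sample without bulk RNA-seq data.

```python
from collections import Counter

import numpy as np


def cells_per_hto(hto_counts, hto_names, min_count=1):
    """Number of cells with at least `min_count` counts for each HTO."""
    counts = np.asarray(hto_counts)
    return {name: int((counts[i] >= min_count).sum()) for i, name in enumerate(hto_names)}


def summarize_calls(calls_by_method):
    """Tabulate sample calls per method as {method: {call: number of cells}}."""
    return {method: dict(Counter(calls)) for method, calls in calls_by_method.items()}


# Made-up inputs: 2 HTOs x 3 cells and calls from two hashing-based methods.
hto_counts = [[40, 0, 10],
              [3, 35, 9]]
per_hto = cells_per_hto(hto_counts, ["HTO1", "HTO2"])
summary = summarize_calls({
    "hashedDrops": ["sample_A", "sample_B", "unassigned"],
    "HTODemux": ["sample_A", "sample_B", "sample_A"],
})
```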

### Bulk and spatial transcriptomics

Multiple libraries were collected for some samples, with the additional libraries being used for bulk RNA-seq and/or spatial transcriptomics.
Both of these additional sequencing methods are supported by `scpca-nf`.
`scpca-nf` takes bulk RNA-seq FASTQ files as input, trims reads using `fastp` [@doi:10.1093/bioinformatics/bty560], and then aligns reads with `salmon` [@doi:10.1038/nmeth.4197] (Supplemental Figure 3A).
The output is a single TSV file with the gene by sample counts matrix for all samples in a given ScPCA project.
This gene by sample matrix is only included with project downloads on the Portal.
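The sketch below shows how per-sample `salmon` output could be combined into a single gene by sample table. It sums each transcript's `NumReads` from a `quant.sf`-style table into gene-level counts using a transcript-to-gene mapping, then writes one TSV. The aggregation approach, the `gene_id` header, the gene and sample names, and all values are hypothetical. This section does not specify how `scpca-nf` summarizes transcripts to genes.

```python
import csv
import io
from collections import defaultdict


def gene_counts_from_quant(quant_sf_text, tx2gene):
    """Sum per-transcript NumReads from a salmon quant.sf table into gene counts."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t"):
        totals[tx2gene[row["Name"]]] += float(row["NumReads"])
    return dict(totals)


def write_gene_by_sample(per_sample, out):
    """Write one gene x sample TSV; a gene absent from a sample gets 0."""
    samples = sorted(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", *samples])  # hypothetical header
    for gene in genes:
        writer.writerow([gene, *(per_sample[s].get(gene, 0.0) for s in samples)])


header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
# Made-up salmon output for two samples and a made-up transcript-to-gene map.
quant_1 = header + "tx1\t1000\t850\t40.0\t100\ntx2\t800\t650\t20.0\t50\ntx3\t1200\t1050\t5.0\t30\n"
quant_2 = header + "tx1\t1000\t850\t10.0\t20\n"
tx2gene = {"tx1": "GENE_A", "tx2": "GENE_A", "tx3": "GENE_B"}

buffer = io.StringIO()
write_gene_by_sample(
    {"sample_1": gene_counts_from_quant(quant_1, tx2gene),
     "sample_2": gene_counts_from_quant(quant_2, tx2gene)},
    buffer,
)
```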

To quantify spatial transcriptomics data, `scpca-nf` takes the RNA FASTQ files and the slide image as input (Supplemental Figure 3B).
Because `alevin-fry` does not yet support spatial transcriptomics data, `scpca-nf` uses Space Ranger to quantify all spatial transcriptomics libraries [@url:https://www.10xgenomics.com/support/software/space-ranger/latest].