diff --git a/content/03.results.md b/content/03.results.md
index 790b847..ca0886b 100644
--- a/content/03.results.md
+++ b/content/03.results.md
@@ -135,31 +135,47 @@ To quantify spatial transcriptomics data, `scpca-nf` takes the RNA FASTQ and sli
 As there is not yet support for spatial transcriptomics with `alevin-fry`, `scpca-nf` uses Space Ranger to quantify all spatial transcriptomics data [@url:https://www.10xgenomics.com/support/software/space-ranger/latest].
 The output includes the spot by gene matrix along with a summary report produced by Space Ranger.
-
 ## Downloading projects from the ScPCA Portal
-1. Users can download all samples for a given project together
-    - The portal has two different options to allow users to download data for all samples in a given ScPCA Project, either as invididual files for each sample or as a single merged file.
-    - By default, when downloading a project, the download will include a folder for each sample that is included in the project.
-    - That folder will contain all individual `SingleCellExperiment` objects as `.rds` files or `AnnData` objects as `.hdf5` files, depending on the file format chosen by the user (Fig. 3A).
-    - Each of these objects contains the gene expression data and metadata for a single library.
-    - If a given project has associated bulk RNA-seq, then a sample by gene counts matrix, `bulk_quant.tsv`, including the quantified gene expression data for all samples in a project with associated bulk RNA-seq will be included.
-2. Merged objects
-    - Providing all data from all libraries withing a single file makes it easier for users to perform joint analysis on multiple samples at the same time.
-      Specifically, these objects can be useful for comparing gene-level metrics across multiple samples, such as differential expression analysis and gene set enrichment analysis.
-    - Therefore, we make a single, merged `SingleCellExperiment` or `AnnData` object (Fig. 3B) available for each project (without batch-correction or integration).
-    - This file contains one object with all raw and normalized gene expression data and metadata for all single-cell and single-nuclei RNA-seq libraries within a given ScPCA project
-    - If downloading a project that contains at least one library with CITE-seq, the quantified CITE-seq expression data will also be merged. In SCEs this is provided as an `altExp` within the main object, but for `AnnData` objects, the quantified CITE-seq data is provided as a separate file.
-
-2. The merged object workflow (Fig. 3C and 3D)
-    - To create the merged objects, we created an additional stand-alone workflow for merging the output from `scpca-nf`, `merge.nf` (Fig. 3C).
-    - Following processing of each `SingleCellExperiment` object with `scpca-nf`, all processed objects from all libraries and samples within a project are input to the merge workflow, which combines all input data into a single merged object.
-    - The merged object contains raw and normalized gene expression counts for all cells in all libraries. The same index was used for processing all individual libraries, so the genes found will be the same as in an invididual object.
-    - After merging, the top 2000 high-variance genes are calculated by modeling variance within each library included in the merged object.
-    - These high-variance genes are used to calculate new PCA coordinates using `batchelor::multiBatchPCA()` and specifying librares as batches.
-    - The top 50 PCs were selected and used as input to calculate new UMAP embeddings on the merged object.
-    - Similar to `scpca-nf`, the merged `SingleCellExperiment` object is converted to a merged `AnnData` object and both formats are provided as download options on the Portal.
-    - Along with the merged objects, for each project, a merged summary report is created and output.
-    - This report includes a brief summary of the samples and libraries included in the merged object, including a summary of the type of libraries (e.g., single-cell, single-nuclei, with CITE-seq) and sample diagnoses included in the object.
-    - The report also contains a UMAP showing all cells from all libraries included in the merged object. For each library, a separate panel is shown, and cells from that library are colored while all other cells are gray (Fig. 3D).
+
+On the Portal, users can select to download data from individual samples or all data from an entire ScPCA project.
+When downloading data for an entire project, users can choose between receiving the individual files for each sample (default) or one file containing the gene expression data and metadata for all samples in the project.
+Users also have the option to choose their desired format and receive the data as `SingleCellExperiment` (`.rds`) or `AnnData` (`.hdf5`) objects.
+
+For downloads with samples as individual files, the download folder will include a sub-folder for each sample in the project (Figure 3A).
+Each sample folder contains all three object types (unfiltered, filtered, and processed) as either `SingleCellExperiment` (`.rds`) or `AnnData` (`.hdf5`) objects and the QC report for all libraries from the given sample.
+The objects house the summarized gene expression data and associated metadata for the library indicated in the filename.
+
+All project downloads include a metadata file, `single_cell_metadata.tsv`, containing relevant metadata for all samples, and a `README.md` with information about the contents of each download, contact and citation information, and terms of use for data downloaded from the Portal (Figure 3A-B).
+If the ScPCA project includes samples with bulk RNA-seq, two additional files are included: a gene by sample counts matrix (`bulk_quant.tsv`) with the quantified gene expression data for all samples in the project and a metadata file (`bulk_metadata.tsv`).
+
+### Merged objects
+
+Providing data for all libraries within a single file makes it easier for users to perform joint gene-level analyses, such as differential expression or gene set enrichment analyses, on multiple samples simultaneously.
+Therefore, we make a single, merged object available for each project containing all raw and normalized gene expression data and metadata for all single-cell and single-nuclei RNA-seq libraries within a given ScPCA project.
+The data in the merged object has simply been combined, and no batch-corrected or integrated data is included.
+If downloading data from a ScPCA project as a single, merged file, the download will include a single `.rds` or `.hdf5` file, a summary report for the merged object, and a folder with all individual QC and cell type reports for each library found in the merged object (Figure 3B).
+
+To build the merged objects, we created an additional stand-alone workflow for merging the output from `scpca-nf`, `merge.nf` (Figure 3C).
+`merge.nf` takes as input the processed `SingleCellExperiment` objects output by `scpca-nf` for all single-cell and single-nuclei libraries included in a given ScPCA project.
+The gene expression data stored in all `SingleCellExperiment` objects are then merged to produce a single merged gene by cell counts matrix containing all cells from all libraries and all shared genes.
+The genes available in the merged object will be the same as those in each individual object, as all objects on the Portal were quantified using the same index.
+Any metadata found in the individual processed `SingleCellExperiment` objects are also merged (e.g., `colData`, `rowData`, and `metadata`).
+The merged normalized counts matrix is then used to select high-variance genes in a library-aware manner before performing dimensionality reduction with both PCA and UMAP.
+`merge.nf` outputs the merged and processed object as a `SingleCellExperiment` object.
+
+We also account for additional modalities in `merge.nf`.
+If at least one library in a project contains ADT data, the raw and normalized ADT data are also merged and saved as an `altExp` in the merged `SingleCellExperiment` object.
+If any libraries in a project are multiplexed, the HTO data is not merged and is not included in the merged object.
+All merged `SingleCellExperiment` objects are converted to `AnnData` objects and exported as `.hdf5` files.
+If the merged object contains an `altExp` with merged ADT data, two `AnnData` objects are exported to create separate RNA (`_rna.hdf5`) and ADT (`_adt.hdf5`) objects.
+
+`merge.nf` outputs a summary report for each merged object, which includes a set of tables summarizing the types of samples and libraries included in the project, such as types of diagnosis, and a faceted UMAP showing all cells from all libraries.
+In the UMAP, each panel represents a different library included in the merged object, with all cells from the specified library shown in color, while all other cells are gray.
+An example of this UMAP showing a subset of libraries from a ScPCA project is available in Figure 3D.
+
+
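Note on the merge step described in the added text: the sketch below may help readers picture what "merged" and "library-aware" mean here. It is a toy illustration in plain Python, not the `merge.nf` implementation. The function names `merge_libraries` and `library_aware_hvgs`, the dictionary layout, the library-prefixed cell names, and the choice to rank genes by the average of per-library variances are all assumptions made for this example; the diff itself only states that libraries share one gene index, that counts are combined into one gene by cell matrix, and that high-variance genes are selected in a library-aware manner.

```python
from statistics import pvariance


def merge_libraries(libraries):
    """Combine per-library gene-by-cell matrices into one merged matrix.

    `libraries` (hypothetical layout) maps a library ID to a dict with:
      "genes":    list of gene IDs, identical across libraries
      "barcodes": list of cell barcodes
      "counts":   list of rows, one row per gene, one value per cell
    """
    library_ids = list(libraries)
    genes = libraries[library_ids[0]]["genes"]
    merged_counts = [[] for _ in genes]
    merged_cells = []
    for lib_id in library_ids:
        lib = libraries[lib_id]
        # All libraries are assumed to be quantified against the same index.
        if lib["genes"] != genes:
            raise ValueError(f"{lib_id} does not share the common gene index")
        # Assumption: prefix barcodes with the library ID to keep cell names unique.
        merged_cells.extend(f"{lib_id}-{bc}" for bc in lib["barcodes"])
        for merged_row, lib_row in zip(merged_counts, lib["counts"]):
            merged_row.extend(lib_row)
    return {"genes": genes, "cells": merged_cells, "counts": merged_counts}


def library_aware_hvgs(libraries, n_top):
    """Rank genes by their within-library variance averaged over libraries.

    This is a simplified stand-in for library-aware variance modeling.
    """
    library_ids = list(libraries)
    genes = libraries[library_ids[0]]["genes"]
    mean_variance = {}
    for i, gene in enumerate(genes):
        per_library = [
            pvariance(libraries[lib_id]["counts"][i]) for lib_id in library_ids
        ]
        mean_variance[gene] = sum(per_library) / len(per_library)
    return sorted(genes, key=lambda g: mean_variance[g], reverse=True)[:n_top]


# Toy input: two libraries, three shared genes.
libraries = {
    "libA": {
        "genes": ["g1", "g2", "g3"],
        "barcodes": ["AAA", "CCC"],
        "counts": [[0, 4], [1, 1], [2, 3]],
    },
    "libB": {
        "genes": ["g1", "g2", "g3"],
        "barcodes": ["AAA", "GGG", "TTT"],
        "counts": [[5, 5, 5], [0, 3, 6], [1, 1, 2]],
    },
}

merged = merge_libraries(libraries)
top_genes = library_aware_hvgs(libraries, n_top=2)
```

In this toy input the barcode `AAA` occurs in both libraries, which is why the sketch prefixes cell names with the library ID. Computing variance within each library before averaging keeps differences between libraries from contributing to a gene's ranking.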