merge PRs

UCLouvain-CBIO · Apr 10, 2024 · f3586fb
2 parents f8fb58f + 98d2505

Showing 4 changed files with 353 additions and 78 deletions.
diff --git a/R/data.R b/R/data.R
@@ -2593,6 +2593,124 @@
 ##'
 "guise2024"
 
+####---- petrosius2023_mES ----####
+
+##' Petrosius et al. 2023 (Nat. Commun.): Mouse embryonic stem cells (mESCs) in
+##' different culture conditions
+##' 
+##' @description
+##' Profiling of mouse embryonic stem cells across ground-state (m2i) and
+##' differentiation-permissive (m15) culture conditions. The data were
+##' acquired using Orbitrap-based data-independent acquisition (DIA).
+##' The authors profiled these culture conditions to demonstrate the
+##' capability of their approach, showing heterogeneity in the global
+##' proteomes and highlighting differences in the expression of key
+##' metabolic enzymes in distinct cell subclusters.
+##' 
+##' @format A [QFeatures] object with 605 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assays 1-603: PSM data acquired with an Orbitrap-based data-independent
+##'   acquisition (DIA) protocol. Each assay holds one acquisition run and
+##'   therefore contains a single column with the quantitative information.
+##' - `peptides`: peptide data containing quantitative data for 9884
+##'   peptides and 603 single-cells. 
+##' - `proteins`: protein data containing quantitative data for 4270
+##'   proteins and 603 single-cells. 
+##'
+##' Sample annotation is stored in `colData(petrosius2023_mES())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Sample isolation**: Cell sorting was done on a Sony MA900 cell sorter
+##'   using a 130 μm sorting chip. Cells were sorted at single-cell resolution
+##'   into a 384-well Eppendorf LoBind PCR plate (Eppendorf AG) containing 1 μL
+##'   of lysis buffer.
+##' - **Sample preparation**: Single-cell protein lysates were digested with
+##'   2 ng of trypsin (Sigma cat. Nr. T6567) supplied in 1 μL of digestion
+##'   buffer (100 mM TEAB pH 8.5, 1:5000 (v/v) benzonase (Sigma cat. Nr. E1014)).
+##'   The digestion was carried out overnight at 37 °C, and the digests were
+##'   then acidified by adding 1 μL of 1% (v/v) trifluoroacetic acid (TFA).
+##'   All liquid dispensing was done using an I-DOT One instrument (Dispendix).
+##' - **Liquid chromatography**: The Evosep One liquid chromatography system was
+##'   used for the DIA isolation window survey and HRMS1-DIA experiments. The
+##'   standard 31 min or 58 min pre-defined Whisper gradients were used, with
+##'   peptide elution at a flow rate of 100 nL/min. A 15 cm × 75 μm ID column
+##'   (PepSep) packed with 1.9 μm C18 beads (Dr. Maisch, Germany) and a 10 μm
+##'   ID silica electrospray emitter (PepSep) were used. The LC system was
+##'   coupled online to an Orbitrap Eclipse Tribrid mass spectrometer
+##'   (Thermo Fisher Scientific) via an EasySpray ion source connected to a
+##'   FAIMS Pro device.
+##' - **Mass spectrometry**: The mass spectrometer was operated in positive
+##'   mode with the FAIMS Pro compensation voltage set to −45 V. MS1 scans
+##'   were acquired at a resolution of 120,000 with an automatic gain control
+##'   (AGC) target of 300% and the maximum injection time set to auto. A scan
+##'   range of 500–900 m/z was used for the DIA isolation window survey and
+##'   400–1000 m/z for the rest of the experiments. Higher-energy collisional
+##'   dissociation (HCD) was used for precursor fragmentation with a normalized
+##'   collision energy (NCE) of 33%, and the MS2 AGC target was set to 1000%.
+##' - **Raw data processing**: The mESC raw data files were processed with
+##'   Spectronaut 17, and the resulting protein abundance tables were exported
+##'   and further analyzed in Python.
+##'
+##' @section Data collection:
+##'
+##' The data were provided by the authors and are accessible on
+##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT).
+##' The folder `20240205_111248_mESC_SNEcombine_m15-m2i/` contains the
+##' following files of interest:
+##'
+##' - `20240205_111251_PEPQuant (Normal).tsv`: the PSM level data
+##' - `20240205_111251_Peptide Quant (Normal).tsv`: the peptide level data
+##' - `20240205_111251_PGQuant (Normal).tsv`: the protein level data
+##'
+##' The metadata were downloaded from the
+##' [Zenodo repository](https://zenodo.org/records/8146605):
+##' 
+##' - `sample_facs.csv`: the metadata
+##' 
+##' We formatted the quantification table so that its columns match the
+##' metadata. Both tables are then combined into a single
+##' [QFeatures] object using the [scp::readSCP()] function.
+##' 
+##' The peptide data were formatted into a [SingleCellExperiment] object, and
+##' the sample metadata were matched to the column names and stored in the
+##' `colData`. The object is then added to the [QFeatures] object, and the rows
+##' of the PSM data are linked to the rows of the peptide data based on the
+##' precursor identifier (`EG.PrecursorId`) through an `AssayLink` object.
+##' 
+##' The protein data were formatted into a [SingleCellExperiment] object, and
+##' the sample metadata were matched to the column names and stored in the
+##' `colData`. The object is then added to the [QFeatures] object, and the rows
+##' of the peptide data are linked to the rows of the protein data based on the
+##' protein accessions (`PG.ProteinAccessions`) through an `AssayLink` object.
+##'
+##' @source
+##' The peptide and protein data can be downloaded from
+##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT).
+##' The raw data and the quantification data can also be found in the
+##' MassIVE repository `MSV000092429`:
+##' ftp://MSV000092429@massive.ucsd.edu/.
+##' 
+##' @references
+##' **Source article**: Petrosius, V., Aragon-Fernandez, P., Üresin, N. et al. 
+##' "Exploration of cell state heterogeneity using single-cell proteomics 
+##' through sensitivity-tailored data-independent acquisition." 
+##' Nat Commun 14, 5910 (2023). 
+##' ([link to article](https://doi.org/10.1038/s41467-023-41602-1)).
+##'
+##' @examples
+##' \donttest{
+##' petrosius2023_mES()
+##' }
+##'
+##' @keywords datasets
+##'
+"petrosius2023_mES"
+
 ####---- petrosius2023_AML ----####
 
 ##' Petrosius et al. 2023 (bioRxiv): AML hierarchy on Astral.

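The column-matching step described under *Data collection* can be illustrated with a small base-R sketch. It assumes a Spectronaut-style long report with one row per precursor and raw file (the toy identifiers and values below are made up), pivots it to one column per run, and appends the `PEP.Quantity` channel name so the columns follow the `<run>PEP.Quantity` naming the build script relies on. It only illustrates the idea; the actual build script uses `tidyr::pivot_wider()`.

```r
## Toy long-format report: one row per precursor and raw file (made-up values)
long <- data.frame(
  EG.PrecursorId = c("_PEPTIDEA_.2", "_PEPTIDEA_.2", "_PEPTIDEB_.3"),
  R.FileName     = c("run01", "run02", "run01"),
  Quantity       = c(1500, 1720, 830),
  stringsAsFactors = FALSE
)

## Pivot to a precursor x run matrix; combinations absent from the report
## become NA. Each precursor/run pair occurs once here, so FUN = sum simply
## returns that single value.
wide <- tapply(long$Quantity,
               list(long$EG.PrecursorId, long$R.FileName),
               FUN = sum)

## Append the channel name so column names read "<run>PEP.Quantity",
## mirroring the renaming step in make-data_petrosius2023_mES.R
colnames(wide) <- paste0(colnames(wide), "PEP.Quantity")
wide
```

The missing `_PEPTIDEB_.3` / `run02` cell stays `NA`, which is how unobserved precursors appear in the wide peptide and protein assays.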
diff --git a/inst/extdata/metadata.csv b/inst/extdata/metadata.csv
@@ -23,4 +23,5 @@
 "gregoire2023_mixCTRL","Single-cell proteomics data from two monocyte cell lines","3.19",NA,"TXT","https://www.ebi.ac.uk/pride/archive/projects/PXD046211",NA,"Homo sapiens",9606,TRUE,"PRIDE","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/gregoire2023_mixCTRL.Rda",2024-01-22,119,"Sage","TMT-16",TRUE,TRUE,TRUE,TRUE,NA
 "khan2023","Single-cell proteomics data of 421 MCF-10A cells undergoing EMT triggered by TGFβ","3.19",NA,"TXT","https://drive.google.com/drive/folders/1zCsRKWNQuAz5msxx0DfjDrIe6pUjqQmj",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/khan2023.Rda",2023-12-21,47,"MaxQuant","TMTPro 16plex",TRUE,TRUE,TRUE,TRUE,NA
 "guise2024","Single-cell proteomics data of 108 postmortem CTL or ALS spinal moto neurons","3.19",NA,"TXT","ftp://massive.ucsd.edu/v05/MSV000092119/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Christophe Vanderaa <[email protected]>","QFeatures","Rda","scpdata/guise2024.rda",2024-01-05,47,"Proteome Discoverer","LFQ",TRUE,TRUE,TRUE,TRUE,NA
-"petrosius2023_AML","Single-cell proteomics data of 4 cell types from the OCI-AML8227 model.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",NA,"Homo sapiens",9606,TRUE,"Dataverse","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/petrosius2023.Rda",2023-06-08,217,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
+"petrosius2023_mES","Single-cell proteomics data of mouse embryonic stem cells across ground-state (m2i) and differentiation-permissive (m15) culture conditions.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",NA,"Mus musculus",10090,TRUE,"Dataverse","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_mES.Rda",2024-04-09,605,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
+"petrosius2023_AML","Single-cell proteomics data of 4 cell types from the OCI-AML8227 model.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/4DSPJM",NA,"Homo sapiens",9606,TRUE,"Dataverse","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/petrosius2023.Rda",2023-06-08,217,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
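
Each row of `metadata.csv` pairs a species name with an NCBI taxonomy ID, and the two must agree (`Homo sapiens` is 9606, `Mus musculus` is 10090). Below is a minimal R sketch of a consistency check. It assumes the file is read with `read.csv()` and exposes the standard ExperimentHub columns `Title`, `Species` and `TaxonomyId`. The `expected_taxid` lookup and the `check_taxonomy()` helper are hypothetical and are not part of scpdata.

```r
## Hypothetical lookup of NCBI taxonomy IDs for the species used in scpdata
expected_taxid <- c("Homo sapiens" = 9606L, "Mus musculus" = 10090L)

## Return the Title of every row whose TaxonomyId disagrees with its Species.
## Species missing from the lookup, or rows without a TaxonomyId, are flagged too.
check_taxonomy <- function(meta) {
  expected <- unname(expected_taxid[meta$Species])
  bad <- is.na(expected) | is.na(meta$TaxonomyId) | expected != meta$TaxonomyId
  meta$Title[bad]
}

## Toy metadata table (not read from the real file)
meta <- data.frame(
  Title      = c("petrosius2023_mES", "petrosius2023_AML"),
  Species    = c("Mus musculus", "Homo sapiens"),
  TaxonomyId = c(10090L, 9606L),
  stringsAsFactors = FALSE
)
check_taxonomy(meta)  # character(0): every row is consistent
```

On the real file, one might call something like `check_taxonomy(read.csv("inst/extdata/metadata.csv"))`, provided the header uses those column names.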
diff --git a/inst/scripts/make-data_petrosius2023_mES.R b/inst/scripts/make-data_petrosius2023_mES.R
@@ -0,0 +1,129 @@
+
+####---- Petrosius et al, 2023 ---####
+
+
+## Petrosius, V., Aragon-Fernandez, P., Üresin, N. et al. Exploration of cell
+## state heterogeneity using single-cell proteomics through sensitivity-tailored
+## data-independent acquisition. Nat Commun 14, 5910 (2023). 
+## https://doi.org/10.1038/s41467-023-41602-1
+
+library(SingleCellExperiment)
+library(scp)
+library(tidyverse)
+
+####---- Load PSM data ----####
+## The PSM data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
+## and 'sample_facs.csv' from https://zenodo.org/records/8146605
+## '20240205_111251_PEPQuant (Normal).tsv' contains the PSM data.
+## 'sample_facs.csv' contains the cell annotations.
+
+root <- "~/localdata/SCP/petrosiusmESC/20240205_111248_mESC_SNEcombine_m15-m2i/"
+ev <- read.delim(paste0(root, "20240205_111251_PEPQuant (Normal).tsv"))
+design <- read.csv(paste0(root, "sample_facs.csv"))
+
+####---- Create sample annotation ----####
+design %>%
+  select(-X) %>%
+  distinct() %>%
+  add_column(Channel = "PEP.Quantity") %>%
+  rename(Set = File.Name, 
+         SampleType = Plate) ->
+  meta
+
+## Clean quantitative data
+ev %>%
+  rename(Set = R.FileName, 
+         protein = PG.ProteinAccessions) %>%
+  ## Create a stripped sequence + charge variable
+  mutate(peptide = paste0("_", PEP.StrippedSequence, "_.", FG.Charge)) %>%
+  filter(Set %in% meta$Set) ->
+  evproc
+
+## Create the QFeatures object
+petrosius2023_mES <- readSCP(evproc,
+                             meta,
+                             channelCol = "Channel",
+                             batchCol = "Set",
+                             removeEmptyCols = TRUE)
+
+
+####---- Peptide data ----####
+## The peptide data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
+## '20240205_111251_Peptide Quant (Normal).tsv' contains the peptide data.
+
+## Load the peptide level quantification data
+pep_data <- read.delim(paste0(root, "20240205_111251_Peptide Quant (Normal).tsv"))
+
+## Clean quantitative data
+pep_data %>%
+  pivot_wider(names_from = R.FileName, 
+              values_from = PG.Quantity, 
+              id_cols = c(EG.PrecursorId, PG.ProteinAccessions)) ->
+  peps
+
+## Create the SingleCellExperiment object
+pep <- readSingleCellExperiment(peps, 
+                                ecol = 3:605)
+
+## Name rows with the precursor identifier
+rownames(pep) <- peps$EG.PrecursorId
+
+## Rename columns so they match the PSM data
+colnames(pep) %>%
+  paste0("PEP.Quantity") ->
+  colnames(pep)
+
+## Include the peptide data in the QFeatures object
+petrosius2023_mES <- addAssay(petrosius2023_mES, pep, name = "peptides")
+
+## Link the PSMs and the peptides
+petrosius2023_mES <- addAssayLink(petrosius2023_mES,
+                                  from = 1:603,
+                                  to = "peptides",
+                                  varFrom = rep("EG.PrecursorId", 603),
+                                  varTo = "EG.PrecursorId")
+
+
+####---- Add the protein data ----####
+## The protein data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
+## '20240205_111251_PGQuant (Normal).tsv' contains the protein data.
+
+prot_data <- read.delim(paste0(root, "20240205_111251_PGQuant (Normal).tsv"))
+
+## Clean quantitative data
+prot_data %>% 
+  mutate(R.FileName = sub(".*rawfiles/", "", R.Raw.File.Name)) %>%
+  mutate(R.FileName = sub("\\.raw$", "", R.FileName)) %>%
+  pivot_wider(names_from = R.FileName, 
+              values_from = PG.Quantity, 
+              id_cols = PG.ProteinAccessions) ->
+  prots
+
+## Create the SingleCellExperiment object
+pro <- readSingleCellExperiment(prots, 
+                                ecol = 2:604)
+
+## Name rows with the protein accessions
+rownames(pro) <- prots$PG.ProteinAccessions
+
+## Rename columns so they match the PSM data
+colnames(pro) %>%
+  paste0("PEP.Quantity") ->
+  colnames(pro)
+
+## Include the protein data in the QFeatures object
+petrosius2023_mES <- addAssay(petrosius2023_mES, pro, name = "proteins")
+
+## Link the peptides and the proteins
+petrosius2023_mES <- addAssayLink(petrosius2023_mES, 
+                                from = "peptides", 
+                                to = "proteins",
+                                varFrom = "PG.ProteinAccessions",
+                                varTo = "PG.ProteinAccessions")
+
+## Save data
+save(petrosius2023_mES,
+     file = paste0(root, "petrosius2023_mES.Rda"),
+     compress = "xz",
+     compression_level = 9)
+
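
`addAssayLink()` relates rows of two assays through shared values of `varFrom` and `varTo` (here `EG.PrecursorId` for PSMs to peptides, then `PG.ProteinAccessions` for peptides to proteins). Before linking, it can help to check how many identifiers from the lower-level assay actually appear in the higher-level one. The base-R sketch below does this with `match()` on made-up identifiers. The `link_coverage()` helper is hypothetical. It approximates the idea and does not reproduce QFeatures' internal matching.

```r
## Hypothetical helper: how many "from" identifiers find a partner in "to"?
link_coverage <- function(from_ids, to_ids) {
  idx <- match(from_ids, to_ids)  # NA where no partner row exists
  c(matched = sum(!is.na(idx)), unmatched = sum(is.na(idx)))
}

## Toy identifiers: precursors seen in one PSM assay vs. the peptide assay
psm_ids     <- c("_PEPTIDEA_.2", "_PEPTIDEB_.3", "_PEPTIDEC_.2")
peptide_ids <- c("_PEPTIDEA_.2", "_PEPTIDEB_.3")

link_coverage(psm_ids, peptide_ids)
```

This assumes the PSM assays carry an `EG.PrecursorId` column, as the `addAssayLink()` call implies. With the built object, one could pass, for example, `rowData(petrosius2023_mES[[1]])$EG.PrecursorId` and `rowData(petrosius2023_mES[["peptides"]])$EG.PrecursorId`. A large `unmatched` count would suggest the two identifier formats differ.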