Commit

merge PRs
lgatto committed Apr 10, 2024
2 parents f8fb58f + 98d2505 commit f3586fb
Showing 4 changed files with 353 additions and 78 deletions.
118 changes: 118 additions & 0 deletions R/data.R
@@ -2593,6 +2593,124 @@
##'
"guise2024"

####---- petrosius2023_mES ----####

##' Petrosius et al. 2023 (Nat. Commun.): Mouse embryonic stem cells (mESC) in
##' different culture conditions
##'
##' @description
##' Profiling of mouse embryonic stem cells across ground-state (m2i) and
##' differentiation-permissive (m15) culture conditions. The data were
##' acquired using Orbitrap-based data-independent acquisition (DIA).
##' The authors aimed to demonstrate the capability of their approach
##' by profiling mESC culture conditions, showcasing heterogeneity in
##' global proteomes and highlighting differences in the expression of key
##' metabolic enzymes across distinct cell subclusters.
##'
##' @format A [QFeatures] object with 605 assays, each assay being a
##' [SingleCellExperiment] object:
##'
##' - Assays 1-603: PSM data acquired with an Orbitrap-based, label-free
##' data-independent acquisition (DIA) protocol, hence each assay contains a
##' single column holding the quantitative information of one cell.
##' - `peptides`: peptide data containing quantitative data for 9884
##' peptides and 603 single-cells.
##' - `proteins`: protein data containing quantitative data for 4270
##' proteins and 603 single-cells.
##'
##' Sample annotation is stored in `colData(petrosius2023_mES())`.
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Sample isolation**: Cell sorting was done on a Sony MA900 cell sorter
##' using a 130 μm sorting chip. Cells were sorted at single-cell resolution
##' into a 384-well Eppendorf LoBind PCR plate (Eppendorf AG) containing 1 μL
##' of lysis buffer.
##' - **Sample preparation**: Single-cell protein lysates were digested with
##' 2 ng of trypsin (Sigma cat. Nr. T6567) supplied in 1 μL of digestion
##' buffer (100 mM TEAB pH 8.5, 1:5000 (v/v) benzonase (Sigma cat. Nr. E1014)).
##' The digestion was carried out overnight at 37 °C, and the digests were
##' subsequently acidified by adding 1 μL of 1% (v/v) trifluoroacetic acid
##' (TFA). All liquid dispensing was done using an I-DOT One instrument
##' (Dispendix).
##' - **Liquid chromatography**: The Evosep One liquid chromatography system
##' was used for the DIA isolation window survey and HRMS1-DIA experiments.
##' The standard pre-defined 31 min or 58 min Whisper gradients were used, in
##' which peptides are eluted at a flow rate of 100 nL/min. A 15 cm × 75 μm ID
##' column (PepSep) with 1.9 μm C18 beads (Dr. Maisch, Germany) and a 10 μm ID
##' silica electrospray emitter (PepSep) were used. The LC system was coupled
##' online to an Orbitrap Eclipse Tribrid mass spectrometer (ThermoFisher
##' Scientific) via an EasySpray ion source connected to a FAIMS Pro device.
##' - **Mass spectrometry**: The mass spectrometer was operated in positive
##' mode with the FAIMS Pro compensation voltage set to −45 V. MS1 scans were
##' acquired at a resolution of 120,000 with an automatic gain control (AGC)
##' target of 300% and the maximum injection time set to auto. A scan range of
##' 500–900 m/z was used for the DIA isolation window survey and 400–1000 m/z
##' for the rest of the experiments. Higher-energy collisional dissociation
##' (HCD) was used for precursor fragmentation with a normalized collision
##' energy (NCE) of 33%, and the MS2 AGC target was set to 1000%.
##' - **Raw data processing**: The mESC raw data files were processed with
##' Spectronaut 17, and the protein abundance tables were exported and
##' further analyzed with Python.
##'
##' @section Data collection:
##'
##' The data were provided by the authors and are accessible on
##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT).
##' The folder `20240205_111248_mESC_SNEcombine_m15-m2i/` contains the
##' following files of interest:
##'
##' - `20240205_111251_PEPQuant (Normal).tsv`: the PSM level data
##' - `20240205_111251_Peptide Quant (Normal).tsv`: the peptide level data
##' - `20240205_111251_PGQuant (Normal).tsv`: the protein level data
##'
##' The metadata were downloaded from the
##' [Zenodo repository](https://zenodo.org/records/8146605):
##'
##' - `sample_facs.csv`: the metadata
##'
##' We formatted the quantification table so that its columns match the
##' metadata. Both tables were then combined into a single
##' [QFeatures] object using the [scp::readSCP()] function.
##'
##' The peptide data were formatted into a [SingleCellExperiment] object, and
##' the sample metadata were matched to the column names and stored in the
##' `colData`. The object was then added to the [QFeatures] object, and the
##' rows of the PSM data were linked to the rows of the peptide data based on
##' the precursor identifier (`EG.PrecursorId`) through an `AssayLink` object.
##'
##' The protein data were formatted into a [SingleCellExperiment] object, and
##' the sample metadata were matched to the column names and stored in the
##' `colData`. The object was then added to the [QFeatures] object, and the
##' rows of the peptide data were linked to the rows of the protein data based
##' on the protein accessions (`PG.ProteinAccessions`) through an `AssayLink`
##' object.
##'
##' @source
##' The peptide and protein data can be downloaded from
##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT).
##' The raw data and the quantification data can also be found in the
##' MassIVE repository `MSV000092429`:
##' ftp://[email protected]/.
##'
##' @references
##' **Source article**: Petrosius, V., Aragon-Fernandez, P., Üresin, N. et al.
##' "Exploration of cell state heterogeneity using single-cell proteomics
##' through sensitivity-tailored data-independent acquisition."
##' Nat Commun 14, 5910 (2023).
##' ([link to article](https://doi.org/10.1038/s41467-023-41602-1)).
##'
##' @examples
##' \donttest{
##' petrosius2023_mES()
##' }
##'
##' @keywords datasets
##'
"petrosius2023_mES"

####---- petrosius2023_AML ----####

##' Petrosius et al. 2023 (bioRxiv): AML hierarchy on Astral.
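The `AssayLink` objects mentioned in the data collection notes record which rows of one assay feed into which rows of another, by matching a shared variable (`varFrom` in the source assay, `varTo` in the target assay). The base-R sketch below illustrates that row matching on hypothetical toy tables; `link_rows()` and the data are illustrative assumptions, not the QFeatures implementation, and each peptide is assumed to map to a single protein group.

```r
## Illustrative sketch only: the toy tables and link_rows() are hypothetical,
## not taken from the dataset or from QFeatures internals.
peptides <- data.frame(
    EG.PrecursorId       = c("_PEPTIDEA_.2", "_PEPTIDEB_.3", "_PEPTIDEC_.2"),
    PG.ProteinAccessions = c("P1", "P2", "P1")
)
proteins <- data.frame(PG.ProteinAccessions = c("P1", "P2"))

## For each row of `from`, find the row of `to` whose `varTo` value equals
## the row's `varFrom` value; rows without a match are dropped.
link_rows <- function(from, to, varFrom, varTo) {
    hits <- match(from[[varFrom]], to[[varTo]])
    keep <- !is.na(hits)
    data.frame(from = seq_len(nrow(from))[keep], to = hits[keep])
}

links <- link_rows(peptides, proteins,
                   varFrom = "PG.ProteinAccessions",
                   varTo = "PG.ProteinAccessions")
## Peptide rows 1 and 3 point to protein row 1 (P1), peptide row 2 to row 2 (P2)
links
```

`match()` keeps only the first hit, which is sufficient under the one-protein-group-per-peptide assumption; a many-to-many relationship would need a join instead.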
3 changes: 2 additions & 1 deletion inst/extdata/metadata.csv
@@ -23,4 +23,5 @@
"gregoire2023_mixCTRL","Single-cell proteomics data from two monocyte cell lines","3.19",NA,"TXT","https://www.ebi.ac.uk/pride/archive/projects/PXD046211",NA,"Homo sapiens",9606,TRUE,"PRIDE","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/gregoire2023_mixCTRL.Rda",2024-01-22,119,"Sage","TMT-16",TRUE,TRUE,TRUE,TRUE,NA
"khan2023","Single-cell proteomics data of 421 MCF-10A cells undergoing EMT triggered by TGFβ","3.19",NA,"TXT","https://drive.google.com/drive/folders/1zCsRKWNQuAz5msxx0DfjDrIe6pUjqQmj",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/khan2023.Rda",2023-12-21,47,"MaxQuant","TMTPro 16plex",TRUE,TRUE,TRUE,TRUE,NA
"guise2024","Single-cell proteomics data of 108 postmortem CTL or ALS spinal moto neurons","3.19",NA,"TXT","ftp://massive.ucsd.edu/v05/MSV000092119/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Christophe Vanderaa <[email protected]>","QFeatures","Rda","scpdata/guise2024.rda",2024-01-05,47,"Proteome Discoverer","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"petrosius2023_AML","Single-cell proteomics data of 4 cell types from the OCI-AML8227 model.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",NA,"Homo sapiens",9606,TRUE,"Dataverse","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/petrosius2023.Rda",2023-06-08,217,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"petrosius2023_mES","Mouse embryonic stem cells across ground-state (m2i) and differentiation-permissive (m15) culture conditions.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",NA,"Mus musculus",10090,TRUE,"Dataverse","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_mES.Rda",2024-04-09,605,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"petrosius2023_AML","Single-cell proteomics data of 4 cell types from the OCI-AML8227 model.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/4DSPJM",NA,"Homo sapiens",9606,TRUE,"Dataverse","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/petrosius2023.Rda",2023-06-08,217,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
129 changes: 129 additions & 0 deletions inst/scripts/make-data_petrosius2023_mES.R
@@ -0,0 +1,129 @@

####---- Petrosius et al, 2023 ---####


## Petrosius, V., Aragon-Fernandez, P., Üresin, N. et al. Exploration of cell
## state heterogeneity using single-cell proteomics through sensitivity-tailored
## data-independent acquisition. Nat Commun 14, 5910 (2023).
## https://doi.org/10.1038/s41467-023-41602-1

library(SingleCellExperiment)
library(scp)
library(tidyverse)

####---- Load PSM data ----####
## The PSM data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
## and 'sample_facs.csv' from https://zenodo.org/records/8146605
## '20240205_111251_PEPQuant (Normal).tsv' contains the PSM data.
## 'sample_facs.csv' contains the cell annotations.

root <- "~/localdata/SCP/petrosiusmESC/20240205_111248_mESC_SNEcombine_m15-m2i/"
ev <- read.delim(paste0(root, "20240205_111251_PEPQuant (Normal).tsv"))
design <- read.delim(paste0(root, "sample_facs.csv"))

####---- Create sample annotation ----####
design %>%
    select(-X) %>%
    distinct() %>%
    add_column(Channel = "PEP.Quantity") %>%
    rename(Set = File.Name,
           SampleType = Plate) ->
    meta

## Clean quantitative data
ev %>%
    rename(Set = R.FileName,
           protein = PG.ProteinAccessions) %>%
    ## Create a stripped sequence + charge variable
    mutate(peptide = paste0("_", PEP.StrippedSequence, "_.", FG.Charge)) %>%
    filter(Set %in% meta$Set) ->
    evproc

## Create the QFeatures object
petrosius2023_mES <- readSCP(evproc,
                             meta,
                             channelCol = "Channel",
                             batchCol = "Set",
                             removeEmptyCols = TRUE)


####---- Peptide data ----####
## The peptide data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
## '20240205_111251_Peptide Quant (Normal).tsv' contains the peptide data.

## Load the peptide level quantification data
pep_data <- read.delim(paste0(root, "20240205_111251_Peptide Quant (Normal).tsv"))

## Clean quantitative data
pep_data %>%
    pivot_wider(names_from = R.FileName,
                values_from = PG.Quantity,
                id_cols = c(EG.PrecursorId, PG.ProteinAccessions)) ->
    peps

## Create the SingleCellExperiment object (columns 1-2 hold the identifiers,
## columns 3-605 the 603 cells)
pep <- readSingleCellExperiment(peps,
                                ecol = 3:605)

## Name rows with the precursor identifiers
rownames(pep) <- peps$EG.PrecursorId

## Rename columns so they match the PSM data
colnames(pep) %>%
    paste0("PEP.Quantity") ->
    colnames(pep)

## Include the peptide data in the QFeatures object
petrosius2023_mES <- addAssay(petrosius2023_mES, pep, name = "peptides")

## Link the PSMs and the peptides
petrosius2023_mES <- addAssayLink(petrosius2023_mES,
                                  from = 1:603,
                                  to = "peptides",
                                  varFrom = rep("EG.PrecursorId", 603),
                                  varTo = "EG.PrecursorId")


####---- Add the protein data ----####
## The protein data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
## '20240205_111251_PGQuant (Normal).tsv' contains the protein data.

prot_data <- read.delim(paste0(root, "20240205_111251_PGQuant (Normal).tsv"))

## Clean quantitative data: derive the run name from the raw file path
## (escaped, anchored pattern so only the trailing ".raw" is removed)
prot_data %>%
    mutate(R.FileName = sub(".*rawfiles/", "", R.Raw.File.Name)) %>%
    mutate(R.FileName = sub("\\.raw$", "", R.FileName)) %>%
    pivot_wider(names_from = R.FileName,
                values_from = PG.Quantity,
                id_cols = PG.ProteinAccessions) ->
    prots

## Create the SingleCellExperiment object (column 1 holds the accessions,
## columns 2-604 the 603 cells)
pro <- readSingleCellExperiment(prots,
                                ecol = 2:604)

## Name rows with the protein accessions
rownames(pro) <- prots$PG.ProteinAccessions

## Rename columns so they match the PSM data
colnames(pro) %>%
    paste0("PEP.Quantity") ->
    colnames(pro)

## Include the protein data in the QFeatures object
petrosius2023_mES <- addAssay(petrosius2023_mES, pro, name = "proteins")

## Link the peptides and the proteins
petrosius2023_mES <- addAssayLink(petrosius2023_mES,
                                  from = "peptides",
                                  to = "proteins",
                                  varFrom = "PG.ProteinAccessions",
                                  varTo = "PG.ProteinAccessions")

## Save data
save(petrosius2023_mES,
     file = paste0(root, "petrosius2023_mES.Rda"),
     compress = "xz",
     compression_level = 9)
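The protein table identifies each run by its raw file path, whereas the PSM assays built by `readSCP()` appear to name their single column by concatenating the `Set` value with the `Channel` value (`PEP.Quantity`), which is what the renaming steps in the script rely on. The base-R sketch below walks through that conversion on hypothetical file paths; the paths are illustrative only, and the `<Set><Channel>` naming is an assumption inferred from the script rather than a documented guarantee. It also shows why the `.raw` extension is best removed with an escaped, anchored pattern: in a regular expression an unescaped `.` matches any character.

```r
## Hypothetical raw file paths, as they might appear in the
## R.Raw.File.Name column of the protein table (illustrative only)
raw_paths <- c("D:/data/rawfiles/20230101_mESC_m2i_A1.raw",
               "D:/data/rawfiles/20230101_mESC_rawctrl_B2.raw")

## Strip the directory, then the extension; the dot is escaped and the
## pattern anchored to the end, so only a trailing ".raw" is removed
run_names <- sub(".*rawfiles/", "", raw_paths)
run_names <- sub("\\.raw$", "", run_names)

## Naive pattern: "." matches any character, so the first "_raw" inside
## the name is removed and the extension is kept
naive <- sub(".raw", "", "20230101_mESC_rawctrl_B2.raw")

## Assumed <Set><Channel> naming, with Channel = "PEP.Quantity"
col_names <- paste0(run_names, "PEP.Quantity")
col_names
```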
