New dataset #23

Merged: 8 commits, Apr 10, 2024
118 changes: 118 additions & 0 deletions R/data.R
@@ -2592,3 +2592,121 @@
##' @keywords datasets
##'
"guise2024"

####---- petrosius2023_mES ----####

##' Petrosius et al, 2023 (Nat. Comm.): Mouse embryonic stem cell (mESC) in
##' different culture conditions
##'
##' @description
##' Profiling mouse embryonic stem cells across ground-state (m2i) and
##' differentiation-permissive (m15) culture conditions. The data were
##' acquired using orbitrap-based data-independent acquisition (DIA).
##' The authors' objective was to demonstrate the capability of their approach
##' by profiling mouse embryonic stem cell culture conditions, showcasing
##' heterogeneity in global proteomes, and highlighting differences in
##' the expression of key metabolic enzymes in distinct cell subclusters.
##'
##' @format A [QFeatures] object with 605 assays, each assay being a
##' [SingleCellExperiment] object:
##'
##' - Assay 1-603: PSM data acquired with an Orbitrap-based data-independent
##' acquisition (DIA) protocol, hence each of those assays contains a single
##' column that holds the quantitative information.
##' - `peptides`: peptide data containing quantitative data for 9884
##' peptides and 603 single cells.
##' - `proteins`: protein data containing quantitative data for 4270
##' proteins and 603 single cells.
##'
##' Sample annotation is stored in `colData(petrosius2023_mES())`.
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Sample isolation**: Cell sorting was done on a Sony MA900 cell sorter
##' using a 130 μm sorting chip. Cells were sorted at single-cell resolution,
##' into a 384-well Eppendorf LoBind PCR plate (Eppendorf AG) containing 1 μL
##' of lysis buffer.
##' - **Sample preparation**: Single-cell protein lysates were digested with
##' 2 ng of Trypsin (Sigma cat. Nr. T6567) supplied in 1 μL of digestion
##' buffer (100mM TEAB pH 8.5, 1:5000 (v/v) benzonase (Sigma cat. Nr. E1014)).
##' The digestion was carried out overnight at 37 °C, and subsequently
##' acidified by the addition of 1 μL 1% (v/v) trifluoroacetic acid (TFA).
##' All liquid dispensing was done using an I-DOT One instrument (Dispendix).
##' - **Liquid chromatography**: The Evosep One liquid chromatography system was
##' used for DIA isolation window survey and HRMS1-DIA experiments. The standard
##' 31 min or 58 min pre-defined Whisper gradients were used, where peptide
##' elution is carried out at a 100 nl/min flow rate. A 15 cm × 75 μm
##' ID column (PepSep) with 1.9 μm C18 beads (Dr. Maisch, Germany) and a 10
##' μm ID silica electrospray emitter (PepSep) was used. Both LC systems were
##' coupled online to an Orbitrap Eclipse Tribrid Mass Spectrometer
##' (ThermoFisher Scientific) via an EasySpray ion source connected to a
##' FAIMSPro device.
##' - **Mass spectrometry**: The mass spectrometer was operated in positive
##' mode with the FAIMSPro interface compensation voltage set to −45 V.
##' MS1 scans were carried out at 120,000 resolution with an automatic gain
##' control (AGC) of 300% and maximum injection time set to auto. For the DIA
##' isolation window survey, a scan range of 500–900 m/z was used, and 400–1000
##' m/z for the rest of the experiments. Higher energy collisional dissociation (HCD) was
##' used for precursor fragmentation with a normalized collision energy (NCE)
##' of 33% and MS2 scan AGC target was set to 1000%.
##' - **Raw data processing**: The mESC raw data files were processed with
##' Spectronaut 17, and protein abundance tables were exported and analyzed
##' further with Python.
##'
##' @section Data collection:
##'
##' The data were provided by the author and are accessible at the
##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT).
##' The folder ('20240205_111248_mESC_SNEcombine_m15-m2i/') contains the
##' following files of interest:
##'
##' - `20240205_111251_PEPQuant (Normal).tsv`: the PSM level data
##' - `20240205_111251_Peptide Quant (Normal).tsv`: the peptide level data
##' - `20240205_111251_PGQuant (Normal).tsv`: the protein level data
##'
##' The metadata were downloaded from the
##' [Zenodo repository](https://zenodo.org/records/8146605).
##'
##' - `sample_facs.csv`: the metadata
##'
##' We formatted the quantification table so that its columns match the
##' metadata. Both tables are then combined in a single
##' [QFeatures] object using the [scp::readSCP()] function.
##'
##' The peptide data were formatted to a [SingleCellExperiment] object and the
##' sample metadata were matched to the column names and stored in the `colData`.
##' The object is then added to the [QFeatures] object and the rows of the PSM
##' data are linked to the rows of the peptide data based on the peptide sequence
##' information through an `AssayLink` object.
##'
##' The protein data were formatted to a [SingleCellExperiment] object and
##' the sample metadata were matched to the column names and stored in the
##' `colData`. The object is then added to the [QFeatures] object and the rows
##' of the peptide data are linked to the rows of the protein data based on the
##' protein accession information through an `AssayLink` object.
##'
##' @source
##' The peptide and protein data can be downloaded from the
##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT).
##' The raw data and the quantification data can also be found in the
##' MassIVE repository `MSV000092429`:
##' ftp://[email protected]/.
##'
##' @references
##' **Source article**: Petrosius, V., Aragon-Fernandez, P., Üresin, N. et al.
##' "Exploration of cell state heterogeneity using single-cell proteomics
##' through sensitivity-tailored data-independent acquisition."
##' Nat Commun 14, 5910 (2023).
##' ([link to article](https://doi.org/10.1038/s41467-023-41602-1)).
##'
##' @examples
##' \donttest{
##' petrosius2023_mES()
##' }
##'
##' @keywords datasets
##'
"petrosius2023_mES"
1 change: 1 addition & 0 deletions inst/extdata/metadata.csv
@@ -23,3 +23,4 @@
"22","gregoire2023_mixCTRL","Single-cell proteomics data from two monocyte cell lines","3.19",NA,"TXT","https://www.ebi.ac.uk/pride/archive/projects/PXD046211",NA,"Homo sapiens",9606,TRUE,"PRIDE","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/gregoire2023_mixCTRL.Rda",2024-01-22,119,"Sage","TMT-16",TRUE,TRUE,TRUE,TRUE,NA
"23","khan2023","Single-cell proteomics data of 421 MCF-10A cells undergoing EMT triggered by TGFβ","3.19",NA,"TXT","https://drive.google.com/drive/folders/1zCsRKWNQuAz5msxx0DfjDrIe6pUjqQmj",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/khan2023.Rda",2023-12-21,47,"MaxQuant","TMTPro 16plex",TRUE,TRUE,TRUE,TRUE,NA
"24","guise2024","Single-cell proteomics data of 108 postmortem CTL or ALS spinal moto neurons","3.19",NA,"TXT","ftp://massive.ucsd.edu/v05/MSV000092119/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Christophe Vanderaa <[email protected]>","QFeatures","Rda","scpdata/guise2024.rda",2024-01-05,47,"Proteome Discoverer","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"25","petrosius2023_mES","Mouse embryonic stem cells across ground-state (m2i) and differentiation-permissive (m15) culture conditions.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",NA,"Mus musculus",10090,TRUE,"Dataverse","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_mES.Rda",2024-04-09,605,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
129 changes: 129 additions & 0 deletions inst/scripts/make-data_petrosius2023_mES.R
@@ -0,0 +1,129 @@

####---- Petrosius et al, 2023 ---####


## Petrosius, V., Aragon-Fernandez, P., Üresin, N. et al. Exploration of cell
## state heterogeneity using single-cell proteomics through sensitivity-tailored
## data-independent acquisition. Nat Commun 14, 5910 (2023).
## https://doi.org/10.1038/s41467-023-41602-1

library(SingleCellExperiment)
library(scp)
library(tidyverse)

####---- Load PSM data ----####
## The PSM data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
## and 'sample_facs.csv' from https://zenodo.org/records/8146605
## '20240205_111251_PEPQuant (Normal).tsv' contains the PSM data.
## 'sample_facs.csv' contains the cell annotations.

root <- "~/localdata/SCP/petrosiusmESC/20240205_111248_mESC_SNEcombine_m15-m2i/"
ev <- read.delim(paste0(root, "20240205_111251_PEPQuant (Normal).tsv"))
design <- read.delim(paste0(root, "sample_facs.csv"))

####---- Create sample annotation ----####
design %>%
    select(-X) %>%
    distinct() %>%
    add_column(Channel = "PEP.Quantity") %>%
    rename(Set = File.Name,
           SampleType = Plate) ->
    meta

## Clean quantitative data
ev %>%
    rename(Set = R.FileName,
           protein = PG.ProteinAccessions) %>%
    ## Create a modified sequence + charge variable
    mutate(peptide = paste0("_", PEP.StrippedSequence, "_.", FG.Charge)) %>%
    filter(Set %in% meta$Set) ->
    evproc
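
The `peptide` key built above concatenates the stripped sequence and the precursor charge, and the filter keeps only runs that appear in the annotation. A minimal base-R sketch of both steps, using invented toy values (the sequences, charges and run names are hypothetical and not taken from the dataset):

```r
## Toy stand-ins for the Spectronaut export and the sample annotation.
## All values are invented for illustration.
ev_toy <- data.frame(
    R.FileName = c("run_A", "run_A", "run_B", "run_C"),
    PEP.StrippedSequence = c("PEPTIDEK", "ELVISR", "PEPTIDEK", "ELVISR"),
    FG.Charge = c(2, 3, 2, 2),
    stringsAsFactors = FALSE
)
annotated_runs <- c("run_A", "run_B")

## Same construction as in the script: "_<sequence>_.<charge>"
ev_toy$peptide <- paste0("_", ev_toy$PEP.StrippedSequence, "_.", ev_toy$FG.Charge)

## Keep only rows whose run is present in the annotation
ev_kept <- ev_toy[ev_toy$R.FileName %in% annotated_runs, ]
print(ev_kept$peptide)
```

Rows from `run_C` are dropped because that run has no annotation, which mirrors the `filter(Set %in% meta$Set)` step.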

## Create the QFeatures object
petrosius2023_mES <- readSCP(evproc,
                             meta,
                             channelCol = "Channel",
                             batchCol = "Set",
                             removeEmptyCols = TRUE)


####---- Peptide data ----####
## The peptide data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
## '20240205_111251_Peptide Quant (Normal).tsv' contains the peptide data.

## Load the peptide level quantification data
pep_data <- read.delim(paste0(root, "20240205_111251_Peptide Quant (Normal).tsv"))

## Clean quantitative data
pep_data %>%
    pivot_wider(names_from = R.FileName,
                values_from = PG.Quantity,
                id_cols = c(EG.PrecursorId, PG.ProteinAccessions)) ->
    peps
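
The reshaping step turns a long table (one row per feature and run) into a wide one (one row per feature, one column per run). The sketch below shows the same idea in base R rather than with `pivot_wider()`, on invented toy values; the `Quantity` column name and the run names are assumptions for illustration only:

```r
## Long-format toy table: one row per (precursor, run) pair.
long <- data.frame(
    EG.PrecursorId = c("_PEPA_.2", "_PEPA_.2", "_PEPB_.3"),
    R.FileName = c("run_A", "run_B", "run_A"),
    Quantity = c(10, 12, 7),
    stringsAsFactors = FALSE
)

## Allocate a features x runs matrix filled with NA
ids <- unique(long$EG.PrecursorId)
runs <- unique(long$R.FileName)
wide <- matrix(NA_real_, nrow = length(ids), ncol = length(runs),
               dimnames = list(ids, runs))

## Fill each cell by (row name, column name); unobserved pairs stay NA
wide[cbind(long$EG.PrecursorId, long$R.FileName)] <- long$Quantity
print(wide)
```

Features that were not quantified in a run end up as `NA`, which is also how missing values appear after pivoting.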

## Create the SingleCellExperiment object
pep <- readSingleCellExperiment(peps,
                                ecol = 3:605)

## Name rows with the precursor identifier
rownames(pep) <- peps$EG.PrecursorId

## Rename columns so they match the PSM data
colnames(pep) %>%
    paste0("PEP.Quantity") ->
    colnames(pep)
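
The renaming above only makes sense if the PSM assays name their samples as the run name followed by the channel name, which is what the script's comment implies. A small base-R sketch of that assumption, with hypothetical run names:

```r
## Hypothetical run names; the channel name is the one used in the script.
sets <- c("run_A", "run_B")
channel <- "PEP.Quantity"

## Assumed naming of the PSM-level samples: "<Set><Channel>"
psm_cols <- paste0(sets, channel)

## The wide peptide table is named by run only, so the suffix is appended
pep_cols <- sets
pep_cols_renamed <- paste0(pep_cols, channel)

## After renaming, every peptide column has a PSM-level counterpart
all_matched <- all(pep_cols_renamed %in% psm_cols)
print(pep_cols_renamed)
```

A check like `all_matched` before adding the assay would catch a mismatch in sample names early.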

## Include the peptide data in the QFeatures object
petrosius2023_mES <- addAssay(petrosius2023_mES, pep, name = "peptides")

## Link the PSMs and the peptides
petrosius2023_mES <- addAssayLink(petrosius2023_mES,
                                  from = 1:603,
                                  to = "peptides",
                                  varFrom = rep("EG.PrecursorId", 603),
                                  varTo = "EG.PrecursorId")
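
Conceptually, an assay link relates each row of the source assay to the row of the target assay that shares the same key. The sketch below illustrates only that idea with base-R `match()` on invented identifiers; it is not a description of how `QFeatures` implements `AssayLink` objects internally:

```r
## Invented keys: rows of one PSM assay and rows of the peptide assay.
psm_ids <- c("_PEPA_.2", "_PEPB_.3", "_PEPA_.2", "_PEPC_.2")
pep_ids <- c("_PEPA_.2", "_PEPB_.3")

## For each PSM row, the index of the peptide row with the same key
idx <- match(psm_ids, pep_ids)

## PSM rows without a counterpart in the peptide table
unlinked <- psm_ids[is.na(idx)]
print(idx)
print(unlinked)
```

Several source rows can map to the same target row, and keys missing from the target remain unlinked.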


####---- Add the protein data ----####
## The protein data were downloaded from https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT
## '20240205_111251_PGQuant (Normal).tsv' contains the protein data.

prot_data <- read.delim(paste0(root, "20240205_111251_PGQuant (Normal).tsv"))

## Clean quantitative data
prot_data %>%
    ## Keep only the file name: drop the directory and the '.raw' extension
    mutate(R.FileName = sub(".*rawfiles/", "", R.Raw.File.Name)) %>%
    mutate(R.FileName = sub("\\.raw$", "", R.FileName)) %>%
    pivot_wider(names_from = R.FileName,
                values_from = PG.Quantity,
                id_cols = PG.ProteinAccessions) ->
    prots
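
When stripping the `.raw` extension with `sub()`, note that an unescaped dot is a regular-expression wildcard, so a loose pattern can remove text from the middle of a file name. A short base-R comparison on invented paths (the directory and file names are hypothetical):

```r
## Invented raw-file paths; the first one contains "raw" inside the name.
raw_paths <- c("D:/rawfiles/20230101_straw_cell1.raw",
               "D:/rawfiles/plate2_B07.raw")

## Drop everything up to and including "rawfiles/"
stems <- sub(".*rawfiles/", "", raw_paths)

## Loose pattern: "." matches any character, first match wins
loose <- sub(".raw", "", stems)

## Strict pattern: literal dot, anchored at the end of the string
strict <- sub("\\.raw$", "", stems)

print(loose)
print(strict)
```

Both patterns agree on typical names; they differ only when "raw" occurs elsewhere in the name, which is why the anchored form is the safer choice.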

## Create the SingleCellExperiment object
pro <- readSingleCellExperiment(prots,
                                ecol = 2:604)

## Name rows with the protein accessions
rownames(pro) <- prots$PG.ProteinAccessions

## Rename columns so they match the PSM data
colnames(pro) %>%
    paste0("PEP.Quantity") ->
    colnames(pro)

## Include the protein data in the QFeatures object
petrosius2023_mES <- addAssay(petrosius2023_mES, pro, name = "proteins")

## Link the peptides and the proteins
petrosius2023_mES <- addAssayLink(petrosius2023_mES,
                                  from = "peptides",
                                  to = "proteins",
                                  varFrom = "PG.ProteinAccessions",
                                  varTo = "PG.ProteinAccessions")

## Save data
save(petrosius2023_mES,
     file = file.path(paste0(root, "petrosius2023_mES.Rda")),
     compress = "xz",
     compression_level = 9)
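
Before uploading, it can be worth confirming that the saved file loads back to an identical object. A minimal round-trip sketch with a small stand-in object written to a temporary file (the object and file name are placeholders, not the real dataset):

```r
## Small stand-in object; the real script saves the QFeatures object.
toy <- list(n_assays = 605L, name = "toy_object")
path <- file.path(tempdir(), "toy_object.Rda")

## Same compression settings as in the script
save(toy, file = path, compress = "xz", compression_level = 9)

## Load into a fresh environment to avoid clobbering the workspace
env <- new.env()
loaded <- load(path, envir = env)

## load() returns the names of the restored objects
roundtrip_ok <- identical(env$toy, toy)
print(loaded)
unlink(path)
```

Loading into a separate environment makes it explicit which object names the `.Rda` file contains.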

27 changes: 27 additions & 0 deletions inst/scripts/make-metadata.R
@@ -656,6 +656,33 @@ meta <- list(
ProteinsAvailable = TRUE,
ContainsSingleCells = TRUE,
Notes = NA_character_
),
data.frame(
Title = "petrosius2023_mES",
Description = paste0("Mouse embryonic stem cells across ground-state (m2i) ",
"and differentiation-permissive (m15) culture conditions."),
BiocVersion = "3.19",
Genome = NA_character_,
SourceType = "TXT",
SourceUrl = "https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",
SourceVersion = NA_character_,
Species = "Mus musculus",
TaxonomyId = 10090,
Coordinate_1_based = TRUE,
DataProvider = "Dataverse",
Maintainer = "Enes Sefa Ayar <[email protected]>",
RDataClass = "QFeatures",
DispatchClass = "Rda",
RDataPath = "scpdata/petrosius2023_mES.Rda",
PublicationDate = as.Date("2024/04/09"),
NumberAssays = 605,
PreprocessingSoftware = "Spectronaut",
LabelingProtocol = "LFQ",
PsmsAvailable = TRUE,
PeptidesAvailable = TRUE,
ProteinsAvailable = TRUE,
ContainsSingleCells = TRUE,
Notes = NA_character_
)
)
meta <- do.call(rbind, meta)
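
The metadata entries are one-row data frames stacked with `do.call(rbind, ...)`. The sketch below reproduces that pattern on two toy entries and adds an optional consistency check between `Species` and `TaxonomyId`; the lookup table and the toy titles are assumptions for illustration and are not part of the package:

```r
## Two toy entries with only the columns needed for the check.
meta_toy <- list(
    data.frame(Title = "human_toy", Species = "Homo sapiens",
               TaxonomyId = 9606, stringsAsFactors = FALSE),
    data.frame(Title = "mouse_toy", Species = "Mus musculus",
               TaxonomyId = 10090, stringsAsFactors = FALSE)
)

## Stack the one-row data frames into a single table
meta_toy <- do.call(rbind, meta_toy)

## Hypothetical lookup of NCBI taxonomy identifiers by species name
known_taxa <- c("Homo sapiens" = 9606, "Mus musculus" = 10090)

## Flag rows where the species and the taxonomy identifier disagree
consistent <- known_taxa[meta_toy$Species] == meta_toy$TaxonomyId
print(meta_toy)
```

Running such a check after binding would flag an entry whose species was copied from a previous record without updating it.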