I am recreating an analysis by researchers from Regeneron Pharmaceuticals, published as "Immunostimulatory Cancer-Associated Fibroblast Subpopulations Can Predict Immunotherapy Response in Head and Neck Cancer" in Clinical Cancer Research, but using scClassify instead of SingleR and on in-house data. They used the Human Primary Cell Atlas and the BLUEPRINT Consortium atlas, as provided by the Bioconductor package celldex, which was developed by the same people who developed the Bioconductor package SingleR (i.e. Aaron Lun et al.). These atlases are composed of either purified cell types or cell lines. The Human Primary Cell Atlas was built from 745 microarrays from 105 separate studies that are freely available for download from the NCBI Gene Expression Omnibus.
A large and diverse set of human primary cell gene expression data was collected, with a particular emphasis on datasets that divided immune cells into sub-populations based upon surface markers.
library(celldex)
HPCAset <- HumanPrimaryCellAtlasData() # Downloads and caches the atlas.
> table(colData(HPCAset)$label.main)

           Astrocyte               B_cell                   BM           BM & Prog.
                   2                   26                    7                    1
        Chondrocytes                  CMP                   DC Embryonic_stem_cells
                   8                    2                   88                   17
   Endothelial_cells     Epithelial_cells         Erythroblast          Fibroblasts
                  64                   16                    8                   10
...
The other atlas has fewer samples, but all of them come from one project.
259 RNA-seq samples of pure stroma and immune cells as generated and supplied by BLUEPRINT and ENCODE.
The Human Primary Cell Atlas is used as the example data set in the SingleR vignette, and classification works there. However, scClassify fails with an error message if either of these cell type atlases is used.
Could this error message be changed into a warning so that classification can proceed? Or is it important to prevent that? Some cell types in the atlas have only two samples. For example, a single sample labelled Eosinophils triggers the error, but after removing the singleton classes classification proceeds, even though the Keratinocytes class has only two samples. A second error message is then reached.
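To see which classes would trigger the first error, the singleton labels can be listed before anything is dropped. This is a minimal sketch on a toy label vector; the labels and counts are made up for illustration and are not the atlas values.

```r
# Toy labels standing in for colData(HPCAset)[["label.main"]] (hypothetical values).
labels <- c("B_cell", "B_cell", "DC", "DC", "DC",
            "Eosinophils", "Keratinocytes", "Keratinocytes")

classCounts <- table(labels)

# Classes represented by exactly one sample.
singletonClasses <- names(classCounts)[classCounts == 1]
print(singletonClasses)

# Samples that belong to a class with at least two members.
keepSamples <- labels %in% names(classCounts)[classCounts > 1]
print(table(keepSamples))
```

Listing the class names first makes it easier to judge whether losing them matters for the cell types expected in the query data.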
classCounts <- table(colData(HPCAset)[["label.main"]])
keepSamples <- colData(HPCAset)[["label.main"]] %in% names(classCounts)[classCounts > 1]
> table(keepSamples) # Two samples, each belonging to one class, will be dropped from the atlas.
keepSamples
FALSE  TRUE
    2   711
HPCAset <- HPCAset[, keepSamples] # All classes now have at least two samples.
> HPCAclassify <- scClassify(assays(HPCAset)[["logcounts"]], colData(HPCAset)[["label.main"]], RNAdata)
There are only 0 selected genes in reference data expressed in query data.
> nrow(HPCAset)
[1] 19363
> length(intersect(rownames(HPCAset), rownames(RNAdata))) # Must be an abundance issue instead of gene name matching.
[1] 17589
It seems that the selected genes probably have moderate rather than high abundance and are not detectable given the poor limit of detection of single-cell sequencing. Could scClassify output the list of selected genes in the error message to allow manual inspection? Also, could scClassify automatically restrict feature selection to usable genes, so that it never ends up with zero selected genes expressed in the query data?
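As a user-side workaround in the meantime, the reference could be restricted to genes that are actually detected in the query before it is passed to scClassify. The sketch below uses small dense toy matrices; `detectedInQuery` and its `minFraction` threshold are hypothetical names introduced here, not part of scClassify, and I have not verified that this avoids the error on the real atlases.

```r
# Toy reference (bulk-like) and query (sparse, single-cell-like) matrices, genes x samples.
reference <- matrix(
  c(5, 6, 7, 8,
    3, 3, 4, 4,
    9, 8, 9, 8,
    2, 2, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"), paste0("ref", 1:4))
)
query <- matrix(
  c(0, 0, 0, 0, 0,   # GeneA: never detected
    1, 0, 2, 0, 1,   # GeneB: detected in 3 of 5 cells
    0, 0, 0, 1, 0,   # GeneC: detected in 1 of 5 cells
    3, 2, 0, 4, 1),  # GeneE: absent from the reference
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneE"), paste0("cell", 1:5))
)

# Hypothetical helper: keep shared genes with a non-zero value in at least
# `minFraction` of the query cells.
detectedInQuery <- function(reference, query, minFraction = 0.1) {
  shared <- intersect(rownames(reference), rownames(query))
  fractionDetected <- rowMeans(query[shared, , drop = FALSE] > 0)
  shared[fractionDetected >= minFraction]
}

usableGenes <- detectedInQuery(reference, query, minFraction = 0.25)
print(usableGenes)

# Reference restricted to genes that are usable in the query.
referenceFiltered <- reference[usableGenes, , drop = FALSE]
```

With real data the query is often a sparse matrix, so the row-wise detection rate may need the Matrix package's `rowMeans` method instead of the base one. The name overlap of 17589 genes reported above would be the starting point for `shared`; the detection threshold then decides how many of those remain.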