-
Notifications
You must be signed in to change notification settings - Fork 34
[SCAVENGE] Preparing your scATAC seq data
FLYu edited this page Dec 2, 2022
·
5 revisions
How to generate lsimat50
,SE_gvar
and SE_gvar_bg
, which can be seamlessly fed into the SCAVENGE analysis.
If you use the ArchR pipeline to process your scATAC-seq data, you may follow the following code to produce these objects.
proj <- loadArchRProject(path = "./pbmc5k") # load your project
# ---- lsimat50
# embeding coordinates
umapdf=proj@embeddings@listData[["UMAP"]]@listData[["df"]] # if you performed batch effect correction, this might be a bit different
# reduced dimension matrix; cell x LSI (30)
lsimat50=proj@reducedDims@listData[["IterativeLSI"]]@listData[["matSVD"]] # if you performed batch effect correction, this might be a bit different
# make sure that all the cells are in the same order
# colnames(peakbycellmat) %>% head
# rownames(umapdf) %>% head
save(lsimat50, file="pbmc5000_10x_lsimat50.rda")
# ---- SE_gvar, SE_gvar_bg
# create the SummarizedExperiment files for SCAVENGE input
proj <- addPeakMatrix(proj)
getAvailableMatrices(ArchRProj = proj)
# [1] "GeneScoreMatrix" "PeakMatrix" "TileMatrix"
proj_PeakMatrix <- getMatrixFromProject(
ArchRProj = proj,
useMatrix = "PeakMatrix",
)
class(proj_PeakMatrix)
# [1] "RangedSummarizedExperiment"
# attr(,"package")
# [1] "SummarizedExperiment"
peakbycellmat <- assay(proj_PeakMatrix)
SE_gvar <- SummarizedExperiment(assays = list(counts = peakbycellmat),
rowRanges = rowRanges(proj_PeakMatrix),
colData = DataFrame(names = colnames(peakbycellmat)))
assayNames(SE_gvar) <- "counts"
SE_gvar <- addGCBias(SE_gvar, genome = BSgenome.Hsapiens.UCSC.hg19) # use the right genome
SE_gvar_bg <- getBackgroundPeaks(SE_gvar, niterations=200)
save(SE_gvar, SE_gvar_bg, file="pbmc5000_10x_SE_gvar.rda")
For details, please explore the sections on the right-hand side!