
[SCAVENGE] Preparing your scATAC-seq data

FLYu edited this page Dec 1, 2022 · 5 revisions

How to generate lsimat50 (a cell x LSI matrix), SE_gvar (a peak-by-cell count SummarizedExperiment with GC bias added) and SE_gvar_bg (matched background peaks), which can be fed directly into the SCAVENGE analysis.

If you processed your scATAC-seq data with the ArchR pipeline, you can use the following code to produce these objects.

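library(ArchR)
library(SummarizedExperiment)
library(chromVAR)
library(BSgenome.Hsapiens.UCSC.hg19)

proj <- loadArchRProject(path = "./pbmc5k") # load your project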

getAvailableMatrices(ArchRProj = proj)
# [1] "GeneScoreMatrix" "PeakMatrix"      "TileMatrix"

# embedding coordinates (cells x UMAP dimensions)
umapdf <- proj@embeddings@listData[["UMAP"]]@listData[["df"]] # if you performed batch-effect correction, use the name of the corresponding embedding instead of "UMAP"

# reduced-dimension matrix: cells x LSI components (30 by default in ArchR's addIterativeLSI)
lsimat50 <- proj@reducedDims@listData[["IterativeLSI"]]@listData[["matSVD"]] # if you performed batch-effect correction, the reducedDims entry and slot names may differ

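# extract the peak-by-cell count matrix from the project
proj_PeakMatrix <- getMatrixFromProject(ArchRProj = proj, useMatrix = "PeakMatrix")
peakbycellmat <- assay(proj_PeakMatrix)

# make sure that all the cells are in the same order across objects
head(colnames(peakbycellmat))
head(rownames(lsimat50))
head(rownames(umapdf))
save(lsimat50, file = "pbmc5000_10x_lsimat50.rda")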
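The check above only compares the first few barcodes by eye. A stricter check can be sketched in base R, assuming the cell barcodes are stored as the row names of the LSI matrix and the column names of the peak matrix; the helper align_cells is hypothetical and not part of ArchR or SCAVENGE.

```r
# Hypothetical helper (not part of ArchR or SCAVENGE): reorder the rows of a
# cell x LSI matrix to follow the cell (column) order of a peak x cell matrix,
# failing loudly if any cell has no LSI coordinates.
align_cells <- function(lsimat, peak_cells) {
  absent <- setdiff(peak_cells, rownames(lsimat))
  if (length(absent) > 0) {
    stop(length(absent), " cell(s) in the peak matrix have no LSI coordinates")
  }
  # indexing by row name reorders (and, if needed, subsets) the rows
  lsimat[peak_cells, , drop = FALSE]
}

# Toy example: three cells stored in a different order in the two objects
lsi <- matrix(1:6, nrow = 3,
              dimnames = list(c("cellB", "cellA", "cellC"), c("LSI1", "LSI2")))
peak_cells <- c("cellA", "cellB", "cellC")
lsi_aligned <- align_cells(lsi, peak_cells)
print(lsi_aligned)
```

With the real objects this would be lsimat50 <- align_cells(lsimat50, colnames(peakbycellmat)), run before saving lsimat50. Matching on barcodes rather than on positions means a reordering in one object cannot silently attach LSI coordinates to the wrong cells.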

# make the SummarizedExperiment files for SCAVENGE input
SE_gvar <- SummarizedExperiment(assays = list(counts = peakbycellmat),
                           rowRanges = rowRanges(proj_PeakMatrix), 
                           colData = DataFrame(names = colnames(peakbycellmat)))

assayNames(SE_gvar) <- "counts"
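# add the GC content of each peak; the BSgenome must match the genome build your reads were aligned to
SE_gvar <- addGCBias(SE_gvar, genome = BSgenome.Hsapiens.UCSC.hg19)
# sample background peaks matched for GC content and average accessibility
SE_gvar_bg <- getBackgroundPeaks(SE_gvar, niterations = 200)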
save(SE_gvar, SE_gvar_bg, file="pbmc5000_10x_SE_gvar.rda")
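To make the last two steps less of a black box: according to the chromVAR documentation, addGCBias stores each peak's GC content in rowData(SE_gvar)$bias, and getBackgroundPeaks returns a matrix with one row per peak and one column per iteration (here 200), holding the indices of background peaks with similar GC content and average accessibility. The base-R sketch below imitates that idea with a deliberately simplified nearest-neighbour sampler. It is not chromVAR's algorithm (which bins peaks on a grid in a transformed GC/accessibility space and samples with distance-based weights), and gc_fraction and sample_background are hypothetical names.

```r
# GC fraction of each peak sequence (the quantity addGCBias stores as "bias")
gc_fraction <- function(seqs) {
  vapply(strsplit(toupper(seqs), ""), function(b) mean(b %in% c("G", "C")),
         numeric(1))
}

# Simplified background sampling (hypothetical, NOT chromVAR's algorithm):
# for each peak, draw `niterations` peaks among its k nearest neighbours in
# standardized (GC, log10 total accessibility) space, excluding the peak itself.
sample_background <- function(gc, counts_per_peak, niterations = 10, k = 5) {
  feats <- scale(cbind(gc, log10(counts_per_peak + 1)))
  n <- nrow(feats)
  d <- as.matrix(dist(feats))
  bg <- matrix(NA_integer_, nrow = n, ncol = niterations)
  for (i in seq_len(n)) {
    nn <- order(d[i, ])
    nn <- nn[nn != i][seq_len(min(k, n - 1))]
    # sample.int avoids sample()'s special case when nn has length 1
    bg[i, ] <- nn[sample.int(length(nn), niterations, replace = TRUE)]
  }
  bg
}

# Toy example: six peaks with made-up sequences and total counts
set.seed(1)
seqs <- c("GGCCAT", "ATATAT", "GCGCGC", "ATGCAT", "GGGAAA", "CCCTTT")
gc <- gc_fraction(seqs)
counts <- c(120, 30, 200, 80, 60, 90)
bg <- sample_background(gc, counts, niterations = 4, k = 2)
print(dim(bg))  # 6 peaks x 4 iterations
```

For real data, keep using getBackgroundPeaks; the sketch only illustrates why each peak is compared against peaks with similar GC content and accessibility rather than against random peaks.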