Skip to content

[SCAVENGE] Preparing your scATAC seq data

FLYu edited this page Dec 2, 2022 · 5 revisions

How to generate lsimat50,SE_gvar and SE_gvar_bg, which can be seamlessly fed into the SCAVENGE analysis.

If you use the ArchR pipeline to process your scATAC-seq data, you may follow the following code to produce these objects.

proj <- loadArchRProject(path = "./pbmc5k") # load your project

# ---- lsimat50
# embeding coordinates
umapdf=proj@embeddings@listData[["UMAP"]]@listData[["df"]] # if you performed batch effect correction, this might be a bit different
# reduced dimension matrix; cell x LSI (30)
lsimat50=proj@reducedDims@listData[["IterativeLSI"]]@listData[["matSVD"]] # if you performed batch effect correction, this might be a bit different
# make sure that all the cells are in the same order
# colnames(peakbycellmat) %>% head
# rownames(umapdf) %>% head
save(lsimat50, file="pbmc5000_10x_lsimat50.rda")


# ---- SE_gvar, SE_gvar_bg
# create the SummarizedExperiment files for SCAVENGE input
proj <- addPeakMatrix(proj)
getAvailableMatrices(ArchRProj = proj)
# [1] "GeneScoreMatrix" "PeakMatrix"      "TileMatrix"
proj_PeakMatrix <- getMatrixFromProject(
  ArchRProj = proj,
  useMatrix = "PeakMatrix",
)
class(proj_PeakMatrix)
# [1] "RangedSummarizedExperiment"
# attr(,"package")
# [1] "SummarizedExperiment"
peakbycellmat <- assay(proj_PeakMatrix)
SE_gvar <- SummarizedExperiment(assays = list(counts = peakbycellmat),
                           rowRanges = rowRanges(proj_PeakMatrix), 
                           colData = DataFrame(names = colnames(peakbycellmat)))

assayNames(SE_gvar) <- "counts"
SE_gvar <- addGCBias(SE_gvar, genome = BSgenome.Hsapiens.UCSC.hg19) # use the right genome
SE_gvar_bg <- getBackgroundPeaks(SE_gvar, niterations=200)
save(SE_gvar, SE_gvar_bg, file="pbmc5000_10x_SE_gvar.rda")