This repository contains code associated with the MulTI-Tag manuscript (Meers et al. 2021 bioRxiv):
Meers MP, Janssens DH, Henikoff S. Multifactorial chromatin regulatory landscapes at single cell resolution. bioRxiv 2021.07.08.451691; DOI: doi.org/10.1101/2021.07.08.451691
This code is sufficient to generate the processed single-cell matrices underlying the UMAP and heatmap plots presented in the manuscript.
-
Align reads to the hg19 genome build. In each FASTQ file, the barcode identity is appended to the QNAME field of the reads, so that alignments can be split into cell- and target-specific datasets downstream.
-
Merge aligned SAM files. For downstream processing, SAM files representing H1 and K562 cell data from the same target should be merged using samtools merge.
-
Generate CellRanger bed files. We format the data into CellRanger-style bed files with the following columns:
- chr
- start
- end
- cell barcode
- number of duplicates
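
As an illustration of the five-column layout above, here is a minimal Python sketch that collapses paired-end SAM records into fragment rows. It is not the repository's own script. It assumes the barcode is the last `_`-separated token of QNAME (the actual delimiter is not stated here), that a fragment is defined by the leftmost mate's POS and a positive TLEN, and that duplicates are records sharing the same coordinates and barcode. The function name `sam_to_fragments` is hypothetical.

```python
from collections import Counter

def sam_to_fragments(sam_lines, sep="_"):
    """Collapse SAM records into (chr, start, end, barcode, n_duplicates) rows.

    Assumption: the barcode is the last `sep`-delimited token of QNAME.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, tlen = int(fields[3]), int(fields[8])
        # Keep one record per pair: the mapped, leftmost mate (positive TLEN).
        if flag & 0x4 or chrom == "*" or tlen <= 0:
            continue
        barcode = qname.rsplit(sep, 1)[-1]
        start = pos - 1            # SAM POS is 1-based; BED start is 0-based
        end = start + tlen         # half-open BED end
        counts[(chrom, start, end, barcode)] += 1
    # One row per unique fragment, with its duplicate count in the last column.
    return sorted((c, s, e, b, n) for (c, s, e, b), n in counts.items())
```

Each returned tuple can be written out tab-separated to give one bed line per unique fragment.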
-
Calculate unique fragments per cell. For each SAM file (including merged SAM files), we generate a two-column file listing the number of unique fragments in each cell and the barcode assigned to that cell, sorted by descending number of unique fragments.
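
The counting step can be sketched in Python as follows. The sketch assumes the input is a list of five-field fragment rows (chr, start, end, cell barcode, number of duplicates) in which each row is one unique fragment. The function name `unique_fragments_per_cell` is hypothetical, and breaking ties alphabetically by barcode is an added choice to keep the output deterministic.

```python
from collections import Counter

def unique_fragments_per_cell(fragments):
    """Return (n_unique_fragments, barcode) rows, sorted by descending count."""
    # Each fragment row is already unique, so counting rows per barcode
    # gives the number of unique fragments per cell.
    per_cell = Counter(barcode for _chrom, _start, _end, barcode, _ndup in fragments)
    rows = [(n, barcode) for barcode, n in per_cell.items()]
    # Descending by count; ties broken by barcode (an assumption of this sketch).
    return sorted(rows, key=lambda row: (-row[0], row[1]))
```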
-
Filter cells based on unique fragments. For this analysis, we consider only cells that meet all of the following criteria:
- > 500 unique H3K27me3 fragments
- > 200 unique H3K4me2 fragments
- > 200 unique H3K36me3 fragments
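
A minimal Python sketch of this filter is shown below. It assumes per-target counts are available as dictionaries mapping a cell barcode to its unique fragment count, and that the same barcode string identifies the same cell across targets. The names `THRESHOLDS` and `passing_cells` are hypothetical.

```python
# Strict lower bounds on unique fragments, taken from the criteria above.
THRESHOLDS = {"H3K27me3": 500, "H3K4me2": 200, "H3K36me3": 200}

def passing_cells(counts_by_target, thresholds=THRESHOLDS):
    """Return the set of barcodes exceeding the threshold for every target."""
    keep = None
    for target, minimum in thresholds.items():
        counts = counts_by_target.get(target, {})
        cells = {barcode for barcode, n in counts.items() if n > minimum}
        # Intersect across targets: a cell must pass all criteria.
        keep = cells if keep is None else keep & cells
    return keep or set()
```

A cell with exactly 500 H3K27me3 fragments is excluded, because the criteria are strict inequalities.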
-
Call peaks from aggregated data. We use SEACR v1.4 with the following conditions: -n norm -m stringent -e 5
-
Map single cell fragments onto peaks. We use bedtools intersect -wao to quantify fragment overlap counts for each peak in each cell.
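
The output of bedtools intersect -wao can be tallied into per-peak, per-cell counts along the following lines. This Python sketch assumes the five-column fragment file was passed as -a and a peak file as -b, so that each output line holds the fragment fields, then the peak fields, then the overlap in base pairs. Which file was used for -a and -b is not stated here, so treat that ordering as an assumption. The function name `count_overlaps` is hypothetical.

```python
from collections import defaultdict

def count_overlaps(wao_lines, n_a_cols=5):
    """Count fragments overlapping each peak in each cell.

    Assumes -a is the fragment bed (n_a_cols columns, barcode in column 4)
    and -b is the peak file, whose first three columns are chr, start, end.
    """
    counts = defaultdict(int)
    for line in wao_lines:
        fields = line.rstrip("\n").split("\t")
        # With -wao, the last column is the overlap in bp; 0 means no overlap.
        if int(fields[-1]) == 0:
            continue
        barcode = fields[3]
        chrom, start, end = fields[n_a_cols:n_a_cols + 3]
        peak = f"{chrom}:{start}-{end}"
        counts[(peak, barcode)] += 1  # each unique fragment counts once
    return dict(counts)
```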
-
Perform dimensionality reduction and plot data in UMAP form. Cell-by-peak matrices are filtered by the number of cells reporting a fragment overlap, transformed by term frequency-inverse document frequency (TF-IDF) followed by a log transform, subjected to Singular Value Decomposition (SVD) and selection of the most variable peaks, and plotted in two-dimensional UMAP space. We also generate heatmaps of the transformed data for the most variable peaks.
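
The filtering and TF-IDF/log steps can be sketched in plain Python as shown below; SVD, variable peak selection, and UMAP are not covered. The exact TF-IDF variant is not specified here, so the formula used (count divided by the cell total, multiplied by the number of cells divided by the number of cells containing the peak, then log1p after scaling by 1e4) is an assumption, as are the names `filter_peaks` and `tfidf_log` and the `min_cells` and `scale` parameters.

```python
import math

def filter_peaks(matrix, min_cells):
    """Keep peak columns with a fragment overlap in at least min_cells cells.

    matrix is a list of cell rows, each a list of per-peak counts.
    Returns the filtered matrix and the indices of the retained peaks.
    """
    n_peaks = len(matrix[0])
    keep = [j for j in range(n_peaks)
            if sum(1 for row in matrix if row[j] > 0) >= min_cells]
    return [[row[j] for j in keep] for row in matrix], keep

def tfidf_log(matrix, scale=1e4):
    """Apply an assumed TF-IDF variant followed by log1p."""
    n_cells = len(matrix)
    n_peaks = len(matrix[0])
    # Document frequency: number of cells with a nonzero count for each peak.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_peaks)]
    out = []
    for row in matrix:
        total = sum(row)  # total counts in this cell
        out.append([
            math.log1p(scale * (row[j] / total) * (n_cells / df[j]))
            if total and df[j] else 0.0
            for j in range(n_peaks)
        ])
    return out
```

The transformed matrix would then be passed to an SVD routine and a UMAP implementation of your choice.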