Note: this repository is here just for the sake of reproducing the analysis of our original Nature Communications publication. We now have a more updated version of the snakemake pipeline purely for the purpose of data processing. Please have a look at here: scATAC_snakemake. It has more flexible settings, and the output is mimicing the 10x Genomics
output, which can be put directly into various downstream scATAC-seq data analysis packages.
Here, I quote from the Dutch computer scientist Edsger W. Dijkstra:
Simplicity is prerequisite for reliability.
All steps are executed using the Snakefile
in the corresponding directories. To start the workflow, download the fastq files from ArrayExpress (E-MTAB-6714), and put them in the fastq directories under mSp_scATAC-seq/rep{1..11}/fastq/
and under other_cells_methods/*/fastq
. Then run the pipeline using snakemake with the Snakefile
provided, but change the path of certain files/programs in the Snakefile
such as picard.jar
according to your own environment.
For the processing of the ImmGen bulk ATAC-seq and the public Fluidigm C1 scATAC-seq experiments, download the raw data (url and study accession number provided in the correponding directories) into the fastq directory and run snakemake in the same way. Or, you can use stream_ena to avoid downloading fastq files to save space.
The notebooks hek293t_nih3t3_mix_analysis.ipynb
and technical_qc_and_methods_comprison.ipynb
contain basic information from the species mixing experiment, comparison of plate vs C1 using K562 and E14 mESC and experiments testd on other tissues and cells.
Follow the notebook for the data analysis:
1.mSp_exploratory_analysis_all.ipynb
contains information about the quality control and other basic information about all experiments.
2.mSp_cell_type_identification.ipynb
is the analysis for the identification of different cell types in the mouse spleen based on the scATAC-seq profiles.
3.mSp_motif_enrichment_analysis.ipynb
is used to generate the heatmap representation of known motifs enrichment by HOMER.
This repository only contains necessary files used to reproduce the analyses and figures. Raw data and any intermediate file are not included and can be generated from the Data processing
stage. The count matrix mSp_scATAC_count_matrix_over_all.mtx
is not in this repository, because it is too large. For now, download the count file from here: ftp://ngs.sanger.ac.uk/production/teichmann/xi/plate_scATAC-seq
, and put it under cmp_to_immgen/
, and the later analysis should run without any problem.
macs2 (v2.1.1.20160309) (this needs python2)
picard (v2.17.10)
# packages installed via conda:
#
# Name Version Build Channel
bedtools 2.27.1 1 bioconda
cutadapt 1.16 py36_1 bioconda
hisat2 2.1.0 py36pl5.22.0_0 bioconda
homer 4.9.1 pl5.22.0_5 bioconda
matplotlib 3.0.0 py36h45c993b_1 conda-forge
numpy 1.15.2 py36_blas_openblashd3ea46f_0 [blas_openblas] conda-forge
pandas 0.23.4 py36hf8a1672_0 conda-forge
salmon 0.9.1 1 bioconda
samtools 1.7 2 bioconda
scikit-learn 0.19.2 py36_blas_openblasha84fab4_201 [blas_openblas] conda-forge
scipy 1.1.0 py36_blas_openblash7943236_201 [blas_openblas] conda-forge
seaborn 0.9.0 py_0 conda-forge
seqtk 1.3 ha92aebf_0 bioconda
snakemake 5.3.0 py36_1 bioconda
You also need calc
, addCols
, bedClip
and bedGraphToBigWig
from UCSC utilities.
Finally, you need bdg2bw to convert the macs2 generated begraph to bigwig for visualisation.
- It is actually easier to do the experiment in 384-well plates if you have a 16-channel multi-channel pipette.
- When pooling amplified single-cell libraries, you can use an 8-channel multi-channel pipette to transfer an entire 384-well plate into 8 wells of a deep-well 96-well plate. Since all single-cell libraries are already indexed here, you don't need to change tips (like shown below). After that, you will end up with slightly more than 900 ul mix in each deep well. Then, use a P1000 Gilson to transfer those ~900 ul x 8 = ~7.2 ml libary mix into a 50 ml falcon tube.
- Altenatively, you can use a plate reservoir, if you feel pipetting is too much for you. We used this one from Clickbio. This does save a lot of effort by simple centrifugation, like this:
- After centrifugation (we did 1,000xg for 1 minute), transfer the ~7.2 ml library mix to a 50 ml falcon tube. Add 36 ml Buffer PB (5 volume). The full volume is too large for a Qiagen MinElute column, you have two choices. You can use the Exntender Tube from the Qiagen Maxiprep kit
Xi Chen
[email protected]