
ChIP Pipeline ReadMe

MikeWLloyd edited this page Apr 11, 2024 · 15 revisions

Chromatin Immunoprecipitation Sequencing (ChIP-Seq) Documentation

ChIP-seq Pipeline (--workflow chipseq)

•	Step 1: CSV design file check  
•	Step 2: Genome filter build, and fasta index file build  
•	Step 3: FastQC  
•	Step 4: Read trimming via Trim Galore  
•	Step 5: Compute read groups  
•	Step 6: BWA mem alignment  
•	Step 7: BAM filtering to remove unmapped reads, sort bam, and compute bam stats  
•	Step 8: Merge filtered BAMs across technical replicates  
•	Step 9: Duplicate reads marked  
•	Step 10: Additional BAM filtering to remove duplicate reads  
•	Step 11: If paired-end data: orphaned reads are filtered  
•	Step 12: If multiple samples are provided, Preseq is run to estimate library complexity   
•	Step 13: Genome coverage, Picard CollectMultipleMetrics, and BigWig files are generated  
•	Step 14: Deeptools is used to compute a depth matrix for plotting, and plots are subsequently made  
•	Step 15: Phantompeakqualtools is run to compute ChIP-seq enrichment and quality measures  
•	Step 16: Deeptools PlotFingerprint  
•	Step 17: Macs2 peak calling  
•	Step 18: Fraction of reads in peak computed  
•	Step 19: Homer peak annotation  
•	Step 20: Peak QC and plots

If multiple samples per antibody:

•	Step 21: Consensus calling by antibody  
•	Step 22: Consensus peak Homer annotation   
•	Step 23: Subread FeatureCounts   
•	Step 24: DESeq2  

All runs:

•	Step 25: MultiQC report generation    

ChIP-Seq Flowchart

flowchart TB

    subgraph CHIPSEQ
    v0((Sample))
    v1([CHECK_DESIGN])
    v4((  ))
    v7([SAMTOOLS_FAIDX])
    v9([MAKE_GENOME_FILTER])
    v10([FASTQC])
    v11([TRIM_GALORE])
    v15([BWA_MEM])
    v17([SAMTOOLS_FILTER])
    v20([SAMTOOLS_SORT])
    v21([SAMTOOLS_STATS])
    v25([PICARD_MERGESAMFILES])
    v26([PICARD_MARKDUPLICATES])
    v28([SAMTOOLS_STATS_MD])
    v29([SAMTOOLS_MERGEBAM_FILTER])
    v31([BAMTOOLS_FILTER])
    v32([SAMTOOLS_STATS_BF])
    v36([SAMTOOLS_STATS_FILTERED])
    v37([PRESEQ])
    v39([SAMTOOLS_INDEX])
    v40([PICARD_COLLECTMULTIPLEMETRICS])
    v43([BEDTOOLS_GENOMECOV])
    v45([UCSC_BEDGRAPHTOBIGWIG])
    v47([DEEPTOOLS_COMPUTEMATRIX])
    v49([DEEPTOOLS_PLOTPROFILE])
    v51([DEEPTOOLS_PLOTHEATMAP])
    v54([PHANTOMPEAKQUALTOOLS])
    v60([MULTIQC_CUSTOM_PHANTOMPEAKQUALTOOLS])
    v68([DEEPTOOLS_PLOTFINGERPRINT])
    v73([PEAK_CALLING_CHIPSEQ])
    v83([FRIP_SCORE])
    v86([HOMER_ANNOTATEPEAKS])
    v88([PLOT_MACS2_QC])
    v94([PLOT_HOMER_ANNOTATEPEAKS])
    v100([MACS2_CONSENSUS])
    v107([CONSENSUS_PEAKS_ANNOTATE])
    v109([ANNOTATE_BOOLEAN_PEAKS])
    v117([SUBREAD_FEATURECOUNTS])
    v120([DESEQ2_QC])
    v205([MULTIQC])
    v100(( ))
    v108(( ))
    end
    subgraph " "
    v27[" "]
    v33[" "]
    v34[" "]
    v35[" "]
    v38[" "]
    v41[" "]
    v44[" "]
    v48[" "]
    v50[" "]
    v52[" "]
    v53[" "]
    v55[" "]
    v69[" "]
    v70[" "]
    v74[" "]
    v75[" "]
    v76[" "]
    v77[" "]
    v89[" "]
    v90[" "]
    v95[" "]
    v96[" "]
    v101[" "]
    v102[" "]
    v103[" "]
    v104[" "]
    v110[" "]
    v121[" "]
    v122[" "]
    v123[" "]
    v124[" "]
    v125[" "]
    v126[" "]
    v127[" "]
    v206[" "]
    v207[" "]
    v208[" "]
    end
    v0 --> v1
    v1 --> v11
    v1 --> v10
    v1 --> v4

    v7 --> v9

    v9 --> v29
    v9 --> v45

    v10 --> v205
    %% v2 --> v11
    v11 --> v15
    v11 --> v205

    %% v14 --> v15
    v15 --> v17

    v17 --> v20

    v20 --> v21
    v20 --> v25
    v21 --> v205

    v25 --> v26
    v26 --> v28
    v26 --> v27
    v26 --> v29
    v26 --> v37
    v26 --> v205
    v28 --> v205
    v29 --> v31

    v31 --> v32
    v31 --> v36
    v31 --> v39
    v31 --> v40
    v31 --> v54
    v31 --> v4
    v31 --> v43
    v32 --> v35
    v32 --> v34
    v32 --> v33
    v36 --> v4
    v36 --> v43
    v36 --> v205
    v37 --> v38
    v37 --> v205
    v39 --> v4
    v40 --> v41
    v40 --> v205
    v43 --> v45
    v43 --> v44
    v45 --> v47

    v47 --> v49
    v47 --> v48
    v47 --> v51
    v49 --> v50
    v49 --> v205
    v51 --> v53
    v51 --> v52
    v54 --> v55
    v54 --> v60
    v54 --> v205


    v60 --> v205
    v4 --> v68
    v68 --> v70
    v68 --> v69
    v68 --> v205


    v4 --> v73
    v73 --> v77
    v73 --> v86
    v73 --> v76
    v73 --> v75
    v73 --> v74
    v73 --> v4
    v73 --> v88
    v73 --> v100

    v4 --> v83
    v83 --> v205


    v86 --> v94
    v88 --> v90
    v88 --> v89

    v94 --> v96
    v94 --> v95
    v94 --> v205
    v100 --> v104
    v100 --> v107
    v100 --> v103
    v100 --> v102
    v100 --> v101
    v100 --> v4
    v100 --> v108


    v107 --> v108
    v108 --> v109
    v109 --> v110
    v4 --> v117
    v117 --> v120
    v117 --> v205


    v120 --> v127
    v120 --> v126
    v120 --> v125
    v120 --> v124
    v120 --> v123
    v120 --> v122
    v120 --> v121
    v120 --> v205
    v205 --> v208
    v205 --> v207
    v205 --> v206

Parameters for ChIP-seq Pipeline

  • --pubdir

    • Default: /<PATH>
    • Comment: The directory where saved outputs will be stored.
  • --organize_by

    • Default: sample
    • Comment: How to organize the output folder structure. Options: sample or analysis.
  • --cacheDir

    • Default: /projects/omics_share/meta/containers
    • Comment: The directory that contains cached Singularity containers. JAX users should not change this parameter.
  • -w

    • Default: /<PATH>
    • Comment: The working directory used for all intermediary files and Nextflow processes. This directory can become quite large, so it should be a location on /fastscratch or another directory with ample storage.
  • --sample_folder

    • Default: /<PATH>
    • Comment: The path to the folder that contains all the samples to be run by the pipeline. The files in this path can also be symbolic links.
  • --extension

    • Default: .fastq.gz
    • Comment: The expected extension for the input read files.
  • --pattern

    • Default: "*_R{1,2}*"
    • Comment: The expected R1 / R2 matching pattern. The default value will match reads with names like this READ_NAME_R1_MoreText.fastq.gz or READ_NAME_R1.fastq.gz
  • --read_type

    • Default: PE
    • Comment: Options: PE and SE. Default: PE. Type of reads: paired end (PE) or single end (SE).
  • --concat_lanes

    • Default: false
    • Comment: Options: false and true. Default: false. If this boolean is specified, FASTQ files will be concatenated by sample. Used in cases where samples are divided across individual sequencing lanes.
  • --csv_input

    • Default: null
    • Comment: Provide a CSV manifest file with the header: "sampleID,lane,fastq_1,fastq_2". See below for an example file. Fastq_2 is optional and used only in PE data. Fastq files can either be absolute paths to local files, or URLs to remote files. If remote URLs are provided, --download_data can be specified.
  • --download_data

    • Default: null
    • Comment: Requires --csv_input. When specified, read data in the CSV manifest will be downloaded from provided URLs with Aria2.
  • --gen_org

    • Default: mouse
    • Comment: Options: mouse and human.
  • --genome_build

    • Default: GRCm38
    • Comment: Mouse specific. Options: GRCm38 or GRCm39. If gen_org == human, build defaults to GRCh38.
  • --input

    • Default: null
    • Comment: Required. Input CSV file, see notes below for format.
  • --ref_fa

    • Default: '/projects/omics_share/mouse/GRCm38/genome/sequence/ensembl/v102/Mus_musculus.GRCm38.dna.toplevel.fa'
    • Comment: The reference fasta to be used throughout the process for alignment as well as any downstream analysis. Points to the human reference when --gen_org human is set.
  • --ref_fa_indices

    • Default: '/projects/omics_share/mouse/GRCm38/genome/indices/ensembl/v102/bwa/Mus_musculus.GRCm38.dna.toplevel.fa'
    • Comment: Pre-compiled BWA index files. Points to the human reference when --gen_org human is set. JAX users should not change this parameter.
  • --gtf

    • Default: '/projects/omics_share/mouse/GRCm38/transcriptome/annotation/ensembl/v102/Mus_musculus.GRCm38.102.gtf'
    • Comment: The full path to GTF file for annotating peaks. Ensembl GTF format required.
  • --gene_bed

    • Default: '/projects/omics_share/mouse/GRCm38/transcriptome/annotation/ensembl/v102/Mus_musculus.GRCm38.102.bed'
    • Comment: The full path to BED file for genome-wide gene intervals.
  • --fragment_size

    • Default: 200
    • Comment: Number of base pairs to extend single-end reads when creating bigWig files.
  • --fingerprint_bins

    • Default: 500000
    • Comment: Number of genomic bins to use when generating the deepTools fingerprint plot. Larger numbers will give a smoother profile, but take longer to run.
  • --macs_gsize

  • --blacklist

    • Default: ''
    • Comment: If provided, alignments that overlap with the regions in this file will be filtered out (see ENCODE blacklists). The file should be in BED format.
  • --trimLength

    • Default: 30
    • Comment: Discard reads that became shorter than length 'INT' because of either quality or adapter trimming. A value of 0 effectively disables this behavior.
  • --qualThreshold

    • Default: 30
    • Comment: Trim low-quality ends from reads in addition to adapter removal. Files are quality and adapter trimmed in a single pass.
  • --adapOverlap

    • Default: 1
    • Comment: Stringency for overlap with adapter sequence required to trim a sequence. Defaults to a very stringent setting of 1, i.e. a single base pair of overlapping sequence will be trimmed off the 3' end of any read.
  • --adaptorSeq

    • Default: 'AGATCGGAAGAGC'
    • Comment: Adapter sequence to be trimmed. This sequence is the standard Illumina adapter sequence.
  • --mismatch_penalty

    • Default: ''
    • Comment: The BWA penalty for a mismatch. Example required format if used: -B 4
  • --bwa_min_score

    • Default: false
    • Comment: Don’t output BWA MEM alignments with score lower than this parameter (Default: false)
  • --keep_dups

    • Default: false
    • Comment: Duplicate reads are not filtered from alignments (Default: false)
  • --keep_multi_map

    • Default: false
    • Comment: Reads mapping to multiple locations in the genome are not filtered from alignments (Default: false)
  • --bamtools_filter_pe_config

    • Default: $projectDir/bin/shared/bamtools/bamtools_filter_pe.json
    • Comment: The path to bamtools_filter_pe.json for paired end (PE). The configuration file used by bamtools filter
  • --bamtools_filter_se_config

    • Default: $projectDir/bin/shared/bamtools/bamtools_filter_se.json
    • Comment: The path to bamtools_filter_se.json for single end (SE). The configuration file used by bamtools filter
  • --narrow_peak

    • Default: false
    • Comment: MACS2 is run by default with the --broad flag. Specify this flag to call peaks in narrowPeak mode (Default: false)
  • --broad_cutoff

    • Default: 0.1
    • Comment: Specifies broad cut-off value for MACS2. Only used when --narrow_peak isn't specified (Default: 0.1)
  • --macs_fdr

    • Default: false
    • Comment: Minimum FDR (q-value) cutoff for peak detection. --macs_fdr and --macs_pvalue are mutually exclusive (Default: false)
  • --macs_pvalue

    • Default: false
    • Comment: p-value cutoff for peak detection (Default: false).
  • --skip_preseq

    • Default: false
    • Comment: Skip Preseq
  • --skip_peak_qc

    • Default: false
    • Comment: Skip MACS2 peak QC plot generation (Default: false)
  • --skip_peak_annotation

    • Default: false
    • Comment: Skip Homer peak annotation (Default: false)
  • --skip_consensus_peaks

    • Default: false
    • Comment: Skip consensus peak generation, annotation and counting (Default: false)
  • --skip_diff_analysis

    • Default: false
    • Comment: Skip differential binding analysis with DESeq2 (Default: false)
  • --deseq2_vst

    • Default: false
    • Comment: Use vst transformation instead of rlog with DESeq2. (Default: false)
  • --min_reps_consensus

    • Default: 1
    • Comment: Number of biological replicates required from a given condition for a peak to contribute to a consensus peak (Default: 1)
  • --save_macs_pileup

    • Default: false
    • Comment: Instruct MACS2 to create bedGraph files using the -B --SPMR parameters (Default: false).
  • --multiqc_config

    • Default: ${projectDir}/bin/shared/multiqc/chipseq.yaml
    • Comment: The path to the configuration file used by MultiQC
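
The peak-calling parameters above interact: --broad_cutoff only applies without --narrow_peak, and --macs_fdr and --macs_pvalue are mutually exclusive. The following is a minimal Python sketch of how those settings could translate into MACS2 callpeak options. The function name `macs2_options` is hypothetical, and the mapping is an assumption based on the parameter descriptions, not a copy of the pipeline code.

```python
def macs2_options(narrow_peak=False, broad_cutoff=0.1, macs_fdr=False,
                  macs_pvalue=False, save_macs_pileup=False):
    """Translate pipeline-style settings into MACS2 callpeak options.

    Hypothetical helper: the mapping is inferred from the parameter
    descriptions and may differ from what the workflow actually runs.
    """
    if macs_fdr and macs_pvalue:
        raise ValueError("--macs_fdr and --macs_pvalue are mutually exclusive")
    options = []
    if not narrow_peak:
        # Broad mode is the default; the cutoff only matters here.
        options += ["--broad", "--broad-cutoff", str(broad_cutoff)]
    if macs_fdr:
        options += ["--qvalue", str(macs_fdr)]
    if macs_pvalue:
        options += ["--pvalue", str(macs_pvalue)]
    if save_macs_pileup:
        # Request bedGraph pileup output, as described for --save_macs_pileup.
        options += ["-B", "--SPMR"]
    return options


default_options = macs2_options()
narrow_options = macs2_options(narrow_peak=True, macs_fdr=0.05,
                               save_macs_pileup=True)
```

With the defaults the sketch yields the broad-peak options only; asking for both an FDR and a p-value cutoff raises an error rather than silently picking one.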

** Note: some of the above descriptions were taken from NF-Core ChIP-SEQ v1.2.2 Usage documentation
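
To illustrate which file names the default --pattern and --extension values accept, here is a minimal Python sketch. It assumes the pattern and extension are simply concatenated into one glob, and it approximates Nextflow globbing with brace expansion plus the standard library `fnmatch` module. The helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand {a,b,...} groups in a glob into a list of plain globs."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def matches_read_pattern(filename, pattern="*_R{1,2}*", extension=".fastq.gz"):
    """Return True if filename matches pattern + extension (an approximation)."""
    return any(fnmatchcase(filename, glob + extension)
               for glob in expand_braces(pattern))


names = [
    "READ_NAME_R1_MoreText.fastq.gz",
    "READ_NAME_R1.fastq.gz",
    "READ_NAME_R2.fastq.gz",
    "READ_NAME_1.fastq.gz",   # no _R1 / _R2 token
    "READ_NAME_R1.fq.gz",     # extension differs from the default
]
matched = [name for name in names if matches_read_pattern(name)]
```

Only the first three names match under these assumptions; the last two would need a different --pattern or --extension value.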

Read Type Note:

If read type is specified as paired-end (PE) when single end (SE) data are passed to the workflow, an error will result:

Argument of `file` function cannot be empty

 -- Check script '/projects/omics_share/meta/benchmarking/ngs-ops-nf-pipelines/./workflows/chipseq.nf' at line: 81 or see '.nextflow.log' file for more details

If the run is restarted with --read_type SE the error should resolve.
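
One way to avoid this error is to check the design file before launching. The sketch below is a hypothetical pre-flight helper, not part of the pipeline: it reads a design file in the --input format and suggests a --read_type value based on whether the fastq_2 column is filled in. Treating a mix of filled and empty fastq_2 values as an error is an assumption of this sketch.

```python
import csv
import io


def infer_read_type(design_text):
    """Return 'PE' if every row has fastq_2 and 'SE' if no row does."""
    rows = list(csv.DictReader(io.StringIO(design_text)))
    if not rows:
        raise ValueError("design file has no sample rows")
    has_mate = [bool((row.get("fastq_2") or "").strip()) for row in rows]
    if all(has_mate):
        return "PE"
    if not any(has_mate):
        return "SE"
    raise ValueError("design mixes rows with and without fastq_2")


design = """group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
"""
read_type = infer_read_type(design)
```

For this single-end design the helper returns SE, which is the value to pass as --read_type.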

--input (this section taken from NF-core v.1.2.2)

You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 6 columns, and a header row as shown in the examples below.

--input '[path to design file]'

Multiple replicates

The group identifier should be identical when you have multiple replicates from the same experimental group, just increment the replicate identifier appropriately. The first replicate value for any given experimental group must be 1.

The antibody column is required to separate the downstream consensus peak merging and differential analysis for different antibodies. It's not advisable to generate a consensus peak set across different antibodies, especially if their binding patterns are inherently different, e.g. narrow transcription factors and broad histone marks.

The control column should be the group identifier for the controls for any given IP. The pipeline will automatically pair the inputs based on replicate identifier (i.e. where you have an equal number of replicates for your IPs and controls); otherwise, the first control sample in that group will be selected.

In the single-end design below there are triplicate samples for the WT_BCATENIN_IP group along with triplicate samples for their corresponding WT_INPUT samples.

group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L002_R1_001.fastq.gz,,,
WT_INPUT,3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
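
The design rules in this section can be checked before a run. The following Python sketch is a hypothetical validator, not the pipeline's own CHECK_DESIGN step: it checks the 6-column header, that replicates in each group are numbered from 1 without gaps, that antibody is set whenever control is set, and that each control names a group present in the file.

```python
import csv
import io

EXPECTED_HEADER = ["group", "replicate", "fastq_1", "fastq_2", "antibody", "control"]


def check_design(design_text):
    """Return a list of problems found in a design file (empty if none)."""
    reader = csv.reader(io.StringIO(design_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return ["header must be " + ",".join(EXPECTED_HEADER)]
    problems, replicates, controls = [], {}, set()
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 6 columns, found {len(row)}")
            continue
        group, replicate, _fastq_1, _fastq_2, antibody, control = row
        if not replicate.isdigit() or int(replicate) < 1:
            problems.append(f"line {line_no}: replicate must be a positive integer")
            continue
        replicates.setdefault(group, set()).add(int(replicate))
        if control:
            controls.add(control)
            if not antibody:
                problems.append(f"line {line_no}: antibody is required when control is set")
    for group, seen in replicates.items():
        # Replicates must run 1..N with no gaps.
        if seen != set(range(1, len(seen) + 1)):
            problems.append(f"group {group}: replicates must be numbered 1..{len(seen)}")
    for control in sorted(controls - set(replicates)):
        problems.append(f"control {control} is not a group in the design")
    return problems


good = """group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
"""
bad = good.replace("WT_BCATENIN_IP,1,", "WT_BCATENIN_IP,2,")
good_problems = check_design(good)
bad_problems = check_design(bad)
```

The valid design produces no problems, while renumbering the only IP replicate to 2 is reported because the first replicate of a group must be 1.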

Multiple runs of the same library

Both the group and replicate identifiers should be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will perform the alignments in parallel, and subsequently merge them before further analysis. Below is an example where the second replicate of the WT_BCATENIN_IP and WT_INPUT groups has been re-sequenced multiple times:

group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L003_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L001_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L002_R1_001.fastq.gz,,,
WT_INPUT,3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
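
To see which rows of a design will be merged, the rows can be grouped by their (group, replicate) pair. This is a minimal Python sketch using the example above; the function name `runs_per_library` is hypothetical, and the sketch only reports the grouping rather than reproducing the pipeline's merge step.

```python
import csv
import io
from collections import defaultdict


def runs_per_library(design_text):
    """Map each (group, replicate) pair to the fastq_1 files that share it."""
    runs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(design_text)):
        runs[(row["group"], int(row["replicate"]))].append(row["fastq_1"])
    return dict(runs)


design = """group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L003_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L001_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L002_R1_001.fastq.gz,,,
WT_INPUT,3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
"""
# Keep only libraries that were sequenced more than once.
merged = {key: files for key, files in runs_per_library(design).items()
          if len(files) > 1}
```

Here `merged` contains the second replicate of WT_BCATENIN_IP (three runs) and the second replicate of WT_INPUT (two runs).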

Full design

A final design file may look something like the one below. This is for two antibodies and associated controls in triplicate, where the second replicate of the WT_BCATENIN_IP and NAIVE_BCATENIN_IP groups has been sequenced twice:

group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
NAIVE_BCATENIN_IP,1,BLA203A7_S60_L001_R1_001.fastq.gz,,BCATENIN,NAIVE_INPUT
NAIVE_BCATENIN_IP,2,BLA203A43_S34_L001_R1_001.fastq.gz,,BCATENIN,NAIVE_INPUT
NAIVE_BCATENIN_IP,2,BLA203A43_S34_L002_R1_001.fastq.gz,,BCATENIN,NAIVE_INPUT
NAIVE_BCATENIN_IP,3,BLA203A64_S55_L001_R1_001.fastq.gz,,BCATENIN,NAIVE_INPUT
WT_TCF4_IP,1,BLA203A3_S29_L006_R1_001.fastq.gz,,TCF4,WT_INPUT
WT_TCF4_IP,2,BLA203A27_S18_L001_R1_001.fastq.gz,,TCF4,WT_INPUT
WT_TCF4_IP,3,BLA203A51_S42_L001_R1_001.fastq.gz,,TCF4,WT_INPUT
NAIVE_TCF4_IP,1,BLA203A9_S62_L001_R1_001.fastq.gz,,TCF4,NAIVE_INPUT
NAIVE_TCF4_IP,2,BLA203A45_S36_L001_R1_001.fastq.gz,,TCF4,NAIVE_INPUT
NAIVE_TCF4_IP,3,BLA203A66_S57_L001_R1_001.fastq.gz,,TCF4,NAIVE_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L001_R1_001.fastq.gz,,,
WT_INPUT,3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
NAIVE_INPUT,1,BLA203A12_S3_L001_R1_001.fastq.gz,,,
NAIVE_INPUT,2,BLA203A48_S39_L001_R1_001.fastq.gz,,,
NAIVE_INPUT,3,BLA203A49_S1_L006_R1_001.fastq.gz,,,

| Column | Description |
| --- | --- |
| group | Group/condition identifier for sample. This will be identical for re-sequenced libraries and replicate samples from the same experimental group. |
| replicate | Integer representing replicate number. This will be identical for re-sequenced libraries. Values must run from 1 to the number of replicates. |
| fastq_1 | Full path to FastQ file for read 1. File has to be zipped and have the extension ".fastq.gz" or ".fq.gz". |
| fastq_2 | Full path to FastQ file for read 2. File has to be zipped and have the extension ".fastq.gz" or ".fq.gz". |
| antibody | Antibody name. This is required to segregate downstream analysis for different antibodies. Required when control is specified. |
| control | Group identifier for control sample. The pipeline will automatically select the control sample with the same replicate identifier as the IP. |
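
The control pairing rule can be sketched in Python as follows. This is a hypothetical helper written from the description in this section, not the pipeline's implementation: when the IP and control groups have the same replicate identifiers, each IP replicate is paired with the matching control replicate; otherwise every IP replicate uses the first control replicate. The comparison name format follows the `<sample>_vs_<control>` pattern used in the output directories.

```python
def pair_controls(ip_replicates, control_replicates):
    """Map each IP replicate number to the control replicate it is paired with."""
    if not control_replicates:
        raise ValueError("no control replicates available")
    if sorted(ip_replicates) == sorted(control_replicates):
        # Same replicate identifiers on both sides: pair like with like.
        return {rep: rep for rep in ip_replicates}
    first = control_replicates[0]
    return {rep: first for rep in ip_replicates}


def comparison_names(group, control, pairs):
    """Build comparison names of the form <group>_R<n>_vs_<control>_R<m>."""
    return [f"{group}_R{rep}_vs_{control}_R{ctrl}"
            for rep, ctrl in sorted(pairs.items())]


matched_pairs = pair_controls([1, 2, 3], [1, 2, 3])
fallback_pairs = pair_controls([1, 2], [1])
names = comparison_names("H3K4me1_T0", "H3K4me1_INPUT", fallback_pairs)
```

With two IP replicates and a single control replicate, both comparisons fall back to the first control, giving H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1 and H3K4me1_T0_R2_vs_H3K4me1_INPUT_R1.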

Pipeline Default Outputs

NOTE: * Represents a wild card that is a placeholder for values that will be filled by input file names and/or parameters when the pipeline is run.

NOTE: All files contained in 'stats' directories are captured by MultiQC reports.

The pipeline writes several output directories: some hold files for individual samples, and others hold consensus calling results by antibody.

The following summary of files assumes the following --input CSV file:

group,replicate,fastq_1,fastq_2,antibody,control
H3K4me1_T0,1,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF501VGT.fastq.gz,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF980OWC.fastq.gz,H3K4me1,H3K4me1_INPUT
H3K4me1_T0,2,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF318LVI.fastq.gz,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF188UNI.fastq.gz,H3K4me1,H3K4me1_INPUT
H3K4me1_INPUT,1,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF388YCF.fastq.gz,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF877ZJH.fastq.gz,,

Summaries of the adjusted fasta reference used (if blacklist is used), and study design are stored in:

genome_info

| Naming Convention | Description |
| --- | --- |
| `*fasta.include_regions.bed` | Genome regions used in peak calling in bed format |
| `*fasta.sizes` | Chromosome sizes used in peak calling |
| `*fasta.fai` | Fasta index file used in peak calling |

parsed_samplesheets

| Naming Convention | Description |
| --- | --- |
| `design_reads.csv` | Parsed study design with sample reads |
| `design_controls.csv` | Parsed study design, used in pairing samples for calling |

Cross sample fastq quality results are found in:

fastqc

| Naming Convention | Description |
| --- | --- |
| `SAMPLEID/stats/*fastqc.html` | HTML FastQC report from raw fastq |
| `SAMPLEID/stats/*fastqc.zip` | FastQC files in zip format from raw fastq |
| `SAMPLEID/stats/*_val*_fastqc.html` | HTML FastQC report from trimmed fastq |
| `SAMPLEID/stats/*_val*_fastqc.zip` | FastQC files in zip format from trimmed fastq |
| `SAMPLEID/trimmed_fastq/*trimming_report.txt` | Trim Galore trimming report |

Individual sample results:

immuno_precip_samples

This directory is further divided into individual sample files, and results from Macs2 peak calling, and derived files from those peaks. For example:

Each sample (e.g., H3K4me1_T0_R1 [replicate 1], and H3K4me1_T0_R2 [replicate 2]) will have a set of files as follows:

| Naming Convention | Description |
| --- | --- |
| `H3K4me1_T0_R1/bam/H3K4me1_T0_R1_dedup.bam` | Final filtered BAM file |
| `H3K4me1_T0_R1/bigwig/H3K4me1_T0_R1.bigWig` | BigWig coverage file |
| `H3K4me1_T0_R1/bigwig/H3K4me1_T0_R1.scale_factor.txt` | BigWig scaling factor file |
| `H3K4me1_T0_R1/deeptools/H3K4me1_T0_R1.plotHeatmap.pdf` | Deeptools gene feature heatmap plot |
| `H3K4me1_T0_R1/deeptools/H3K4me1_T0_R1.plotProfile.pdf` | Deeptools profile plot |
| `H3K4me1_T0_R1/stats/*` | Collected QC metrics and statistics. These are summarized across samples in the MultiQC report. |

Each sample (e.g., H3K4me1_T0_R1 [replicate 1], and H3K4me1_T0_R2 [replicate 2]) will have a set of files derived from Macs2 peak calling against the associated INPUT for that sample (e.g., H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1, H3K4me1_T0_R2_vs_H3K4me1_INPUT_R1):

| Naming Convention | Description |
| --- | --- |
| `H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.{broad,narrow}Peak` | Macs2 broadPeak or narrowPeak depending on settings used |
| `H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.xls` | Macs2 peaks in xls format |
| `H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.annotatePeaks.txt` | Homer annotated peak file |
| `H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.count_mqc.tsv` | Peak count file for MultiQC |
| `H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.FRiP_mqc.tsv` | Fraction of reads in peaks file for MultiQC |
| `H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/deeptools/H3K4me1_T0_R1.plotFingerprint.pdf` | Deeptools plotFingerprint plot |
| `H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/deeptools/H3K4me1_T0_R1.plotFingerprint.raw.txt` | Deeptools plotFingerprint data |
| `H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/deeptools/H3K4me1_T0_R1.plotFingerprint.qcmetrics.txt` | Deeptools plotFingerprint QC metrics |
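
The naming conventions listed above can be turned into a small checklist of expected files. The Python sketch below is a hypothetical helper that builds a few of these relative paths from a sample name and its paired control. It assumes the sample and comparison directories sit directly under immuno_precip_samples; the exact layout under --pubdir may differ, for example with --organize_by.

```python
from pathlib import PurePosixPath


def expected_outputs(sample, control, narrow_peak=False):
    """List a few expected output paths for one sample and its control."""
    root = PurePosixPath("immuno_precip_samples")
    comparison = f"{sample}_vs_{control}"
    # Broad peaks are the default; narrowPeak files appear with --narrow_peak.
    peak_ext = "narrowPeak" if narrow_peak else "broadPeak"
    return [
        str(root / sample / "bam" / f"{sample}_dedup.bam"),
        str(root / sample / "bigwig" / f"{sample}.bigWig"),
        str(root / comparison / "macs2" / f"{sample}_peaks.{peak_ext}"),
        str(root / comparison / "macs2" / f"{sample}_peaks.FRiP_mqc.tsv"),
    ]


paths = expected_outputs("H3K4me1_T0_R1", "H3K4me1_INPUT_R1")
```

A list like this can be compared against the published directory after a run to spot samples whose peak calling did not complete.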

Additionally there are several cross sample QC plots provided in:

cross_sample_plots

| Naming Convention | Description |
| --- | --- |
| `macs_annotatepeaks.plots.pdf` | Cross sample annotation summary plots |
| `macs_annotatepeaks.summary.txt` | Cross sample annotation summary data |
| `macs_peak.plots.pdf` | Cross sample QC metric summary plots (number of peaks, peak length, p-value distribution, etc.) |
| `macs_peak.summary.txt` | Cross sample QC metric summary data (number of peaks, peak length, p-value distribution, etc.) |

Consensus calling results:

consensusCalling_<ANTIBODY_NAME>

| Naming Convention | Description |
| --- | --- |
| `deseq2/*` | DESeq2 QC data, tables and plots. These are further summarized by MultiQC |
| `macs2/*consensus_peaks.boolean.txt` | Consensus boolean peak file |
| `macs2/*consensus_peaks.saf` | Consensus peaks in SAF format |
| `macs2/*consensus_peaks.annotatePeaks.txt` | Annotated consensus boolean peak file |
| `macs2/*consensus_peaks.bed` | Consensus boolean peak file in bed format |
| `macs2/*consensus_peaks.antibody.txt` | Header summary file, used in Upset intersection plot generation |
| `macs2/*consensus_peaks.boolean.intersect.txt` | Consensus peak intersection file used in Upset intersection plot generation |
| `macs2/*consensus_peaks.boolean.intersect.plot.pdf` | Consensus peak Upset intersection plot |
| `subread/*consensus_peaks.featureCounts.txt` | Subread feature counts file |
| `subread/*consensus_peaks.featureCounts.txt.summary` | Subread feature counts summary file |
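
Consensus calling only applies to antibodies with more than one sample. The Python sketch below is a hypothetical helper that predicts which consensusCalling_<ANTIBODY_NAME> directories to expect from a design file. It assumes a sample is a unique (group, replicate) pair and that --skip_consensus_peaks is not set. The FASTQ names in the example are shortened placeholders.

```python
import csv
import io
from collections import defaultdict


def consensus_directories(design_text):
    """Return expected consensus directory names, one per qualifying antibody."""
    samples = defaultdict(set)
    for row in csv.DictReader(io.StringIO(design_text)):
        if row["antibody"]:
            # Re-sequenced runs share (group, replicate) and count once.
            samples[row["antibody"]].add((row["group"], row["replicate"]))
    return sorted(f"consensusCalling_{antibody}"
                  for antibody, members in samples.items()
                  if len(members) > 1)


design = """group,replicate,fastq_1,fastq_2,antibody,control
H3K4me1_T0,1,rep1_R1.fastq.gz,rep1_R2.fastq.gz,H3K4me1,H3K4me1_INPUT
H3K4me1_T0,2,rep2_R1.fastq.gz,rep2_R2.fastq.gz,H3K4me1,H3K4me1_INPUT
H3K4me1_INPUT,1,input_R1.fastq.gz,input_R2.fastq.gz,,
"""
directories = consensus_directories(design)
```

For this design the only expected directory is consensusCalling_H3K4me1; the input rows carry no antibody and are not counted.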

Additional result output:

| Naming Convention | Description |
| --- | --- |
| `chipseq_report.html` | Nextflow autogenerated report. |
| `trace` | Nextflow autogenerated trace report for resource usage in tabular text format. |
| `multiqc` | MultiQC report summarizing quality metrics across samples in the analysis run. |

Pipeline Options Outputs

If the workflow is run with --keep_intermediate true, additional intermediate outputs will be saved. This option is only recommended for debugging purposes.
