ChIP Pipeline ReadMe
• Step 1: CSV design file check
• Step 2: Genome filter build, and fasta index file build
• Step 3: FastQC
• Step 4: Read trimming via Trim Galore
• Step 5: Compute read groups
• Step 6: BWA mem alignment
• Step 7: BAM filtering to remove unmapped reads, sort bam, and compute bam stats
• Step 8: Merge filtered BAMs across technical replicates
• Step 9: Duplicate reads marked
• Step 10: Additional BAM filtered to remove duplicate reads
• Step 11: If paired-end data: orphaned reads are filtered
• Step 12: If multiple samples are provided, Preseq is run to estimate library complexity
• Step 13: Genome coverage, Picard CollectMultipleMetrics, and BigWig files are generated
• Step 14: Deeptools is used to compute a depth matrix for plotting, and plots are subsequently made
• Step 15: Phantompeakqualtools is run to compute ChIP-seq enrichment and quality measures
• Step 16: Deeptools PlotFingerprint
• Step 17: MACS2 peak calling
• Step 18: Fraction of reads in peaks (FRiP) computed
• Step 19: Homer peak annotation
• Step 20: Peak QC and plots
If multiple samples per antibody:
• Step 21: Consensus calling by antibody
• Step 22: Consensus peak Homer annotation
• Step 23: Subread FeatureCounts
• Step 24: DESeq2
All runs:
• Step 25: MultiQC report generation
flowchart TB
subgraph CHIPSEQ
v0((Sample))
v1([CHECK_DESIGN])
v4(( ))
v7([SAMTOOLS_FAIDX])
v9([MAKE_GENOME_FILTER])
v10([FASTQC])
v11([TRIM_GALORE])
v15([BWA_MEM])
v17([SAMTOOLS_FILTER])
v20([SAMTOOLS_SORT])
v21([SAMTOOLS_STATS])
v25([PICARD_MERGESAMFILES])
v26([PICARD_MARKDUPLICATES])
v28([SAMTOOLS_STATS_MD])
v29([SAMTOOLS_MERGEBAM_FILTER])
v31([BAMTOOLS_FILTER])
v32([SAMTOOLS_STATS_BF])
v36([SAMTOOLS_STATS_FILTERED])
v37([PRESEQ])
v39([SAMTOOLS_INDEX])
v40([PICARD_COLLECTMULTIPLEMETRICS])
v43([BEDTOOLS_GENOMECOV])
v45([UCSC_BEDGRAPHTOBIGWIG])
v47([DEEPTOOLS_COMPUTEMATRIX])
v49([DEEPTOOLS_PLOTPROFILE])
v51([DEEPTOOLS_PLOTHEATMAP])
v54([PHANTOMPEAKQUALTOOLS])
v60([MULTIQC_CUSTOM_PHANTOMPEAKQUALTOOLS])
v68([DEEPTOOLS_PLOTFINGERPRINT])
v73([PEAK_CALLING_CHIPSEQ])
v83([FRIP_SCORE])
v86([HOMER_ANNOTATEPEAKS])
v88([PLOT_MACS2_QC])
v94([PLOT_HOMER_ANNOTATEPEAKS])
v100([MACS2_CONSENSUS])
v107([CONSENSUS_PEAKS_ANNOTATE])
v109([ANNOTATE_BOOLEAN_PEAKS])
v117([SUBREAD_FEATURECOUNTS])
v120([DESEQ2_QC])
v205([MULTIQC])
v100(( ))
v108(( ))
end
subgraph " "
v27[" "]
v33[" "]
v34[" "]
v35[" "]
v38[" "]
v41[" "]
v44[" "]
v48[" "]
v50[" "]
v52[" "]
v53[" "]
v55[" "]
v69[" "]
v70[" "]
v74[" "]
v75[" "]
v76[" "]
v77[" "]
v89[" "]
v90[" "]
v95[" "]
v96[" "]
v101[" "]
v102[" "]
v103[" "]
v104[" "]
v110[" "]
v121[" "]
v122[" "]
v123[" "]
v124[" "]
v125[" "]
v126[" "]
v127[" "]
v206[" "]
v207[" "]
v208[" "]
end
v0 --> v1
v1 --> v11
v1 --> v10
v1 --> v4
v7 --> v9
v9 --> v29
v9 --> v45
v10 --> v205
%% v2 --> v11
v11 --> v15
v11 --> v205
%% v14 --> v15
v15 --> v17
v17 --> v20
v20 --> v21
v20 --> v25
v21 --> v205
v25 --> v26
v26 --> v28
v26 --> v27
v26 --> v29
v26 --> v37
v26 --> v205
v28 --> v205
v29 --> v31
v31 --> v32
v31 --> v36
v31 --> v39
v31 --> v40
v31 --> v54
v31 --> v4
v31 --> v43
v32 --> v35
v32 --> v34
v32 --> v33
v36 --> v4
v36 --> v43
v36 --> v205
v37 --> v38
v37 --> v205
v39 --> v4
v40 --> v41
v40 --> v205
v43 --> v45
v43 --> v44
v45 --> v47
v47 --> v49
v47 --> v48
v47 --> v51
v49 --> v50
v49 --> v205
v51 --> v53
v51 --> v52
v54 --> v55
v54 --> v60
v54 --> v205
v60 --> v205
v4 --> v68
v68 --> v70
v68 --> v69
v68 --> v205
v4 --> v73
v73 --> v77
v73 --> v86
v73 --> v76
v73 --> v75
v73 --> v74
v73 --> v4
v73 --> v88
v73 --> v100
v4 --> v83
v83 --> v205
v86 --> v94
v88 --> v90
v88 --> v89
v94 --> v96
v94 --> v95
v94 --> v205
v100 --> v104
v100 --> v107
v100 --> v103
v100 --> v102
v100 --> v101
v100 --> v4
v100 --> v108
v107 --> v108
v108 --> v109
v109 --> v110
v4 --> v117
v117 --> v120
v117 --> v205
v120 --> v127
v120 --> v126
v120 --> v125
v120 --> v124
v120 --> v123
v120 --> v122
v120 --> v121
v120 --> v205
v205 --> v208
v205 --> v207
v205 --> v206
- `--pubdir`
  - Default: `/<PATH>`
  - Comment: The directory where saved outputs will be stored.
- `--organize_by`
  - Default: `sample`
  - Comment: How to organize the output folder structure. Options: `sample` or `analysis`.
- `--cacheDir`
  - Default: `/projects/omics_share/meta/containers`
  - Comment: The directory that contains cached Singularity containers. JAX users should not change this parameter.
- `-w`
  - Default: `/<PATH>`
  - Comment: The directory used for all intermediary files and Nextflow processes. This directory can become quite large, so it should be a location on /fastscratch or another directory with ample storage.
- `--sample_folder`
  - Default: `/<PATH>`
  - Comment: The path to the folder that contains all the samples to be run by the pipeline. The files in this path can also be symbolic links.
- `--extension`
  - Default: `.fastq.gz`
  - Comment: The expected extension for the input read files.
- `--pattern`
  - Default: `"*_R{1,2}*"`
  - Comment: The expected R1 / R2 matching pattern. The default value will match reads with names like `READ_NAME_R1_MoreText.fastq.gz` or `READ_NAME_R1.fastq.gz`.
- `--read_type`
  - Default: `PE`
  - Comment: Options: `PE` and `SE`. Type of reads: paired end (PE) or single end (SE).
- `--concat_lanes`
  - Default: `false`
  - Comment: Options: `false` and `true`. If this boolean is specified, FASTQ files will be concatenated by sample. Used in cases where samples are divided across individual sequencing lanes.
- `--csv_input`
  - Default: `null`
  - Comment: Provide a CSV manifest file with the header: "sampleID,lane,fastq_1,fastq_2". See below for an example file. `fastq_2` is optional and used only in PE data. Fastq files can either be absolute paths to local files, or URLs to remote files. If remote URLs are provided, `--download_data` can be specified.
- `--download_data`
  - Default: `null`
  - Comment: Requires `--csv_input`. When specified, read data in the CSV manifest will be downloaded from provided URLs with Aria2.
- `--gen_org`
  - Default: `mouse`
  - Comment: Options: `mouse` and `human`.
- `--genome_build`
  - Default: `GRCm38`
  - Comment: Mouse specific. Options: GRCm38 or GRCm39. If gen_org == human, build defaults to GRCh38.
- `--input`
  - Default: `null`
  - Comment: Required. Input CSV file, see notes below for format.
- `--ref_fa`
  - Default: `'/projects/omics_share/mouse/GRCm38/genome/sequence/ensembl/v102/Mus_musculus.GRCm38.dna.toplevel.fa'`
  - Comment: The reference fasta to be used throughout the process for alignment as well as any downstream analysis. Points to the human reference when `--gen_org human`.
- `--ref_fa_indices`
  - Default: `'/projects/omics_share/mouse/GRCm38/genome/indices/ensembl/v102/bwa/Mus_musculus.GRCm38.dna.toplevel.fa'`
  - Comment: Pre-compiled BWA index files. Points to the human reference when `--gen_org human`. JAX users should not change this parameter.
- `--gtf`
  - Default: `'/projects/omics_share/mouse/GRCm38/transcriptome/annotation/ensembl/v102/Mus_musculus.GRCm38.102.gtf'`
  - Comment: The full path to GTF file for annotating peaks. Ensembl GTF format required.
- `--gene_bed`
  - Default: `'/projects/omics_share/mouse/GRCm38/transcriptome/annotation/ensembl/v102/Mus_musculus.GRCm38.102.bed'`
  - Comment: The full path to BED file for genome-wide gene intervals.
- `--fragment_size`
  - Default: `200`
  - Comment: Number of base pairs to extend single-end reads when creating bigWig files.
- `--fingerprint_bins`
  - Default: `500000`
  - Comment: Number of genomic bins to use when generating the deepTools fingerprint plot. Larger numbers will give a smoother profile, but take longer to run.
- `--macs_gsize`
  - Default: `2725537669`
  - Comment: Effective genome size parameter required by MACS2.
- `--blacklist`
  - Default: `''`
  - Comment: If provided, alignments that overlap with the regions in this file will be filtered out (see ENCODE blacklists). The file should be in BED format.
- `--trimLength`
  - Default: `30`
  - Comment: Discard reads that became shorter than length 'INT' because of either quality or adapter trimming. A value of 0 effectively disables this behavior.
- `--qualThreshold`
  - Default: `30`
  - Comment: Trim low-quality ends from reads in addition to adapter removal. Files are quality and adapter trimmed in a single pass.
- `--adapOverlap`
  - Default: `1`
  - Comment: Stringency for overlap with adapter sequence required to trim a sequence. Defaults to a very stringent setting of 1, i.e. even a single base pair of overlapping sequence will be trimmed off the 3' end of any read.
- `--adaptorSeq`
  - Default: `'AGATCGGAAGAGC'`
  - Comment: Adapter sequence to be trimmed. This sequence is the standard Illumina adapter sequence.
- `--mismatch_penalty`
  - Default: `''`
  - Comment: The BWA penalty for a mismatch. Example required format if used: `-B 4`
- `--bwa_min_score`
  - Default: `false`
  - Comment: Don’t output BWA MEM alignments with score lower than this parameter (Default: false)
- `--keep_dups`
  - Default: `false`
  - Comment: Duplicate reads are not filtered from alignments (Default: false)
- `--keep_multi_map`
  - Default: `false`
  - Comment: Reads mapping to multiple locations in the genome are not filtered from alignments (Default: false)
- `--bamtools_filter_pe_config`
  - Default: `$projectDir/bin/shared/bamtools/bamtools_filter_pe.json`
  - Comment: The path to bamtools_filter_pe.json, the configuration file used by bamtools filter for paired-end (PE) data.
- `--bamtools_filter_se_config`
  - Default: `$projectDir/bin/shared/bamtools/bamtools_filter_se.json`
  - Comment: The path to bamtools_filter_se.json, the configuration file used by bamtools filter for single-end (SE) data.
- `--narrow_peak`
  - Default: `false`
  - Comment: MACS2 is run by default with the --broad flag. Specify this flag to call peaks in narrowPeak mode (Default: false)
- `--broad_cutoff`
  - Default: `0.1`
  - Comment: Specifies broad cut-off value for MACS2. Only used when `--narrow_peak` isn't specified (Default: 0.1)
- `--macs_fdr`
  - Default: `false`
  - Comment: Minimum FDR (q-value) cutoff for peak detection. `--macs_fdr` and `--macs_pvalue` are mutually exclusive (Default: false)
- `--macs_pvalue`
  - Default: `false`
  - Comment: p-value cutoff for peak detection (Default: false).
- `--skip_preseq`
  - Default: `false`
  - Comment: Skip Preseq (Default: false)
- `--skip_peak_qc`
  - Default: `false`
  - Comment: Skip MACS2 peak QC plot generation (Default: false)
- `--skip_peak_annotation`
  - Default: `false`
  - Comment: Skip annotation of MACS2 and consensus peaks with HOMER (Default: false)
- `--skip_consensus_peaks`
  - Default: `false`
  - Comment: Skip consensus peak generation, annotation and counting (Default: false)
- `--skip_diff_analysis`
  - Default: `false`
  - Comment: Skip differential binding analysis with DESeq2 (Default: false)
- `--deseq2_vst`
  - Default: `false`
  - Comment: Use vst transformation instead of rlog with DESeq2. (Default: false)
- `--min_reps_consensus`
  - Default: `1`
  - Comment: Number of biological replicates required from a given condition for a peak to contribute to a consensus peak (Default: 1)
- `--save_macs_pileup`
  - Default: `false`
  - Comment: Instruct MACS2 to create bedGraph files using the -B --SPMR parameters (Default: false).
- `--multiqc_config`
  - Default: `${projectDir}/bin/shared/multiqc/chipseq.yaml`
  - Comment: The path to the configuration file used by MultiQC
** Note: some of the above descriptions were taken from NF-Core ChIP-SEQ v1.2.2 Usage documentation
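
Several of the parameters above constrain one another (for example, `--macs_fdr` and `--macs_pvalue` are mutually exclusive, and `--download_data` requires `--csv_input`). The following is a minimal sketch of a pre-flight check, assuming the parameters have been collected into a plain Python dict. The function name `check_params` is hypothetical and is not part of the pipeline; it only encodes constraints stated in the descriptions above.

```python
def check_params(params):
    """Return a list of problems found in a dict of pipeline parameters.

    Hypothetical helper: it only encodes constraints stated in the
    parameter descriptions and is not part of the pipeline itself.
    """
    problems = []

    # --input is described as required.
    if not params.get("input"):
        problems.append("--input is required")

    # --read_type accepts PE or SE (default PE).
    if params.get("read_type", "PE") not in ("PE", "SE"):
        problems.append("--read_type must be PE or SE")

    # --gen_org accepts mouse or human (default mouse).
    gen_org = params.get("gen_org", "mouse")
    if gen_org not in ("mouse", "human"):
        problems.append("--gen_org must be mouse or human")

    # --genome_build is mouse specific: GRCm38 or GRCm39.
    build = params.get("genome_build", "GRCm38")
    if gen_org == "mouse" and build not in ("GRCm38", "GRCm39"):
        problems.append("--genome_build must be GRCm38 or GRCm39 for mouse")

    # --macs_fdr and --macs_pvalue are mutually exclusive (both default to false).
    if params.get("macs_fdr") and params.get("macs_pvalue"):
        problems.append("--macs_fdr and --macs_pvalue are mutually exclusive")

    # --download_data requires --csv_input.
    if params.get("download_data") and not params.get("csv_input"):
        problems.append("--download_data requires --csv_input")

    return problems
```

An empty list means none of these particular constraints were violated; it does not guarantee that the run itself will succeed.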
If read type is specified as paired-end (PE) when single end (SE) data are passed to the workflow, an error will result:
Argument of `file` function cannot be empty
-- Check script '/projects/omics_share/meta/benchmarking/ngs-ops-nf-pipelines/./workflows/chipseq.nf' at line: 81 or see '.nextflow.log' file for more details
If the run is restarted with `--read_type SE`, the error should resolve.
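
One way to avoid this error is to check the design file before launching. The sketch below guesses the read type from the `fastq_2` column, on the assumption that single-end designs leave `fastq_2` empty on every row and paired-end designs fill it on every row. The helper `infer_read_type` is hypothetical and is not part of the pipeline.

```python
import csv
import io


def infer_read_type(design_text):
    """Guess PE or SE from the fastq_2 column of a design file.

    Returns "PE" if every row has a fastq_2 entry, "SE" if none do,
    and raises ValueError for a mix or for a design with no sample rows.
    """
    rows = list(csv.DictReader(io.StringIO(design_text)))
    if not rows:
        raise ValueError("design file has no sample rows")
    has_r2 = [bool((row.get("fastq_2") or "").strip()) for row in rows]
    if all(has_r2):
        return "PE"
    if not any(has_r2):
        return "SE"
    raise ValueError("design mixes rows with and without fastq_2")
```

The returned value can then be passed to `--read_type`.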
`--input` (this section taken from NF-core v.1.2.2)
You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 6 columns, and a header row as shown in the examples below.
--input '[path to design file]'
The `group` identifier should be identical when you have multiple replicates from the same experimental group; just increment the `replicate` identifier appropriately. The first replicate value for any given experimental group must be 1.
The `antibody` column is required to separate the downstream consensus peak merging and differential analysis for different antibodies. It's not advisable to generate a consensus peak set across different antibodies, especially if their binding patterns are inherently different, e.g. narrow transcription factors and broad histone marks.
The `control` column should be the `group` identifier for the controls for any given IP. The pipeline will automatically pair the inputs based on replicate identifier (i.e. where you have an equal number of replicates for your IPs and controls); alternatively, the first control sample in that group will be selected.
In the single-end design below there are triplicate samples for the `WT_BCATENIN_IP` group along with triplicate samples for their corresponding `WT_INPUT` samples.
group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L002_R1_001.fastq.gz,,,
WT_INPUT,3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
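
The design-file rules described in this section can be checked before a run. The following is a minimal sketch, not the pipeline's own design check: `validate_design` is a hypothetical helper, and the rule that every `control` value must also appear as a `group` in the same file is an assumption drawn from the description of the `control` column.

```python
import csv
import io

EXPECTED_HEADER = ["group", "replicate", "fastq_1", "fastq_2", "antibody", "control"]
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def validate_design(design_text):
    """Return a list of problems found in a design file given as text."""
    reader = csv.reader(io.StringIO(design_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return ["header must be: " + ",".join(EXPECTED_HEADER)]

    problems = []
    replicates = {}   # group -> set of replicate numbers seen
    controls = set()  # control group names referenced by IP rows
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 6 columns, found {len(row)}")
            continue
        group, replicate, fastq_1, fastq_2, antibody, control = (f.strip() for f in row)
        if not replicate.isdecimal() or int(replicate) < 1:
            problems.append(f"line {line_no}: replicate must be a positive integer")
            continue
        replicates.setdefault(group, set()).add(int(replicate))
        if not fastq_1:
            problems.append(f"line {line_no}: fastq_1 is empty")
        for fastq in (fastq_1, fastq_2):
            if fastq and not fastq.endswith(FASTQ_SUFFIXES):
                problems.append(f"line {line_no}: {fastq} must end in .fastq.gz or .fq.gz")
        if control and not antibody:
            problems.append(f"line {line_no}: antibody is required when control is set")
        if control:
            controls.add(control)

    for group, seen in sorted(replicates.items()):
        # Replicates must run 1..N with no gaps.
        if seen != set(range(1, len(seen) + 1)):
            problems.append(f"group {group}: replicates must be numbered 1..{len(seen)}")
    # Assumption: each control name should also exist as a group in the file.
    for control in sorted(controls - set(replicates)):
        problems.append(f"control group {control} has no rows of its own")
    return problems
```

Repeated `group`/`replicate` pairs are accepted, since re-sequenced libraries share both identifiers.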
Both the `group` and `replicate` identifiers should be the same when you have re-sequenced the same sample more than once, e.g. to increase sequencing depth. The pipeline will perform the alignments in parallel, and subsequently merge them before further analysis. Below is an example where the second replicate of the `WT_BCATENIN_IP` and `WT_INPUT` groups has been re-sequenced multiple times:
group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L003_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L001_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L002_R1_001.fastq.gz,,,
WT_INPUT,3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
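
To see which rows of a design file will be merged, the rows can be grouped by `group` and `replicate`. This is a minimal sketch with a hypothetical helper, `group_runs`; the `GROUP_R<replicate>` sample naming is taken from the output file names listed later in this README, and the grouping is only an illustration of the rule above, not the pipeline's merge code.

```python
import csv
import io
from collections import defaultdict


def group_runs(design_text):
    """Map each sample (group + replicate) to the fastq_1 files listed for it."""
    runs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(design_text)):
        # Rows that share group and replicate are re-sequenced runs of one sample.
        sample = f"{row['group']}_R{row['replicate']}"
        runs[sample].append(row["fastq_1"])
    return dict(runs)
```

Any sample that maps to more than one file is a re-sequenced library whose alignments would be merged before further analysis.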
A final design file may look something like the one below. This is for two antibodies and associated controls in triplicate, where the second replicate of the `WT_BCATENIN_IP` and `NAIVE_BCATENIN_IP` groups has been sequenced twice:
group,replicate,fastq_1,fastq_2,antibody,control
WT_BCATENIN_IP,1,BLA203A1_S27_L006_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,2,BLA203A25_S16_L002_R1_001.fastq.gz,,BCATENIN,WT_INPUT
WT_BCATENIN_IP,3,BLA203A49_S40_L001_R1_001.fastq.gz,,BCATENIN,WT_INPUT
NAIVE_BCATENIN_IP,1,BLA203A7_S60_L001_R1_001.fastq.gz,,BCATENIN,NAIVE_INPUT
NAIVE_BCATENIN_IP,2,BLA203A43_S34_L001_R1_001.fastq.gz,,BCATENIN,NAIVE_INPUT
NAIVE_BCATENIN_IP,2,BLA203A43_S34_L002_R1_001.fastq.gz,,BCATENIN,NAIVE_INPUT
NAIVE_BCATENIN_IP,3,BLA203A64_S55_L001_R1_001.fastq.gz,,BCATENIN,NAIVE_INPUT
WT_TCF4_IP,1,BLA203A3_S29_L006_R1_001.fastq.gz,,TCF4,WT_INPUT
WT_TCF4_IP,2,BLA203A27_S18_L001_R1_001.fastq.gz,,TCF4,WT_INPUT
WT_TCF4_IP,3,BLA203A51_S42_L001_R1_001.fastq.gz,,TCF4,WT_INPUT
NAIVE_TCF4_IP,1,BLA203A9_S62_L001_R1_001.fastq.gz,,TCF4,NAIVE_INPUT
NAIVE_TCF4_IP,2,BLA203A45_S36_L001_R1_001.fastq.gz,,TCF4,NAIVE_INPUT
NAIVE_TCF4_IP,3,BLA203A66_S57_L001_R1_001.fastq.gz,,TCF4,NAIVE_INPUT
WT_INPUT,1,BLA203A6_S32_L006_R1_001.fastq.gz,,,
WT_INPUT,2,BLA203A30_S21_L001_R1_001.fastq.gz,,,
WT_INPUT,3,BLA203A31_S21_L003_R1_001.fastq.gz,,,
NAIVE_INPUT,1,BLA203A12_S3_L001_R1_001.fastq.gz,,,
NAIVE_INPUT,2,BLA203A48_S39_L001_R1_001.fastq.gz,,,
NAIVE_INPUT,3,BLA203A49_S1_L006_R1_001.fastq.gz,,,
Column | Description |
---|---|
`group` | Group/condition identifier for sample. This will be identical for re-sequenced libraries and replicate samples from the same experimental group. |
`replicate` | Integer representing replicate number. This will be identical for re-sequenced libraries. Must be numbered `1..<number of replicates>`. |
`fastq_1` | Full path to FastQ file for read 1. File has to be zipped and have the extension ".fastq.gz" or ".fq.gz". |
`fastq_2` | Full path to FastQ file for read 2. File has to be zipped and have the extension ".fastq.gz" or ".fq.gz". |
`antibody` | Antibody name. This is required to segregate downstream analysis for different antibodies. Required when `control` is specified. |
`control` | Group identifier for control sample. The pipeline will automatically select the control sample with the same replicate identifier as the IP. |
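
The control pairing described in this section can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: `pair_control` is a hypothetical helper, "first control sample" is interpreted as the lowest replicate number, and the `GROUP_R<replicate>` naming follows the output file names listed later in this README.

```python
def pair_control(ip_replicate, control_group, control_replicates):
    """Pick the control sample name for one IP replicate.

    control_replicates: replicate numbers available for control_group.
    Uses the control with the same replicate number when it exists,
    otherwise falls back to the first (lowest-numbered) control.
    """
    if not control_replicates:
        raise ValueError(f"no samples found for control group {control_group}")
    if ip_replicate in control_replicates:
        chosen = ip_replicate
    else:
        # Assumption: "first" means the lowest replicate number.
        chosen = min(control_replicates)
    return f"{control_group}_R{chosen}"
```

For example, with three `WT_INPUT` replicates, `pair_control(3, "WT_INPUT", [1, 2, 3])` selects `WT_INPUT_R3`, while an IP replicate with no matching control replicate falls back to `WT_INPUT_R1`.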
NOTE: `*` represents a wildcard that is a placeholder for values that will be filled by input file names and/or parameters when the pipeline is run.
NOTE: All files contained in 'stats' directories are captured by MultiQC reports.
The pipeline outputs several directories, containing files that apply either to individual samples or to consensus calling by antibody across samples.
The following summary of files assumes the following `--input` CSV file:
group,replicate,fastq_1,fastq_2,antibody,control
H3K4me1_T0,1,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF501VGT.fastq.gz,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF980OWC.fastq.gz,H3K4me1,H3K4me1_INPUT
H3K4me1_T0,2,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF318LVI.fastq.gz,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF188UNI.fastq.gz,H3K4me1,H3K4me1_INPUT
H3K4me1_INPUT,1,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF388YCF.fastq.gz,/projects/compsci/omics_share/human/GRCh38/supporting_files/benchmarking_data/CHIP/raw_reads/ENCFF877ZJH.fastq.gz,,
Summaries of the adjusted fasta reference used (if blacklist is used), and study design are stored in:
Naming Convention | Description |
---|---|
`*fasta.include_regions.bed` | Genome regions used in peak calling in bed format |
`*fasta.sizes` | Chromosome sizes used in peak calling |
`*fasta.fai` | Fasta index file used in peak calling |

Naming Convention | Description |
---|---|
`design_reads.csv` | Parsed study design with sample reads |
`design_controls.csv` | Parsed study design, used in pairing samples for calling |

Naming Convention | Description |
---|---|
`SAMPLEID/stats/*fastqc.html` | HTML FastQC report from raw fastq |
`SAMPLEID/stats/*fastqc.zip` | FastQC files in zip format from raw fastq |
`SAMPLEID/stats/*_val*_fastqc.html` | HTML FastQC report from trimmed fastq |
`SAMPLEID/stats/*_val*_fastqc.zip` | FastQC files in zip format from trimmed fastq |
`SAMPLEID/trimmed_fastq/*trimming_report.txt` | Trim Galore trimming report |

This directory is further divided into individual sample files, results from MACS2 peak calling, and files derived from those peaks. For example:
Each sample (e.g., H3K4me1_T0_R1 [replicate 1], and H3K4me1_T0_R2 [replicate 2]) will have a set of files as follows:
Naming Convention | Description |
---|---|
`H3K4me1_T0_R1/bam/H3K4me1_T0_R1_dedup.bam` | Final filtered BAM file |
`H3K4me1_T0_R1/bigwig/H3K4me1_T0_R1.bigWig` | BigWig coverage file |
`H3K4me1_T0_R1/bigwig/H3K4me1_T0_R1.scale_factor.txt` | BigWig scaling factor file |
`H3K4me1_T0_R1/deeptools/H3K4me1_T0_R1.plotHeatmap.pdf` | Deeptools gene feature heatmap plot |
`H3K4me1_T0_R1/deeptools/H3K4me1_T0_R1.plotProfile.pdf` | Deeptools profile plot |
`H3K4me1_T0_R1/stats/*` | Collected QC metrics and statistics. These are summarized across samples in the MultiQC report. |

Each sample (e.g., H3K4me1_T0_R1 [replicate 1], and H3K4me1_T0_R2 [replicate 2]) will have a set of files derived from MACS2 peak calling against the associated `INPUT` for that sample (e.g., H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1, H3K4me1_T0_R2_vs_H3K4me1_INPUT_R1):
Naming Convention | Description |
---|---|
`H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.{broad,narrow}Peak` | MACS2 broadPeak or narrowPeak depending on settings used |
`H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.xls` | MACS2 peaks in xls format |
`H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.annotatePeaks.txt` | Homer annotated peak file |
`H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.count_mqc.tsv` | Peak count file for MultiQC |
`H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/macs2/H3K4me1_T0_R1_peaks.FRiP_mqc.tsv` | Fraction of reads in peaks file for MultiQC |
`H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/deeptools/H3K4me1_T0_R1.plotFingerprint.pdf` | Deeptools plotFingerprint plot |
`H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/deeptools/H3K4me1_T0_R1.plotFingerprint.raw.txt` | Deeptools plotFingerprint data |
`H3K4me1_T0_R1_vs_H3K4me1_INPUT_R1/deeptools/H3K4me1_T0_R1.plotFingerprint.qcmetrics.txt` | Deeptools plotFingerprint QC metrics |

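
When checking a finished run, the MACS2 file names in the table above can be generated from the sample and control names. This is a minimal sketch with a hypothetical helper, `expected_peak_outputs`; the returned paths are relative, and the parent directory they sit under depends on `--pubdir` and `--organize_by`, which is not modeled here.

```python
def expected_peak_outputs(sample, control, narrow_peak=False):
    """List the per-comparison MACS2 file paths named in the table above."""
    # MACS2 runs in broad mode unless --narrow_peak is specified.
    peak_type = "narrowPeak" if narrow_peak else "broadPeak"
    prefix = f"{sample}_vs_{control}/macs2/{sample}_peaks"
    return [
        f"{prefix}.{peak_type}",
        f"{prefix}.xls",
        f"{prefix}.annotatePeaks.txt",
        f"{prefix}.count_mqc.tsv",
        f"{prefix}.FRiP_mqc.tsv",
    ]
```

Calling `expected_peak_outputs("H3K4me1_T0_R1", "H3K4me1_INPUT_R1")` reproduces the `macs2` rows of the table for the default broad-peak mode.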
Naming Convention | Description |
---|---|
`macs_annotatepeaks.plots.pdf` | Cross sample annotation summary plots |
`macs_annotatepeaks.summary.txt` | Cross sample annotation summary data |
`macs_peak.plots.pdf` | Cross sample QC metric summary plots (number of peaks, peak length, p-value distribution, etc.) |
`macs_peak.summary.txt` | Cross sample QC metric summary data (number of peaks, peak length, p-value distribution, etc.) |

consensusCalling_<ANTIBODY_NAME>
Naming Convention | Description |
---|---|
`deseq2/*` | DESeq2 QC data, tables and plots. These are further summarized by MultiQC |
`macs2/*consensus_peaks.boolean.txt` | Consensus boolean peak file |
`macs2/*consensus_peaks.saf` | Consensus peaks in SAF format |
`macs2/*consensus_peaks.annotatePeaks.txt` | Annotated consensus boolean peak file |
`macs2/*consensus_peaks.bed` | Consensus boolean peak file in bed format |
`macs2/*consensus_peaks.antibody.txt` | Header summary file, used in Upset intersection plot generation |
`macs2/*consensus_peaks.boolean.intersect.txt` | Consensus peak intersection file used in Upset intersection plot generation |
`macs2/*consensus_peaks.boolean.intersect.plot.pdf` | Consensus peak Upset intersection plot |
`subread/*consensus_peaks.featureCounts.txt` | Subread feature counts file |
`subread/*consensus_peaks.featureCounts.txt.summary` | Subread feature counts summary file |

Naming Convention | Description |
---|---|
`chipseq_report.html` | Nextflow autogenerated report. |
`trace` | Nextflow autogenerated trace report for resource usage in tabular text format. |
`multiqc` | MultiQC report summarizing quality metrics across samples in the analysis run. |

If the workflow is run with `--keep_intermediate true`, additional outputs will be saved. This option is only recommended for debugging purposes.