Skip to content

Latest commit

 

History

History
executable file
·
36 lines (25 loc) · 2.93 KB

genome.md

File metadata and controls

executable file
·
36 lines (25 loc) · 2.93 KB

Reference genome

The genome parameters required to run the pipeline are listed below:

Parameter Description
fasta Path to multi-fasta file containing reference genome assembly.
gtf Path to GTF file containing gene annotation which is typically available for download with the reference assembly.
mito_name Name of the mitochondrial contig in the fasta file e.g. 'chrM'. ATACSeq datasets usually contain a high percentage of reads mapping to mitochondrial DNA, and as a result these will be filtered out in the pipeline.
bwa_index Path to BWA index for reference genome assembly. See Indexing genome section below.
genome_mask Path to BED format file containing genomic regions to exclude from the analysis. See ENCODE blacklisted regions.
macs_genome_size MACS2 genome size required by MACS2.

The parameters can either be specified at the command-line when running the pipeline

nextflow run main.nf --design design.csv --fasta <FASTA_FILE> --gtf <GTF_FILE> --mito_name <MITO_NAME> --bwa_index <BWA_INDEX> --genome_mask <GENOME_MASK> --macs_genome_size <MACS_GENOME_SIZE> -profile babs_modules

or you can edit the genomes.config file to define and store these parameters for multiple genome assemblies. Using this method you will only need to provide the specified shorthand name for the reference genome when running the pipeline.

nextflow run main.nf --design design.csv --genome hg19 -profile babs_modules

Indexing genome

The fasta file will need to be indexed with SAMtools and BWA before running the pipeline.

samtools faidx <FASTA_FILE>
bwa index <FASTA_FILE>

See BWA documentation for more information.