Reference genome

The genome parameters required to run the pipeline are listed below:

Parameter	Description
`fasta`	Path to multi-fasta file containing reference genome assembly.
`gtf`	Path to GTF file containing gene annotation which is typically available for download with the reference assembly.
`mito_name`	Name of the mitochondrial contig in the fasta file e.g. 'chrM'. ATACSeq datasets usually contain a high percentage of reads mapping to mitochondrial DNA, and as a result these will be filtered out in the pipeline.
`bwa_index`	Path to BWA index for reference genome assembly. See Indexing genome section below.
`genome_mask`	Path to BED format file containing genomic regions to exclude from the analysis. See ENCODE blacklisted regions.
`macs_genome_size`	MACS2 genome size required by MACS2.

The parameters can either be specified at the command-line when running the pipeline

nextflow run main.nf --design design.csv --fasta <FASTA_FILE> --gtf <GTF_FILE> --mito_name <MITO_NAME> --bwa_index <BWA_INDEX> --genome_mask <GENOME_MASK> --macs_genome_size <MACS_GENOME_SIZE> -profile babs_modules

or you can edit the genomes.config file to define and store these parameters for multiple genome assemblies. Using this method you will only need to provide the specified shorthand name for the reference genome when running the pipeline.

nextflow run main.nf --design design.csv --genome hg19 -profile babs_modules

Indexing genome

The fasta file will need to be indexed with SAMtools and BWA before running the pipeline.

samtools faidx <FASTA_FILE>
bwa index <FASTA_FILE>

See BWA documentation for more information.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

genome.md

genome.md

Reference genome

Indexing genome

Files

genome.md

Latest commit

History

genome.md

File metadata and controls

Reference genome

Indexing genome