Various config files are provided with the pipeline. Some contain pipeline-specific parameters for use with Nextflow in order to increase the flexibility and portability of the pipeline, and others contain parameters required for specific processes.
Path | Used by | Description |
---|---|---|
`nextflow.config` | nextflow | Config file used for the definition of Nextflow profiles and other standard parameters. You don't need to edit this file. |
`conf/genomes.config` | nextflow | You can customise this file if you prefer to store the genome parameters required by the pipeline in a single file. Alternatively, you can provide each of the parameters at the command-line (see the Reference genome section). |
`conf/base.config` | nextflow | Config file defining the compute resources used by each process. If applicable, you can tailor the parameters in this file to suit your compute system. |
`conf/conda.config` | nextflow | Config file specifying the parameters required to build the Conda environment. If applicable, you will need to change the `beforeScript` parameter in this file to ensure the relevant Conda distribution is available on your system before each process is executed. |
`conf/babs_modules.config` | nextflow | Config file specifying the environment modules that can be used at The Francis Crick Institute to run the pipeline. You can customise this file to use specific versions of software for each process in the pipeline, or use it as a template to build your own for use outside The Francis Crick Institute. |
`conf/fastq_screen.conf.txt` | FastQ Screen | If you want the pipeline to run FastQ Screen, this file will need to be customised to reflect the paths to pre-created Bowtie2 indices for the contaminant screen (see the example entries below the table). |
`conf/multiqc_config.yaml` | MultiQC | Config file for generating a customised MultiQC report for the BABS-ATACSeqPE pipeline. You won't need to edit this file unless you want to change aspects of the QC that are reported by MultiQC. |
`conf/bamtools_filter_pe.json` | BamTools | Config file used by BamTools for read alignment filtering. You won't need to customise this file unless you want to amend specific aspects of the read filtering criteria. |
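As an illustration, FastQ Screen expects one `DATABASE` line per contaminant genome in `conf/fastq_screen.conf.txt`. The genome names and index paths below are placeholders only; replace them with the prefixes of Bowtie2 indices that already exist on your system.

```
## Illustrative DATABASE entries for conf/fastq_screen.conf.txt
## Fields are tab-separated; genome names and index paths are placeholders
DATABASE	Human	/path/to/bowtie2/index/GRCh38
DATABASE	Mouse	/path/to/bowtie2/index/GRCm38
```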
Depending on where and how you would like to run the pipeline, Nextflow needs to be configured to use the required software.
This is the easiest, most hassle-free and portable way of running the pipeline. A custom Conda environment file is provided with the pipeline for those that wish to run the pipeline on another system without having to painstakingly install all of the software and associated dependencies. This will require an internet connection on the command-line, and installation of Anaconda or Miniconda. Nextflow will rather amazingly create a temporary Conda environment by downloading and installing all of the required software before execution of the pipeline. NOTE: This could take up to 45 minutes.
A Conda config file can be found in the `conf/` directory. By default, the pipeline will be executed using the `slurm` job submission system. You will need an account to use the HPC cluster on CAMP if you want to run this at The Francis Crick Institute. If you prefer to run the pipeline locally in serial, just change `executor = 'slurm'` to `executor = 'local'` in the Conda config file. However, the pipeline may take a very long time to complete, and this is only recommended for testing purposes. Before the submission of each Nextflow process, the `conda` command will need to be available on the command-line to activate the Conda environment. If you are not running the pipeline at The Francis Crick Institute, you will need to edit `beforeScript = 'module purge && ml Anaconda2/5.1.0'` in the config file to load/use Conda.
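As a rough sketch only, and assuming these directives sit in the `process` scope of `conf/conda.config`, the edited lines might end up looking something like the following; the `beforeScript` command shown is a placeholder for however Conda is made available on your system.

```groovy
process {
    // Run processes on the local machine instead of submitting to SLURM
    executor = 'local'

    // Placeholder: replace with whatever makes the `conda` command available on your
    // system, e.g. sourcing the profile script of your Miniconda installation
    beforeScript = 'source /path/to/miniconda3/etc/profile.d/conda.sh'
}
```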
By default, Conda will download and compile the packages for the environment in the user `HOME` directory; however, this has a space limitation of 5GB on CAMP, which won't be large enough. To overcome this, you can create a file called `.condarc` in your `HOME` directory that instructs Conda to install the environment and associated packages in your working directory. Just copy the text below, place it in the `.condarc` file, and change the paths to reflect those that are most appropriate for you.
```yaml
envs_dirs:
  - /camp/stp/babs/working/patelh/conda/envs/
pkgs_dirs:
  - /camp/stp/babs/working/patelh/conda/pkgs/
```
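If you want to double-check that Conda has picked up these settings, running `conda config --show envs_dirs pkgs_dirs` on the command-line should list the directories you specified.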
It is also possible to create a permanent copy of the Conda environment for recurrent use, and Nextflow can be configured accordingly; however, this won't be outlined here.
When running the pipeline, just specify `-profile conda` in order to use this configuration.
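For example, assuming the pipeline's main script is `main.nf` and you are running from the pipeline directory (substitute your usual pipeline options for `<options>`):

```bash
nextflow run main.nf -profile conda <options>
```

The same pattern applies to the other profiles described below; just swap `conda` for `babs_modules` or `standard`.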
If you are running this pipeline at The Francis Crick Institute, all of the required software can be loaded with the environment module system on CAMP. A module config file specifying the software modules to load has been created for this purpose. If you want to change the versions of software, just edit this file before running the pipeline (a rough sketch is shown below); however, this may require some testing to make sure the Nextflow processes requiring the software can still be run with the same command-line parameters.
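Purely as an illustration of what such an edit might look like (the process name, selector and module string below are hypothetical, so check them against the actual contents of `conf/babs_modules.config`):

```groovy
process {
    // Hypothetical example: pin a specific BWA module for an illustrative 'bwa_align' process
    withName: 'bwa_align' {
        module = 'BWA/0.7.17-foss-2018a'
    }
}
```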
By default, the pipeline will be executed using the `slurm` job submission system. You will need an account to use the HPC cluster on CAMP if you want to run this at The Francis Crick Institute. If you prefer to run the pipeline locally in serial, just change `executor = 'slurm'` to `executor = 'local'` in the module config file. However, the pipeline may take a very long time to complete, and this is only recommended for testing purposes.
When running the pipeline, just specify `-profile babs_modules` in order to use this configuration.
If you really need to run the pipeline in serial, you can specify `-profile standard` when running the pipeline. This is the most basic profile and requires all of the software to be available at the command-line. The processes will be submitted in serial in the logged-in environment.
The software below is required to run the pipeline:
nextflow | cutadapt | Kent_tools |
---|---|---|
R | BWA | MACS2 |
Python | picard | HOMER |
Java | SAMtools | featureCounts |
FastQC | BamTools | Pysam |
FastQ Screen | BEDTools | MultiQC |
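A quick way to sanity-check that these tools are visible on your `PATH` before launching the pipeline is something along these lines (the executable names listed are the usual ones, but they may differ slightly on your system):

```bash
## Report any of the core executables that cannot be found on the PATH
for tool in nextflow R python java fastqc cutadapt bwa samtools bamtools bedtools macs2 featureCounts multiqc; do
    command -v "$tool" > /dev/null || echo "Missing from PATH: $tool"
done
```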
The following R libraries may need to be installed if unavailable. You can test this by loading the `R` module (if required), typing `R` at the command prompt, and attempting to load the packages below, e.g. `> library(optparse)` and so on. The pipeline assumes the correct R library path is set in order to find the installed packages. If not, you can set this in the `.Rprofile` file in the user home directory (see the sketch after the table) or add a line which extends the R `.libPaths()` in the executable R scripts in the `bin/` directory.
optparse | RColorBrewer |
---|---|
ggplot2 | lattice |
reshape2 | UpSetR |
scales | DESeq2 |
pheatmap | vsn |
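If you do need to point R at a non-default library location, a minimal sketch of the `.Rprofile` approach is shown below; the path is a placeholder for wherever your packages are actually installed.

```r
## In ~/.Rprofile: prepend your own library location (placeholder path) to the search path
.libPaths(c("/path/to/your/R/library", .libPaths()))
```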
Standard Linux tools including `cut`, `awk`, `sort`, `mv`, `touch`, `echo`, `mkdir`, `paste`, `cp`, `ln` and `grep` are also used throughout the pipeline.