A nextflow pipeline for Dengue NGS data analysis.
The pipeline analyzes NGS data from Dengue virus samples. Its outputs include each sample's serotype, sequencing quality, mapping/alignment against the reference, coverage, SNPs, and annotation.
Nextflow must be installed. Installation details are available at https://github.com/nextflow-io/nextflow.
Python3 is needed.
Singularity/Apptainer is needed if you choose Singularity to run the pipeline's containers. Installation details are available at https://singularity-tutorial.github.io/01-installation/.
Docker (https://www.docker.com/) is needed if you choose Docker to run the pipeline's containers.
SLURM is needed if you plan to use it to run the pipeline on an HPC cluster.
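The prerequisites above can be checked before a run. The following is a minimal sketch, not part of the pipeline: the function name `missing_tools` is hypothetical, and the executable names in the usage comment are assumed to be the standard ones for each tool.

```shell
# Print every tool from the argument list that is not found on PATH.
missing_tools() (
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
)

# Usage: list the base tools plus the container runtime you selected.
# missing_tools nextflow python3 singularity
# missing_tools nextflow python3 docker
```

An empty result means every listed tool was found.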
If the reference genomes do not have index files, generate them with one of the commands below before running the pipeline, and keep the index files in the same directory as the genome FASTA file.
bwa index <full path to your genome fasta file>
Or
singularity exec docker://staphb/bwa:0.7.17 bwa index <full path to your genome fasta file>
Or
docker run staphb/bwa:0.7.17 bwa index <full path to your genome fasta file>
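To decide whether indexing is needed at all, you can test for the five files that `bwa index` writes next to the FASTA file (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). This is a minimal sketch; the function name `has_bwa_index` and the path in the usage comment are hypothetical.

```shell
# Succeed only if all five BWA index files exist beside the FASTA file.
has_bwa_index() (
    fasta=$1
    for ext in amb ann bwt pac sa; do
        [ -e "$fasta.$ext" ] || exit 1
    done
)

# Usage: index only when something is missing.
# ref=/path/to/genome.fasta        # hypothetical path
# has_bwa_index "$ref" || bwa index "$ref"
```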
By default, the pipeline uses Singularity to run containers and is submitted through SLURM. To run it with these defaults, follow the steps below.
- Put your data files into the directory /fastqs. File names should look like "JBS22002292_1.fastq.gz" and "JBS22002292_2.fastq.gz". You may use the bash script "renamefile.sh" to rename your data files.
- Open the file "params.yaml" and set the parameters.
- Go to the top of the pipeline directory, then run
sbatch ./daytona_dengue.sh
Note: the sbatch parameters are set for our cluster, HiPerGator, at the University of Florida. You may need to change them to match your cluster's configuration.
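Because the pipeline expects paired files named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`, it can help to check for missing mates before submitting. The sketch below is an illustration only: `unpaired_samples` is a hypothetical helper, and the directory in the usage comment assumes the fastqs directory sits at the top of the pipeline directory.

```shell
# Print the sample ID of every file whose _1/_2 mate is missing.
unpaired_samples() (
    dir=$1
    for f in "$dir"/*_1.fastq.gz "$dir"/*_2.fastq.gz; do
        [ -e "$f" ] || continue              # skip glob patterns that matched nothing
        base=${f##*/}
        case $base in
            *_1.fastq.gz) mate="$dir/${base%_1.fastq.gz}_2.fastq.gz" ;;
            *)            mate="$dir/${base%_2.fastq.gz}_1.fastq.gz" ;;
        esac
        [ -e "$mate" ] || echo "${base%_[12].fastq.gz}"
    done
)

# Usage (assumed location of the data directory):
# unpaired_samples ./fastqs
```

No output means every sample has both read files.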
To run the pipeline with Singularity without submitting it through SLURM, follow the steps below.
- Put your data files into the directory /fastqs. File names should look like "JBS22002292_1.fastq.gz" and "JBS22002292_2.fastq.gz". You may use the bash script "renamefile.sh" to rename your data files.
- Open the file "params.yaml" and set the parameters.
- Go to the top of the pipeline directory, then run
bash ./kraken2_viral.sh
nextflow run daytona_dengue.nf -params-file params.yaml -c ./configs/singularity.config
bash ./report_output.sh
To run the pipeline with Docker, follow the steps below.
- Put your data files into the directory /fastqs. File names should look like "JBS22002292_1.fastq.gz" and "JBS22002292_2.fastq.gz". You may use the bash script "renamefile.sh" to rename your data files.
- Open the file "params.yaml" and set the parameters.
- Go to the top of the pipeline directory, then run
bash ./kraken2_viral.sh
nextflow run daytona_dengue.nf -params-file params.yaml -c ./configs/docker.config
bash ./report_output.sh
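When the three commands are run by hand, a failure in an early step is easy to miss. One possible way to chain them so that execution stops at the first failure is sketched below; `run_steps` is a hypothetical helper, not a script shipped with the pipeline.

```shell
# Run each command line in order and stop at the first one that fails.
run_steps() (
    for step in "$@"; do
        echo "running: $step"
        sh -c "$step" || { echo "failed: $step" >&2; exit 1; }
    done
)

# Usage (Docker variant; use ./configs/singularity.config for Singularity):
# run_steps "bash ./kraken2_viral.sh" \
#           "nextflow run daytona_dengue.nf -params-file params.yaml -c ./configs/docker.config" \
#           "bash ./report_output.sh"
```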
All results can be found in the directory /output. A summary report "final_report.txt" can be found in the directory /output/report.
- In the output file Serotypes.txt, a sample is reported as unserotyped if its confidence rate is below 50%. The sample's exact taxon can be found in /output/kraken_out_broad.
- renamefile.sh can be used to rename your data files.