This Nextflow pipeline analyzes bacterial genome data from PacBio sequencing. Its functions are similar to those of the Sanibel pipeline: identifying the clonal complex and serotype of Neisseria and H. influenzae, detecting antimicrobial resistance (AMR), identifying the species from sequencing data, finding plasmids, and more.
Nextflow is required. Installation details are at https://github.com/nextflow-io/nextflow. HiPerGator users do not need to install it.
Singularity/Apptainer is required. Installation details are at https://singularity-tutorial.github.io/01-installation/. HiPerGator users do not need to install it.
SLURM is required. HiPerGator users do not need to install it.
Python 3 is required, together with the package "pandas". If pandas is not already included in your Python 3, install it with pip3 install pandas
The PacBio SMRT Link stand-alone tools are required. For installation instructions, see the file "How_to_install_smrtlink_tools.txt" in the pipeline.
A conda environment with Python 3.10 and pandas can be created and activated with:
conda create -n SANIBELPB -c conda-forge python=3.10 pandas
conda activate SANIBELPB
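Before submitting anything, the requirements above can be checked from a login node with a short shell sketch. The helper names have_cmd and check_prereqs are hypothetical and not part of the pipeline. The SMRT Link tools are deliberately not checked, because their executable names depend on how they were installed.

```shell
# Hypothetical prerequisite check (not shipped with the pipeline).
# The SMRT Link tools are not checked: their names depend on the installation.

# Return 0 if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one found/missing line per prerequisite; return the number missing.
check_prereqs() {
    local missing=0 cmd
    for cmd in nextflow sbatch python3; do
        if have_cmd "$cmd"; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=$((missing + 1))
        fi
    done
    # Either Singularity or Apptainer is sufficient.
    if have_cmd singularity || have_cmd apptainer; then
        echo "found:   singularity/apptainer"
    else
        echo "missing: singularity/apptainer"
        missing=$((missing + 1))
    fi
    # pandas must be importable by python3.
    if have_cmd python3 && python3 -c 'import pandas' >/dev/null 2>&1; then
        echo "found:   pandas"
    else
        echo "missing: pandas"
        missing=$((missing + 1))
    fi
    return "$missing"
}

check_prereqs || echo "Some prerequisites are missing; see the list above." >&2
```

On HiPerGator, only the pandas and SMRT Link lines should need attention, since Nextflow, Singularity/Apptainer and SLURM are already provided there.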
- Rename your data files so that they look like "bc2024bc2024.bam.pbi" and "bc2024bc2024.bam". You can use the script "rename.sh" in the pipeline to rename them.
- Put the renamed data files (*.bam and *.bam.pbi) into the directory /pbbams.
- Open the file "params.yaml" and set its two parameters to absolute paths. They should be ".../.../pbbams" and ".../.../output".
- Go to the top directory of the pipeline and run:
sbatch ./sanibel_pb.sh
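The preparation steps above can be sanity-checked before the job is queued. This is a sketch with hypothetical helper names (check_bam_pairs, print_param_paths); it assumes it is run from the top directory of the pipeline and that the pbbams and output directories sit in that directory. It only checks that every BAM has its .pbi index, not that the file names follow the expected pattern.

```shell
# Hypothetical pre-submission check (not shipped with the pipeline).
# Assumption: run from the top directory; pbbams/ and output/ live there.

# Return 0 only if DIR holds at least one *.bam and every *.bam has a *.bam.pbi.
check_bam_pairs() {
    local dir=$1 status=0 found=0 bam
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue   # the glob matched nothing
        found=1
        if [ ! -e "$bam.pbi" ]; then
            echo "missing index: $bam.pbi" >&2
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no *.bam files in $dir" >&2
        status=1
    fi
    return "$status"
}

# Print the absolute paths to enter in params.yaml
# (assuming both directories are in the current directory).
print_param_paths() {
    echo "$(pwd)/pbbams"
    echo "$(pwd)/output"
}
```

For example, running check_bam_pairs pbbams && print_param_paths reports any unindexed BAM file, and otherwise prints the two absolute paths to copy into params.yaml.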
By default, the pipeline runs its containers with Singularity and is submitted as a SLURM job. To run the containers with Docker instead, use:
sbatch ./sanibel_pb_docker.sh
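On a machine where it is unclear which container runtime is installed, a small helper can choose between the two wrapper scripts. This is a sketch: pick_wrapper is a hypothetical name, and preferring Singularity/Apptainer over Docker is an assumption that simply mirrors the default described above.

```shell
# Hypothetical helper: print the wrapper script matching the available runtime.
# Assumption: Singularity/Apptainer (the default) is preferred over Docker.
pick_wrapper() {
    if command -v singularity >/dev/null 2>&1 || command -v apptainer >/dev/null 2>&1; then
        echo "./sanibel_pb.sh"
    elif command -v docker >/dev/null 2>&1; then
        echo "./sanibel_pb_docker.sh"
    else
        echo "no container runtime (singularity, apptainer, docker) found" >&2
        return 1
    fi
}
```

Usage: wrapper=$(pick_wrapper) && sbatch "$wrapper"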