A pipeline is used to analyze SARS-CoV-2 sequencing data from PacBio equipment.
The pipeline can analyze SARS-CoV-2 sequencing data from PacBio equipment. The sample's sequencing quality, mapping/alignment with reference, coverage, SNPs, annotation, consensus sequence, lineage, phylogeny, etc can be analyzed.
Nextflow should be installed. The detail of installation can be found in https://github.com/nextflow-io/nextflow.
Python3 is needed.
Singularity/Apptainer is also needed. The detail of installation can be found in https://singularity-tutorial.github.io/01-installation/.
PacBio SMRTLINK stand-alone tools (lima, mimux, pbindex, pbmm2, VCFCons) are needed. About how to install them, please see the file "How_to_install_smrtlink_tools.txt".
- Add the reads file in bam format (*.bam) from PacBio equipment to the folder "hifireads"
- Add the file of M13 barcodes in fasta (*.fasta) format to the folder "M13barcodes".
- Add enrichment probes file in fasta format (*.fasta) to the folder "enrichment_probes".
- Add biosample sheet file in csv format (*.csv) to the folder "biosample_sheet". Only two columns are allowed in the sheet. One column is barcode name, the other column is sample name. A sample sheet can be found in the folder "biosample_sheet".
- open file "parames.yaml", set the full paths to the folders of input, probe, output, and reference.
- get into the top of the pipeline directory, then run
./daytonapb.sh
If SLURM is used, you can run
sbatch ./daytonapb.sh
All results can be found in the directory /output.