kyleLesack/pacbio_read_order_shuffling

Overview

These scripts were used to evaluate the impact of FASTQ file read order on structural variation (SV) calling from long-read DNA sequencing data.
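The core operation here is changing the order of reads in a FASTQ file without altering the reads themselves. The sketch below shows one way to do that in Python. It is not the tool used by these scripts; the function names (`read_fastq`, `shuffle_fastq`) and the `seed` parameter are hypothetical. It assumes plain 4-line FASTQ records with no wrapped sequence lines. Each record's header, sequence, separator and quality lines move together, and only the record order is permuted.

```python
import io
import random
from typing import Iterator, List, TextIO, Tuple

# A FASTQ record: (header, sequence, separator, quality), without trailing newlines.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes unwrapped sequence and quality lines)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not qual:
            raise ValueError("truncated FASTQ record: " + header.strip())
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header.strip())
        yield (header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n"))


def shuffle_fastq(in_handle: TextIO, out_handle: TextIO, seed: int) -> int:
    """Write all records from in_handle to out_handle in a random order.

    Returns the number of records written.
    """
    records: List[Record] = list(read_fastq(in_handle))  # holds every read in memory
    rng = random.Random(seed)  # a fixed seed gives a reproducible permutation
    rng.shuffle(records)
    for rec in records:
        out_handle.write("\n".join(rec) + "\n")
    return len(records)


if __name__ == "__main__":
    demo = "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n@r3\nT\n+\nI\n"
    out = io.StringIO()
    n = shuffle_fastq(io.StringIO(demo), out, seed=42)
    print(n, "records written")
    print(out.getvalue(), end="")
```

Using a separate seed per replicate gives distinct but reproducible read orders from the same input. Because the whole file is loaded into memory, a disk-backed approach may be preferable for very large PacBio FASTQ files.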

Requirements

Note: The yaml directory includes conda environment (YAML) files. I recommend using conda to install the required software from these files. The required tools and the versions we tested are listed below:

Data Not Included Here

Several files were too large to include here and need to be downloaded or created.

  • Suffix array (.sa) files, generated with the blasr sawriter command, and NGMLR index (.ngm) files were required for pbmm2/pbsv
    • The Snakemake pipeline expects them at the following paths:
      • 0_reference_includes/reference/c_elegans.PRJNA13758.WS263.genomic.fa-enc.2.ngm
      • 0_reference_includes/reference/c_elegans.PRJNA13758.WS263.genomic.fa-ht-13-2.2.ngm
      • 0_reference_includes/reference/c_elegans.PRJNA13758.WS263.genomic.fa.sa
      • arabidopsis/0_reference_includes/reference/GCF_000001735.4_TAIR10.1_genomic.fa
      • arabidopsis/0_reference_includes/reference/GCF_000001735.4_TAIR10.1_genomic.fa-enc.2.ngm
      • arabidopsis/0_reference_includes/reference/GCF_000001735.4_TAIR10.1_genomic.fa-ht-13-2.2.ngm
  • C. elegans PacBio sequencing data
  • A. thaliana sequencing data are available from BioProject PRJNA779205
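Because the pipeline expects these files at fixed paths, it can help to check for them before launching Snakemake. The sketch below is a hypothetical helper, not part of the pipeline. It derives the expected index names from a reference FASTA path using the suffixes that appear in the file names above (`-enc.2.ngm`, `-ht-13-2.2.ngm`, `.sa`) and reports any that are missing. The list above includes no `.sa` file for A. thaliana, so the example passes a reduced suffix mapping for that reference. Paths are assumed to be relative to the directory Snakemake is run from.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Hypothetical helper: suffixes copied from the reference file names listed above.
DEFAULT_INDEX_SUFFIXES: Dict[str, str] = {
    "ngmlr_enc": "-enc.2.ngm",
    "ngmlr_ht": "-ht-13-2.2.ngm",
    "suffix_array": ".sa",
}


def expected_index_files(reference_fasta: str,
                         suffixes: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each index kind to the file name expected next to the reference FASTA."""
    if suffixes is None:
        suffixes = DEFAULT_INDEX_SUFFIXES
    return {kind: reference_fasta + suffix for kind, suffix in suffixes.items()}


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    ngmlr_only = {k: v for k, v in DEFAULT_INDEX_SUFFIXES.items() if k != "suffix_array"}
    checks = {
        # C. elegans: both NGMLR files plus the suffix array.
        "0_reference_includes/reference/c_elegans.PRJNA13758.WS263.genomic.fa": None,
        # A. thaliana: no .sa file is listed above.
        "arabidopsis/0_reference_includes/reference/GCF_000001735.4_TAIR10.1_genomic.fa": ngmlr_only,
    }
    for ref, suffixes in checks.items():
        for path in missing_files(expected_index_files(ref, suffixes).values()):
            print("missing:", path)
```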

Snakemake pipeline

The Snakefile contains the code to run the pipeline. A second Snakefile contains the instructions required to perform the subsampling steps. The rule resources are based on our high performance computing cluster and may need to be optimized for other systems.
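The subsampling rules themselves are defined in the second Snakefile and are not reproduced here. As an illustration only, the sketch below shows one common way to subsample reads: keep each record independently with a fixed probability, using a seeded random number generator. The function name `subsample_records` and its parameters are hypothetical, and this is not necessarily the method the pipeline uses.

```python
import random
from typing import Iterable, Iterator, Tuple

# A FASTQ record as (header, sequence, separator, quality) strings.
Record = Tuple[str, str, str, str]


def subsample_records(records: Iterable[Record], fraction: float, seed: int) -> Iterator[Record]:
    """Keep each record independently with probability `fraction`.

    Records are streamed, so the kept reads stay in their input order.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the same subsample can be regenerated
    for record in records:
        if rng.random() < fraction:
            yield record


if __name__ == "__main__":
    reads = [(f"@read{i}", "ACGT", "+", "IIII") for i in range(10)]
    kept = list(subsample_records(reads, fraction=0.5, seed=1))
    print(len(kept), "of", len(reads), "reads kept:", [r[0] for r in kept])
```

This approach preserves the relative read order of its input, which matters when read order is the variable under study. The number of reads kept is only approximately `fraction` times the input, and the fraction applies to reads rather than bases. Subsampling to a target sequencing depth would instead need to account for read lengths.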

The pipeline expects the input FASTQ files in the following locations:

C. elegans:
  • 1_fq_processing/N2/original/N2_original.fastq
  • 1_fq_processing/JU1400/original/JU1400_original.fastq
  • 1_fq_processing/NIC2/original/NIC2_original.fastq
  • 1_fq_processing/JU2526/original/JU2526_original.fastq
  • 1_fq_processing/XZ1516/original/XZ1516_original.fastq
  • 1_fq_processing/MY2693/original/MY2693_original.fastq
  • 1_fq_processing/QX1794/original/QX1794_original.fastq
  • 1_fq_processing/NIC526/original/NIC526_original.fastq
  • 1_fq_processing/DL238/original/DL238_original.fastq
  • 1_fq_processing/ECA396/original/ECA396_original.fastq
  • 1_fq_processing/JU2600/original/JU2600_original.fastq
  • 1_fq_processing/ECA36/original/ECA36_original.fastq
  • 1_fq_processing/EG4725/original/EG4725_original.fastq
  • 1_fq_processing/JU310/original/JU310_original.fastq
  • 1_fq_processing/MY2147/original/MY2147_original.fastq

A. thaliana:
  • ./arabidopsis/1_fq_processing/1254/original/1254_original.fastq
  • ./arabidopsis/1_fq_processing/6021/original/6021_original.fastq
  • ./arabidopsis/1_fq_processing/6024/original/6024_original.fastq
  • ./arabidopsis/1_fq_processing/9470/original/9470_original.fastq
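All input paths follow one pattern: `1_fq_processing/<sample>/original/<sample>_original.fastq`, with an `arabidopsis/` prefix for the A. thaliana accessions. The sketch below builds the full list from the sample IDs and reports missing files. The constants and function names are hypothetical, and it assumes the script is run from the repository root. Inside a Snakefile, the same list would typically be produced with Snakemake's `expand()` function.

```python
import os
from typing import List

# Sample IDs taken from the input paths listed above.
C_ELEGANS_STRAINS = ["N2", "JU1400", "NIC2", "JU2526", "XZ1516", "MY2693", "QX1794",
                     "NIC526", "DL238", "ECA396", "JU2600", "ECA36", "EG4725", "JU310", "MY2147"]
ARABIDOPSIS_ACCESSIONS = ["1254", "6021", "6024", "9470"]


def original_fastq(sample: str, base: str = "") -> str:
    """Build [<base>/]1_fq_processing/<sample>/original/<sample>_original.fastq."""
    prefix = f"{base}/" if base else ""
    return f"{prefix}1_fq_processing/{sample}/original/{sample}_original.fastq"


def expected_inputs() -> List[str]:
    """All input FASTQ paths, C. elegans strains first, then A. thaliana accessions."""
    paths = [original_fastq(s) for s in C_ELEGANS_STRAINS]
    paths += [original_fastq(a, base="arabidopsis") for a in ARABIDOPSIS_ACCESSIONS]
    return paths


if __name__ == "__main__":
    for path in expected_inputs():
        if not os.path.isfile(path):
            print("missing input:", path)
```

The `arabidopsis/...` paths produced here refer to the same files as the `./arabidopsis/...` paths above.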
