JASMINE: Jointly Accurate Sv Merging with Intersample Network Edges
Version 1.1.5
This tool is used to merge structural variants (SVs) across samples. Each sample has a number of SV calls, consisting of position information (chromosome, start, end, length), type and strand information, and a number of other values. Jasmine represents the set of all SVs across samples as a network, and uses a modified minimum spanning forest algorithm to determine the best way of merging the variants such that each merged variant represents a set of analogous variants occurring in different samples.
The recommended installation method is through bioconda.
Conda installation commands (installation typically takes under a minute):
conda config --add channels bioconda
conda config --add channels conda-forge
conda install jasminesv
When running Jasmine, one of the preprocessing options is to run Iris, a tool which refines the sequences and breakpoints of insertions in datasets with high-error reads. Iris depends on samtools, minimap2, and racon by default, which can be installed separately and either added to your path or pointed to with the iris_args parameter. Once these dependencies are installed (or if running Jasmine without Iris preprocessing), Jasmine can be built with the following command:
path_to_jasmine_repo/build_jar.sh
After building the jar file, Jasmine can be run with the executable file jasmine, which will be in the main folder of this repository if building from source, or in the condabin folder if installed through conda. Running it with no parameters will print a usage menu describing the required and optional arguments.
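Before a first run, it can help to confirm which of the executables mentioned above are reachable from the current shell. The following is a minimal sketch, not part of Jasmine: have_tool is a hypothetical helper name, and the check relies only on the standard `command -v` shell builtin.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jasmine): succeed if the named
# executable can be found on the current PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# jasmine is the main executable; samtools, minimap2, and racon are only
# needed when Iris preprocessing is used.
for tool in jasmine samtools minimap2 racon; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

A "not found" line for samtools, minimap2, or racon is not necessarily a problem, since those tools can instead be pointed to with the iris_args parameter, and they are not needed at all when Iris preprocessing is skipped.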
To run Jasmine on HiFi data from the HG002 trio, run the following commands (typically takes about a minute to download and under five minutes to run on a modern desktop):
wget http://data.schatz-lab.org/jasmine/HG002Trio/UnmergedVCFs/HG002vGRCh38_wm_50md_PBCCS_sniffles.s2l20.refined.nSVtypes.ism.vcf.gz
wget http://data.schatz-lab.org/jasmine/HG002Trio/UnmergedVCFs/HG003vGRCh38_wm_50md_PBCCS_sniffles.s2l20.refined.nSVtypes.ism.vcf.gz
wget http://data.schatz-lab.org/jasmine/HG002Trio/UnmergedVCFs/HG004vGRCh38_wm_50md_PBCCS_sniffles.s2l20.refined.nSVtypes.ism.vcf.gz
wget http://data.schatz-lab.org/jasmine/HG002Trio/HG002Trio_HiFi.merged.vcf.gz
gunzip *.vcf.gz
ls *vGRCh38_wm_50md_PBCCS_sniffles.s2l20.refined.nSVtypes.ism.vcf > filelist.txt
jasmine file_list=filelist.txt out_file=merged.vcf
jasmine --dup_to_ins --postprocess_only out_file=merged.vcf
The contents of merged.vcf should then exactly match the contents of HG002Trio_HiFi.merged.vcf.
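One way to check the match is a byte-for-byte comparison of the two files. The following is a minimal sketch using the standard cmp and diff utilities; compare_vcfs is a hypothetical helper name, not something shipped with Jasmine, and the file names are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jasmine): report whether two files are
# byte-identical. Returns 0 if identical, 1 if different, 2 if a file is missing.
compare_vcfs() {
    produced=$1
    expected=$2
    for f in "$produced" "$expected"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 2
        fi
    done
    if cmp -s "$produced" "$expected"; then
        echo "identical"
        return 0
    fi
    echo "different"
    # Show the first few differing lines to help locate the mismatch.
    diff "$produced" "$expected" | head -n 10
    return 1
}

# Run the check only when both files from the example above are present.
if [ -f merged.vcf ] && [ -f HG002Trio_HiFi.merged.vcf ]; then
    compare_vcfs merged.vcf HG002Trio_HiFi.merged.vcf || true
fi
```

If the files differ, the diff excerpt indicates whether the mismatch is in the header or in the variant records.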
Jasmine is offered as standalone software and will accurately merge SV calls from any SV callers, including short-read callers. However, if calling SVs from genomic long reads (PacBio CLR, PacBio HiFi, or Oxford Nanopore), for best results, we recommend using the following optimized pipeline to obtain population-scale SV calls from FASTQ files. It is provided as a Snakemake pipeline.
Jasmine also includes a module for automating the creation of IGV screenshots of variants of interest. It can be run through the igv_jasmine executable file. Running it with no parameters will print a usage menu describing the required and optional arguments, and it requires at minimum the following:
- BAM files from which variants were called in each sample
- The reference genome
- The merged VCF file, or a BED file with regions of interest
Running this module creates a folder which will store IGV screenshots for each variant (optionally filtered based on the command line parameters), and populates that folder with a .bat file, a script which can be run through IGV by selecting Tools -> Run Batch Script and navigating to the file. After running this script, the folder containing the .bat file will also include images of the regions surrounding each variant of interest.
The user manual with detailed information about input/output files and command line arguments can be found here: https://github.com/mkirsche/Jasmine/wiki/Jasmine-User-Manual