Releases: pmelsted/pizzly
Bug fixes and license switch
We recommend that users switch to this release, which includes the following bug fixes:
- Wrong strand reported when a fusion is supported only by read pairs and not by split reads
- Bug with negative coordinates fixed
- Bug fixed when reference sequences had lower case characters
- Correct order of `geneA` and `geneB` in JSON output
Additionally, there is now a `scripts` folder with useful Python scripts:
- `get_fragment_length.py` examines an `abundance.h5` file produced by kallisto and finds the 95th percentile of the fragment length distribution
- `flatten_json.py` reads the `.json` output and converts it to a simple gene table
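The two scripts are not reproduced here; the following is a minimal sketch of what each one does, under stated assumptions. It assumes that kallisto stores the fragment length histogram in `abundance.h5` under the `aux/fld` dataset (index = fragment length, value = count) and that the pizzly JSON has a `genes` list whose entries carry `geneA`/`geneB` objects with `name` and `id`, plus `paircount` and `splitcount`. All function names are hypothetical.

```python
import csv
import io
import json


def fragment_length_percentile(fld, q=0.95):
    """Smallest fragment length whose cumulative count reaches fraction q
    of all fragments. fld[i] is the number of fragments of length i."""
    total = sum(fld)
    if total <= 0:
        raise ValueError("empty fragment length distribution")
    threshold = q * total
    cumulative = 0
    for length, count in enumerate(fld):
        cumulative += count
        if cumulative >= threshold:
            return length
    return len(fld) - 1


def load_fld(path):
    """Read the histogram from a kallisto abundance.h5.
    ASSUMPTION: it lives in the 'aux/fld' dataset. Requires h5py."""
    import h5py  # imported lazily so the rest of this sketch runs without it
    with h5py.File(path, "r") as f:
        return list(f["aux/fld"][:])


def flatten_fusions(json_text):
    """Convert pizzly-style JSON (ASSUMED layout, see lead-in) into a
    tab-separated gene table with one row per fused gene pair."""
    doc = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["geneA.name", "geneA.id", "geneB.name", "geneB.id",
                     "paircount", "splitcount"])
    for g in doc["genes"]:
        writer.writerow([g["geneA"]["name"], g["geneA"]["id"],
                         g["geneB"]["name"], g["geneB"]["id"],
                         g["paircount"], g["splitcount"]])
    return out.getvalue()
```

For example, `fragment_length_percentile(load_fld("abundance.h5"))` would return the 95th percentile length for one kallisto run; the percentile is computed on the discrete histogram, so it is always a whole fragment length that actually occurs in the data.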
License was switched from GPL to BSD-2
Better GTF support
GTF
This version includes bug fixes that improve GTF parsing. pizzly now supports the Ensembl and Gencode annotations and has been tested with their latest versions.
Note that for Gencode, the FASTA files must be modified to match the GTF files: the Gencode FASTA uses pipes (`|`) rather than spaces as separators in the sequence names. This can be fixed by running

```
zcat gencode.v26.transcripts.fa.gz | tr '|' ' ' | gzip -1 > gencode.v26.transcripts.fixed.fa.gz
```
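Why this works, as a small sketch: the sequence name is taken to be the first whitespace-delimited token of a FASTA header, so replacing pipes with spaces leaves just the transcript ID, which can then match the `transcript_id` in the GTF. The header below is illustrative, not copied from a real Gencode file, and the helper names are hypothetical. Unlike `tr`, the sketch only touches header lines, which is equivalent here because sequence lines contain no pipes.

```python
import gzip


def fix_gencode_header(line):
    """Replace '|' with ' ' in FASTA header lines; leave sequence lines alone."""
    if line.startswith(">"):
        return line.replace("|", " ")
    return line


def transcript_id(header):
    """The sequence name: first whitespace-delimited token after '>'."""
    return header[1:].split()[0]


def fix_fasta(src_path, dst_path):
    """Python equivalent of the zcat | tr | gzip -1 pipeline."""
    with gzip.open(src_path, "rt") as src, \
            gzip.open(dst_path, "wt", compresslevel=1) as dst:
        for line in src:
            dst.write(fix_gencode_header(line))
```

With a header such as `>ENST00000456328.2|ENSG00000223972.5|DDX11L1|`, the name before the fix is the entire pipe-joined string; after the fix it is `ENST00000456328.2`.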
Protein coding annotation
pizzly limits the fusion reports to transcripts that have been annotated as protein coding. If this information is not present in the annotation, the `--ignore-protein` option ignores this requirement. Running pizzly in this way will most likely increase the number of false positives reported.
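A minimal sketch of such a protein-coding filter, assuming it is driven by the `transcript` records of the GTF. Gencode records the biotype in the `transcript_type` attribute and Ensembl in `transcript_biotype`; the sketch accepts either. This is not pizzly's own parser, and the function names are hypothetical; `ignore_protein=True` simply mimics the behavior described for `--ignore-protein`.

```python
import re

# Matches one GTF attribute of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Parse the 9th GTF column into a dict (later duplicate keys win)."""
    return dict(ATTR_RE.findall(field))


def protein_coding_transcripts(gtf_lines, ignore_protein=False):
    """Return IDs of transcripts that would be eligible for fusion reports."""
    keep = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript records carry what we need here
        attrs = parse_attributes(cols[8])
        biotype = attrs.get("transcript_type", attrs.get("transcript_biotype"))
        if ignore_protein or biotype == "protein_coding":
            keep.add(attrs["transcript_id"])
    return keep
```

Dropping the biotype check keeps non-coding transcripts as fusion partners, which is one way to see why disabling it tends to add false positives.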
Warnings
pizzly now warns when sequences in the FASTA file have no corresponding annotation, and exits if none of the sequences are annotated. pizzly also warns if no transcripts are annotated as protein coding.
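The checks described above can be sketched as follows. The function name and message texts are hypothetical; only the three conditions come from the release notes.

```python
import sys


def check_annotation(fasta_ids, annotated_ids, protein_coding_ids):
    """Return warning messages, or exit if no FASTA sequence is annotated."""
    warnings = []
    missing = [t for t in fasta_ids if t not in annotated_ids]
    if missing:
        warnings.append(
            f"{len(missing)} of {len(fasta_ids)} FASTA sequences have no annotation")
    if len(missing) == len(fasta_ids):
        # Nothing usable: stop instead of producing an empty report.
        sys.exit("no FASTA sequence has an annotation")
    if not protein_coding_ids:
        warnings.append("no transcripts are annotated as protein coding")
    return warnings
```

Exiting early, rather than only warning, covers the usual cause of this situation: a FASTA and a GTF whose sequence names do not line up, as with unmodified Gencode files.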
Better filtering
Filtering
pizzly now outputs both filtered and unfiltered fusion calls.
pizzly filters on:
- number of supporting reads
- distance of fusion breakpoint to exon boundaries
- for fusions with unknown breakpoints, only read pairs whose implied fragment length does not exceed the maximum fragment length are included
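A minimal sketch of how the three filters above could be combined for one fusion candidate. The `Candidate` fields, the function name, and all threshold values are hypothetical placeholders, not pizzly's data structures or defaults; "unknown breakpoint" is modeled as a candidate with no exon-boundary distance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    gene_a: str
    gene_b: str
    split_reads: int = 0
    # Implied fragment length of each supporting read pair.
    pair_fragment_lengths: List[int] = field(default_factory=list)
    # Distance (bp) from breakpoint to nearest exon boundary; None if unknown.
    exon_distance: Optional[int] = None


def passes_filter(c, min_reads=2, max_exon_distance=10, max_fragment_length=400):
    """Return (passed, support). Thresholds are hypothetical placeholders."""
    pairs = c.pair_fragment_lengths
    if c.exon_distance is None:
        # Unknown breakpoint: keep only pairs within the maximum fragment length.
        pairs = [n for n in pairs if n <= max_fragment_length]
    elif c.exon_distance > max_exon_distance:
        # Known breakpoint too far from an exon boundary.
        return False, c.split_reads + len(pairs)
    support = c.split_reads + len(pairs)
    return support >= min_reads, support
```

In this sketch every candidate would go to the unfiltered output, and only those with `passed == True` to the filtered output.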
Example
An example pipeline is provided, based on data from Tembe et al., "Open-access synthetic spike-in mRNA-seq data for cancer gene fusions".
The example pipeline is implemented in snakemake.
First release
First release of pizzly.