No ORFs for 38kb phage with 33 Prodigal genes #8

snayfach · 2019-05-17T14:27:59Z

I've now run phanotate on a large number of phage sequences. Its seems to be producing output in most cases. However for a small subset of sequences, no ORFs are found. These sequences are often >20Kb and contain numerous genes called by Prodigal.

Here is one example you can download and test yourself:
https://www.dropbox.com/s/xxpagf1qvcj6f57/test.fna?dl=0

In this example, the sequence in a 38kb phage with 33 Prodigal genes

Any ideas what might be going on here?

deprekate · 2019-05-20T21:07:15Z

Hm, I am trying to track this down, it appears to be something wrong in fastpathz, which returns 'no path to target'. I have tracked it down to potentially the extremely long ORFs that have huge negative weights in scientific notation (ie -1.2E+40).

snayfach · 2019-05-23T16:22:41Z

Any additional progress on this?

deprekate · 2019-05-23T20:30:11Z

Yep, I tracked down the cause of the problem. It is an error with the fastpathz Bellman-Ford algorithm not correctly relaxing the edges. I will have a fix as soon as possible, at least by early next week, if not over the weekend.

deprekate · 2019-05-31T21:32:45Z

I wanted to update you on this bug. I tracked the original bug down and fixed it. It was due to GMP not having a notion of INFINITY, so I create an 'INFINITY' by setting an arbitrarily large number, which it turned out was not large enough. I made it even larger (1E1000) and it fixed the bug.

However to test whether this fix would affect previous runs, I ran the new code on ~4,000 phage genomes. All but one were 100% identical, with the one different run choosing a different start codon. Depending on random chance, the Bellman-Ford will return two different start codons in different runs. This is wrong, since the Bellman-Ford should always return the same path, if the paths are not equal.

I have been trying to track it down, I believe it may either be something to do with the GMP math library, or more likely a collision in the UTHASH dictionary implementation. I will continue to work on it in the coming weeks.

Cheers

deprekate · 2019-06-03T23:26:01Z

I tracked down the bug. It turns out it was my fault, I did not correctly null terminate a character array in fastpathz. I have fixed the bug, and pushed updates out.

snayfach · 2019-06-10T14:24:22Z

I'm still finding cases where PHANNOTATE predicts no genes. Here's one example of a 14.8 Kb contig with 24 Prodigal genes:
https://www.dropbox.com/s/r5792v92kqqhlcu/test3.fna

deprekate · 2019-06-10T20:11:55Z

Hm, I can't seem to recreate the error, I have tried on OSX and Linux (CentOS), with pypy/python2.7/python3, and with/without tRNAscanSE. I also ran it 10,000 times in case it was a random occurence type bug.

What platform are you on?

It might be that the updated fastpathz was not pulled? I know github does not handle included submodules as seamless as they should (the reason for the --recursive flag in the git clone). You can see if manually pulling the fastpast submodule fixes the error:

git submodule update --recursive

snayfach · 2019-06-10T20:23:07Z

You're right, it was my mistake this time. I failed to re-install the software properly

snayfach · 2019-06-20T17:32:22Z

Can you check one more sequence for me:
https://www.dropbox.com/s/a91h60wyqb2yzc8/test.fna

I'm running 1.2.1 but still getting the same error. I'm not sure if this is an error on my end (failed to install latest code) or not.

deprekate · 2019-06-20T20:30:29Z

It looks like this one does error with the "no path to target" from fastpathz. I'll get a track it down and get a fix pushed out.

deprekate · 2019-07-09T02:34:39Z

Hi, sorry for the delay. I have tracked down the problem, and it is because there is a region in the sequence that is "essentially" non-coding and greater than the 300bp connection cutoff. The reason I say "essentially" is that the region is actually spanned by a spurious ORF (in the +3 frame), however this spurious ORF does not get added to the path (in red) because it overlaps by more than 300bp with the next ORF that would be added to the path (the right red ORF).

I could tell right away that the middle red ORF should actually extend to the right, I figured whatever this genome was it used a non-traditional start codon somewhere around 20100. In actuality I realized that this genome has TAG stop codon read through. and that the three red ORFs are most likely one single giant ORF.

I am trying to figure out the best way for PHANOTATE to handle such a situation.

ilyavs · 2022-11-12T22:07:54Z

Hi,
@deprekate any news on this? I encountered a similar error from fastpathz.
Any way to skip the problematic ORF and still get the output for the rest of the genes?
Thanks,
Ilya.

deprekate · 2022-11-15T01:27:52Z

@ilyavs,

I have not worked on this bug, since it is caused by using the wrong translation table (ie stop codon readthrough). The develop branch of phanotate allows for specifying which stop codons (and start codons) to use.

I also just updated the develop branch to allow for up to 500bp overlaps, which might help force through path prediction.

ilyavs · 2022-12-16T19:55:12Z

Hi, thank you for your response. I just ran the develop branch on the same fasta file without changing any input arguments (so no changes in the translation table) and it worked without errors.
Would you recommend using the develop branch for routine analysis? Is it considered stable?
Thanks,
Ilya.

deprekate · 2022-12-16T21:40:04Z

The develop branch has many performance increases since I moved some of the slower python code into C code (because people were using phanotate to annotate bacteria, and it was taking longer). But to do this I had to simplify the weight calculations, which makes the output not 100% identical to the version 1.

I think there was one instability bug that I couldn't quite track down #26, but it was mostly caused by merging multiple phage contigs into one big file.

When I was testing the accuracy of the 2.0 develop branch to the 1.0 version (on 4k phages), it didn't perform significantly better or worse, just those different 0.1% ORF calls of mostly lower scoring "questionable" ORFS, so I never pushed the 2.0 version to the main branch. I would consider it mostly stable, more of a nightly build than a production build.

There are also a few more options, like being able to specify the start and stop codons.

ilyavs · 2022-12-16T21:43:29Z

Thank you. Do you have an estimate, when or if, the develop branch will make it into a release?

deprekate · 2022-12-16T22:28:56Z

Probably not for another 6 months. I have been really busy on two other computational biology software projects: PRFect and Genotate.

PRFect is my new tool that predicts Programmed Ribosomal Frameshifts in the coding genes of a genome, its in review now

Genotate isn't public just yet, is another gene finder, except it runs on all domains of life. It is totally different from Phanotate and has some really amazing features as it doesn't use start or stop codons the way all other gene finders (GeneMark, Glimmer, Prodigal, Phanotate) rely upon. Though, because Genotate is supervised learning, Phanotate will still be useful for identifying novel and unknown genes in diverse genomes.

deprekate closed this as completed Jun 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No ORFs for 38kb phage with 33 Prodigal genes #8

No ORFs for 38kb phage with 33 Prodigal genes #8

snayfach commented May 17, 2019

deprekate commented May 20, 2019

snayfach commented May 23, 2019

deprekate commented May 23, 2019

deprekate commented May 31, 2019

deprekate commented Jun 3, 2019

snayfach commented Jun 10, 2019

deprekate commented Jun 10, 2019

snayfach commented Jun 10, 2019

snayfach commented Jun 20, 2019

deprekate commented Jun 20, 2019

deprekate commented Jul 9, 2019

ilyavs commented Nov 12, 2022 •

edited

Loading

deprekate commented Nov 15, 2022

ilyavs commented Dec 16, 2022

deprekate commented Dec 16, 2022

ilyavs commented Dec 16, 2022

deprekate commented Dec 16, 2022

No ORFs for 38kb phage with 33 Prodigal genes #8

No ORFs for 38kb phage with 33 Prodigal genes #8

Comments

snayfach commented May 17, 2019

deprekate commented May 20, 2019

snayfach commented May 23, 2019

deprekate commented May 23, 2019

deprekate commented May 31, 2019

deprekate commented Jun 3, 2019

snayfach commented Jun 10, 2019

deprekate commented Jun 10, 2019

snayfach commented Jun 10, 2019

snayfach commented Jun 20, 2019

deprekate commented Jun 20, 2019

deprekate commented Jul 9, 2019

ilyavs commented Nov 12, 2022 • edited Loading

deprekate commented Nov 15, 2022

ilyavs commented Dec 16, 2022

deprekate commented Dec 16, 2022

ilyavs commented Dec 16, 2022

deprekate commented Dec 16, 2022

ilyavs commented Nov 12, 2022 •

edited

Loading