Sintax taxonomy classifier #210
Comments
Thanks for the suggestion. We might add some kind of taxonomic classifier to VSEARCH in the future, but there are no firm plans at the moment. |
Agree that this would be very useful! We have recently developed a SINTAX-formatted version of the SilvaMod database, which is based on Silva and is part of CREST (https://github.com/lanzen/CREST/tree/master/LCAClassifier). Unfortunately, it is too large to be used without a 64-bit license of usearch. |
I'll second this! Think of the opportunities now that Nanopore sequencing is booming. |
@torognes, any updates in that regard? It would be great to have a hierarchical classification procedure similar to utax/sintax. |
I still agree that this would be very useful to include and one of the top features to prioritise, but I do not know when I can find time to implement it. |
We'll be waiting or hopefully someone can contribute useful code meanwhile! :) Thanks @torognes |
Due to popular demand, I have implemented the sintax command. It implements the SINTAX algorithm as described in Robert Edgar's preprint: Robert Edgar (2016). Further details: https://www.drive5.com/usearch/manual/cmd_sintax.html Multithreading is supported. Databases in UDB files are supported. The strand option may be specified. This is a new feature that has only been tested very briefly. Feedback is therefore highly welcome! |
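For readers unfamiliar with the approach, here is a toy sketch of a k-mer bootstrap classifier in the spirit of SINTAX. It is not the vsearch or usearch code. The constants (word length 8, 32 sampled words, 100 iterations), the per-rank confidence calculation, the tie-breaking rule and the name `classify_sketch` are all assumptions made for illustration; the preprint and the manual page linked above remain the authoritative description.

```python
import random
from collections import Counter

K = 8             # word length (assumed value)
SAMPLE = 32       # words drawn per bootstrap iteration (assumed value)
ITERATIONS = 100  # number of bootstrap iterations (assumed value)


def kmers(seq, k=K):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_sketch(query, references, rng=None):
    """Toy bootstrap classifier (hypothetical helper, not a vsearch API).

    references: list of (taxonomy_tuple, sequence) pairs,
                e.g. (("Bacillus", "cereus"), "ACGT...").
    Returns a list of (name, confidence) pairs, one per rank.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    query_words = sorted(kmers(query))
    if not query_words:
        raise ValueError("query is shorter than the word length")
    ref_words = [(tax, kmers(seq)) for tax, seq in references]

    top_hits = []
    for _ in range(ITERATIONS):
        # Draw words from the query with replacement.
        sample = [rng.choice(query_words) for _ in range(SAMPLE)]
        # Top hit: the reference sharing the most sampled words.
        # Ties go to the first reference in the list (an arbitrary choice).
        best = max(ref_words, key=lambda item: sum(w in item[1] for w in sample))
        top_hits.append(best[0])

    # Prediction: the taxonomy that was the top hit most often.
    predicted = Counter(top_hits).most_common(1)[0][0]
    result = []
    for rank, name in enumerate(predicted):
        # Confidence: share of iterations agreeing with the prediction down to this rank.
        support = sum(hit[:rank + 1] == predicted[:rank + 1] for hit in top_hits)
        result.append((name, support / ITERATIONS))
    return result
```

Because confidence is computed per rank, a query can get high support at genus level and low support at species level when several closely related references compete for the top hit.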
Great news, thanks! I will test it soon... |
There are some issues with the (original) sintax command that prohibit its use for me (and potentially others): - self-testing is very inflexible (one can only self-test the whole database at once against the whole database using LOOCV, rather than running LOOCV on selected sequences only). If you are re-developing the sintax algorithm, maybe some of these issues could be resolved very easily. |
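To make the request above concrete, here is a minimal sketch of leave-one-out cross-validation (LOOCV) restricted to selected sequences. The function `loocv_selected`, the data layout and the stand-in classifier `shared_kmer_classifier` are hypothetical and only illustrate the idea; none of them corresponds to an existing usearch or vsearch option.

```python
def loocv_selected(references, selected, classify):
    """Leave-one-out test restricted to the labels in `selected`.

    references: dict mapping label -> (taxonomy, sequence)
    classify:   callable(query_sequence, remaining_references) -> predicted taxonomy
    Returns a dict mapping label -> True if the held-out taxonomy was recovered.
    """
    outcome = {}
    for label in selected:
        true_tax, seq = references[label]
        # Hold out only the sequence under test; keep the rest of the database.
        remaining = {k: v for k, v in references.items() if k != label}
        outcome[label] = classify(seq, remaining) == true_tax
    return outcome


def shared_kmer_classifier(query, references, k=4):
    """Stand-in classifier: taxonomy of the reference sharing the most k-mers."""
    def words(seq):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    query_words = words(query)
    best = max(references.values(), key=lambda ref: len(query_words & words(ref[1])))
    return best[0]
```

Any classifier with the same call signature can be passed in, so the held-out set and the classification method stay independent of each other.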
Thank you very much, absolutely appreciated! |
Many thanks, Torbjørn,
This will definitely be an important resource for me, especially since
the free version of SINTAX cannot even handle a database as large as
the latest NR version of SILVA.
Best regards,
Anders
|
Thanks for your feedback. So far I have just tried to implement the SINTAX algorithm as described in the preprint. I understand that there is some disagreement about the quality of the algorithm and some issues have been raised. I will look more into these and see if it is possible to improve it or to implement a different algorithm. Please tell me if you have any specific ideas for improvement. |
Thanks for your reply. Something like: 1) label_query 2) label_hit 3) length_query 4) length_hit 5) percent_similarity_kmers_query_and_hit 6) bootstrap value 7) number of kmers that are not identical ...? |
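The seven columns suggested above could be written as one tab-separated row per query. The sketch below is purely illustrative: the field names follow the comment, but the record type `HitRow`, the column order and the tab-separated layout are assumptions, not an existing vsearch output format.

```python
from typing import NamedTuple


class HitRow(NamedTuple):
    """One row of the proposed (hypothetical) per-query report."""
    label_query: str
    label_hit: str
    length_query: int
    length_hit: int
    percent_similarity_kmers_query_and_hit: float
    bootstrap_value: float
    kmers_not_identical: int


def format_row(row):
    """Serialise a HitRow as a single tab-separated line."""
    return "\t".join(str(value) for value in row)


def parse_row(line):
    """Parse a tab-separated line back into a HitRow."""
    f = line.rstrip("\n").split("\t")
    return HitRow(f[0], f[1], int(f[2]), int(f[3]), float(f[4]), float(f[5]), int(f[6]))
```

A fixed column order like this would let the report be loaded directly by spreadsheet tools or standard command-line utilities.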
Hey there. An SV was previously assigned as Bacillus anthracis by vsearch with --maxaccepts 1 (disclaimer: at that point I ran vsearch as implemented in the QIIME 2 workflow). I then reran it against the same db with vsearch outside QIIME. It turns out that what was B. anthracis looks more like B. cereus. However, one should consider that a few species are so similar to B. cereus that they are treated together as the B. cereus group. So, summing the percentages of all B. cereus group members, I got 53% B. cereus group. Then there is the number of taxonomy entries for each species in the database. If one taxonomy is enriched (exactly like he mentioned), the output tends to drift toward that assignment when ranking. That is partly what happens in this example, because I have 431 B. cereus vs. 73 B. anthracis sequences in the db. I say partly because 36.7% of 881 = 323 and 5.9% of 881 = 52, but 323 is 75% of 431 while 52 is 71% of 73. I actually see a trend there: the percentages of hits from B. cereus, B. thuringiensis, B. anthracis and B. mycoides are 75-73-71-68%. In the end, I would consider this sequence as “B. cereus group” and not B. thuringiensis. |
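The arithmetic in the comment above can be checked with a few lines. The numbers (881 hits in total, 36.7% and 5.9% hit shares, 431 and 73 database sequences) are taken from the comment; the function name `recovered_share` is made up for this sketch.

```python
def recovered_share(total_hits, hit_percent, db_sequences):
    """Return (number of hits, percent of the species' database sequences that were hit)."""
    hits = round(total_hits * hit_percent / 100)
    return hits, 100 * hits / db_sequences


# Values quoted in the comment: species -> (% of the 881 hits, sequences in the db)
quoted = {"B. cereus": (36.7, 431), "B. anthracis": (5.9, 73)}

for species, (hit_percent, db_sequences) in quoted.items():
    hits, share = recovered_share(881, hit_percent, db_sequences)
    print(f"{species}: {hits} hits, {share:.1f}% of its database sequences")
```

The two species recover a similar fraction of their own database entries even though their raw hit counts differ by a factor of about six, which is the database-composition effect the comment describes.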
I have made several improvements to the sintax command in vsearch 2.28.1, just released. Please see issue #535 or the release notes for details. |
Dear @torognes, are you planning to add the sintax classifier http://biorxiv.org/content/early/2016/09/09/074161?
Thank you,
Davide