
Questions and suggestions about the program. #2

Open
SergeyBaikal opened this issue May 31, 2023 · 2 comments

@SergeyBaikal

  1. I tested my own dataset, and the program assigned a genus even to bacterial contigs (using the RNA model). It would be great to have an entropy cutoff setting to filter out false positives, for example, keeping only assignments with an entropy below 0.5. I ran:
    `python3 predict.py --model_path /home/sergey/VirusTaxo/Dataset/vt_db_rna_virus_kmer_17.pkl --seq /home/VirusTaxo/My_Data/contigs.fasta > /home/VirusTaxo/My_Data/Results.txt`
  2. Why not include the complete taxonomic lineage in the output file?
  3. Out of 15,000 sequences, I got a correct assignment for only one contig, which had an entropy of -3.15E-12; for the contigs with an entropy of 0, the taxonomic assignment was not correct.
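The entropy cutoff requested in item 1 could, for example, be applied as a post-processing step. The sketch below is a hypothetical helper, not part of VirusTaxo: it assumes the predictions are tab-separated with `Id`, `Genus` and `Entropy` columns, which may not match the tool's actual output layout.

```python
import csv
import io


# Hypothetical post-filter (not a VirusTaxo feature): assumes tab-separated
# predictions with "Id", "Genus" and "Entropy" columns. Adjust the column
# names to whatever the real output file uses.
def filter_by_entropy(tsv_text: str, max_entropy: float = 0.5) -> list[dict]:
    """Keep only predictions whose entropy is strictly below max_entropy."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        try:
            entropy = float(row["Entropy"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable entropy value
        if entropy < max_entropy:
            kept.append(row)
    return kept


example = "Id\tGenus\tEntropy\ncontig_1\tGenusA\t0.12\ncontig_2\tGenusB\t1.73\n"
for row in filter_by_entropy(example):
    print(row["Id"], row["Genus"], row["Entropy"])
```

A low entropy only means that the matched k-mers agree on a single genus; it does not show that a contig is viral, so a filter like this cannot by itself remove bacterial contigs.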

Dear authors, could you please clarify what I'm doing wrong?

@Rashedul
Contributor

Dear Sergey,

Thank you for testing the tool and sharing your valuable feedback! I’d like to address your observations and questions:

The tool employs a k-mer matching strategy, meaning that any random overlap of k-mers between the query sequence and the database can lead to a genus assignment, even when the query does not belong to the expected group (e.g., RNA viruses). To mitigate this, we’ve introduced a new metric called the "Enrichment Score," which helps reduce the likelihood of random k-mer matches affecting the predictions.
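To illustrate why a chance k-mer overlap can still yield a confident-looking genus call, here is a minimal, hypothetical sketch of k-mer voting with Shannon entropy over the genus votes. It is not VirusTaxo's implementation: the function names, the toy sequences, k = 5 and the `matched_fraction` value are assumptions for illustration, and `matched_fraction` is a simple stand-in, not the tool's Enrichment Score.

```python
import math
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of genera whose reference contains it."""
    index: dict[str, set[str]] = {}
    for genus, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(genus)
    return index


def predict(query: str, index: dict[str, set[str]], k: int):
    """Return (genus, entropy, matched_fraction) from k-mer votes (illustrative only)."""
    query_kmers = kmers(query, k)
    votes = Counter()
    matched = 0
    for km in query_kmers:
        genera = index.get(km)
        if genera:
            matched += 1
            for genus in genera:
                votes[genus] += 1
    if not votes:
        return None, None, 0.0
    total = sum(votes.values())
    # Shannon entropy (bits) of the vote distribution; clamp tiny negative
    # values caused by floating-point rounding (and -0.0) to 0.0.
    entropy = max(0.0, -sum((c / total) * math.log2(c / total) for c in votes.values()))
    best_genus = votes.most_common(1)[0][0]
    return best_genus, entropy, matched / len(query_kmers)


references = {"GenusA": "ATGCGTACGTTAGC", "GenusB": "TTGACCGGTAAACG"}
index = build_index(references, k=5)
# A non-viral-looking query that shares only two 5-mers with GenusA by chance.
query = "CCCCCGTACGCCCCC"
print(*predict(query, index, k=5))
```

Here the query shares only 2 of its 10 distinct k-mers with `GenusA`, yet every vote goes to that genus, so it is assigned `GenusA` with an entropy of 0.0. A metric that also considers how much of the query is matched is one way to flag such calls.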

Additionally, this model is specifically designed for predicting viral sequences. Applying it to non-viral sequences may result in incorrect taxonomic assignments. To provide further clarity, we’ve included a new section in the README titled "Method Limitations and Interpretation" to elaborate on these points.

In future updates, we will add the full taxonomic lineage (e.g., family, order, genus, species) to the output file and provide arguments for setting cutoffs on both Entropy and Enrichment_Score.

Please let us know if you have further questions!

@Rashedul
Contributor

Rashedul commented Dec 3, 2024

The full taxonomic lineage (e.g., family, order, genus, species) has been added to the output file, along with arguments for setting cutoffs on both Entropy and Enrichment.
