Custom DB and custom taxonomy (GTDB or similar) #884
Comments
Hello, we are in the process of adding GTDB support to the |
I'm also interested in building a custom database with GTDB taxonomy! When can we expect to have this function available? |
I pushed a change to the k2 wrapper today. Here's how you can build a GTDB database with the changes:
k2 build --special gtdb --gtdb-files gtdb_genomes_reps.tar.gz --threads 6 --db gtdb_reps
You can find the list of file names here: https://data.gtdb.ecogenomic.org/releases/latest/genomic_files_reps/
The command line will also offer a list of files if you specify the wrong file name:
k2 build --special gtdb --gtdb-files foo.tar.gz --db gtdb_reps
[ERROR - 2024-11-05 17:57:48,567]: Could not find any files matching foo.tar.gz
[ERROR - 2024-11-05 17:57:48,567]: Here are a list of candidates:
bac120_msa_marker_genes_all.tar.gz
ar53_msa_marker_genes_all.tar.gz
ssu_all.fna.gz
bac120_marker_genes_all.tar.gz
ar53_marker_genes_all.tar.gz
bac120_ssu_reps.fna.gz
ar53_msa_marker_genes_reps.tar.gz
gtdb_proteins_aa_reps.tar.gz
bac120_msa_marker_genes_reps.tar.gz
bac120_msa_reps.faa.gz
bac120_marker_genes_reps.tar.gz
ar53_msa_reps.faa.gz
ar53_marker_genes_reps.tar.gz
gtdb_proteins_nt_reps.tar.gz
gtdb_genomes_reps.tar.gz
ar53_ssu_reps.fna.gz
We welcome your testing in making sure that this feature works as expected. |
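For scripted builds, the command above can be assembled programmatically. This is only a minimal sketch: it assumes `k2` is on the PATH and uses just the flags quoted in this thread. The helper name `gtdb_build_command` and the abbreviated `KNOWN_FILES` set are hypothetical, introduced here for illustration.

```python
import shlex

# A subset of the candidate file names printed by k2 in the error output above.
KNOWN_FILES = {
    "gtdb_genomes_reps.tar.gz",
    "gtdb_proteins_nt_reps.tar.gz",
    "gtdb_proteins_aa_reps.tar.gz",
    "bac120_ssu_reps.fna.gz",
    "ar53_ssu_reps.fna.gz",
}


def gtdb_build_command(db, gtdb_file, threads=6):
    """Return the argument list for a GTDB build (hypothetical helper)."""
    # Fail early on a misspelled file name instead of waiting for k2 to do so.
    if gtdb_file not in KNOWN_FILES:
        raise ValueError(f"unrecognized GTDB file name: {gtdb_file}")
    return [
        "k2", "build",
        "--special", "gtdb",
        "--gtdb-files", gtdb_file,
        "--threads", str(threads),
        "--db", db,
    ]


cmd = gtdb_build_command("gtdb_reps", "gtdb_genomes_reps.tar.gz")
print(shlex.join(cmd))
# To actually start the build: subprocess.run(cmd, check=True)
```

Keeping the arguments as a list avoids shell quoting problems if the database path contains spaces.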
Traceback (most recent call last):
Hello, I tried to build a GTDB database with the k2 you provided, but it resulted in this error. |
I pushed a fix for this issue a few days ago and have also pushed a fix for potential crashes while masking the genomes. Can you try pulling these changes and trying again? |
Hi, is there any update on this issue? Thanks :) |
I was able to run the command with no errors after the push. |
We will soon be publishing an index for the GTDB representative genomes, stay tuned. |
That would be really great, thank you. |
The database is now available, see: https://benlangmead.github.io/aws-indexes/k2 |
thank you |
Is there a way to create a completely new database with a non-NCBI taxonomy? I am primarily interested in using something like GTDB, which has a genome download and a taxonomy TSV file available.
I understand that I can add the contigs/genomes to a custom database using "kraken2-build --add-to-library", but creating a new taxonomy doesn't seem straightforward.
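One manual route, separate from the k2 wrapper's built-in GTDB support, is to generate NCBI-style names.dmp and nodes.dmp files from the GTDB taxonomy TSV. The sketch below is a rough illustration, not the method k2 uses. It assumes the TSV has two tab-separated columns (genome accession, then a semicolon-separated lineage with rank prefixes such as `d__`). The taxids are arbitrary integers assigned in order of first appearance, the function names are hypothetical, and the accessions `GENOME_A` and `GENOME_B` are placeholders.

```python
# GTDB rank prefixes mapped to NCBI-style rank names. Writing the GTDB domain
# as "superkingdom" is an assumption made to match older NCBI taxdump files.
RANKS = {"d": "superkingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def build_taxonomy(tsv_lines):
    """Assign integer taxids to every taxon in a GTDB taxonomy table.

    Returns (nodes, names, genome_taxid):
      nodes:        taxid -> (parent taxid, rank)
      names:        taxid -> taxon name
      genome_taxid: genome accession -> taxid of its lowest-rank taxon
    """
    nodes = {1: (1, "no rank")}  # the root is its own parent, as in NCBI dumps
    names = {1: "root"}
    ids = {}                     # full lineage prefix -> taxid
    genome_taxid = {}
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        accession, lineage = line.split("\t")
        parent, path = 1, []
        for taxon in lineage.split(";"):
            prefix, name = taxon.split("__", 1)
            path.append(taxon)
            key = ";".join(path)  # keyed by full path so equal names never collide
            if key not in ids:
                taxid = len(nodes) + 1
                ids[key] = taxid
                nodes[taxid] = (parent, RANKS[prefix])
                names[taxid] = name
            parent = ids[key]
        genome_taxid[accession] = parent
    return nodes, names, genome_taxid


def nodes_dmp_lines(nodes):
    """Format nodes in the 'field<TAB>|<TAB>field' layout of NCBI nodes.dmp."""
    return [f"{t}\t|\t{p}\t|\t{r}\t|\t-\t|" for t, (p, r) in sorted(nodes.items())]


def names_dmp_lines(names):
    """Format names in the NCBI names.dmp layout, one scientific name per taxid."""
    return [f"{t}\t|\t{n}\t|\t\t|\tscientific name\t|" for t, n in sorted(names.items())]


# Illustrative input: placeholder accessions with GTDB-style lineages.
example = [
    "GENOME_A\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli",
    "GENOME_B\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia fergusonii",
]
nodes, names, genome_taxid = build_taxonomy(example)
print("\n".join(nodes_dmp_lines(nodes)))
print("\n".join(names_dmp_lines(names)))
print(genome_taxid)
```

The two line lists would be written to taxonomy/nodes.dmp and taxonomy/names.dmp inside the database directory. Each FASTA header then needs its taxid embedded as `kraken:taxid|<id>`, using the `genome_taxid` mapping, before the file is passed to "kraken2-build --add-to-library".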