Reference sequence databases: GenBank
Download the pre-formatted GenBank nt BLAST database.
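# BLAST database version 4 (run either this command or the version 5 command below)
ncbi-blast-2.8.1+/bin/update_blastdb.pl nt --passive
# BLAST database version 5
ncbi-blast-2.8.1+/bin/update_blastdb.pl nt_v5 --blastdb_version 5 --passive
# Unpack the archives; each archive is deleted only if tar extracted it successfully
for i in *.gz; do tar -xzvf "$i" && rm "$i"; done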
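The unpack loop above can also be wrapped in a small function that deletes an archive only after `tar` reports success, so an interrupted or failed extraction does not cost you the download. This is a minimal sketch: the function name `unpack_blastdb_archives` is hypothetical, and it assumes the archives are the `*.tar.gz` files that `update_blastdb.pl` left in the given directory.

```bash
# Hypothetical helper: extract every *.tar.gz BLAST database archive in a
# directory and delete an archive only after tar extracted it successfully.
unpack_blastdb_archives() {
  local dir="${1:-.}" archive status=0
  shopt -s nullglob                       # no archives -> loop body never runs
  for archive in "$dir"/*.tar.gz; do
    if tar -xzf "$archive" -C "$dir"; then
      rm -- "$archive"
    else
      echo "could not extract $archive; keeping it" >&2
      status=1
    fi
  done
  shopt -u nullglob
  return "$status"
}

# Usage, after update_blastdb.pl has finished:
# unpack_blastdb_archives .
```

Returning a non-zero status when any archive fails makes it easy to stop a larger script with `unpack_blastdb_archives . || exit 1`.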
To add taxonomy to the BLAST results, the scripts also need a taxonomy reference (the files merged.dmp and rankedlineage.dmp, described at the end of this section).
GenBank is large and keeps growing (https://www.ncbi.nlm.nih.gov/genbank/statistics/). Because it contains so many sequences, a BLAST search against the full database takes a long time. When identifying amplicon data, usually only a subset of the sequences is needed. The Snakefile (utilities/genbank/Snakefile) creates these sub-selections.
Caution
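Sub-selections are based on sequence headers. A gene that is only present as part of a complete mitochondrial or chloroplast genome record is therefore not included in these sub-selections, because the header of such a record describes the whole genome rather than the individual genes.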
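To see why header-based selection misses organelle genomes, consider a minimal sketch of selecting FASTA records by header text. This is not the Snakefile's actual selection rule, which is not shown here; the function `select_by_header` and its regular-expression argument are hypothetical.

```bash
# Hypothetical helper illustrating header-based selection: print every FASTA
# record whose header line matches a case-insensitive extended regex.
select_by_header() {
  local pattern="$1" fasta="$2"
  awk -v pat="$pattern" '
    /^>/ { keep = (tolower($0) ~ tolower(pat)) }  # decided once per header line
    keep                                           # print header and sequence lines of kept records
  ' "$fasta"
}

# Usage:
# select_by_header 'cytochrome c oxidase subunit I' sequences.fasta > coi.fasta
```

With a pattern such as `cytochrome c oxidase subunit I`, a record whose header reads `... cytochrome c oxidase subunit I (COI) gene, partial cds` is kept, while a record whose header reads `... mitochondrion, complete genome` is dropped, even though that genome also contains the COI gene.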
First, create the conda environment (run this from the root of the repository)
conda env create -f utilities/snakemake37_environment.yml
Go to the genbank folder inside utilities
cd utilities/genbank
Activate the environment
conda activate snakemake37
To create the databases, run the Snakefile. Make sure there is enough disk space: the pipeline produces more than 350 GB of output.
snakemake -j 6
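Because the pipeline writes more than 350 GB and needs the `snakemake37` environment, a quick pre-flight check before `snakemake -j 6` can save a failed run. A minimal sketch, assuming a `df` that supports the POSIX `-P` output format; the function name `preflight` and its arguments (target directory, required space in GiB) are hypothetical. It relies on `conda activate` setting `CONDA_DEFAULT_ENV` to the name of the active environment.

```bash
# Hypothetical pre-flight check before running `snakemake -j 6`.
# $1: directory the pipeline writes to (default: current directory)
# $2: required free space in GiB (default: 350)
preflight() {
  local dir="${1:-.}" required_gb="${2:-350}" free_kb
  # conda activate sets CONDA_DEFAULT_ENV to the name of the active environment
  if [ "${CONDA_DEFAULT_ENV:-}" != "snakemake37" ]; then
    echo "activate the environment first: conda activate snakemake37" >&2
    return 1
  fi
  # df -P keeps each file system on one line; with -k, column 4 is the available space in KiB
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if [ "$free_kb" -lt $(( required_gb * 1024 * 1024 )) ]; then
    echo "only $(( free_kb / 1024 / 1024 )) GiB free under $dir; ${required_gb} GiB needed" >&2
    return 1
  fi
  return 0
}

# Usage:
# preflight . 350 && snakemake -j 6
```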
When the Snakemake pipeline has finished, the output folder contains one folder per sub-selection. You can move these folders to any location. To use a BLAST database in Galaxy, add its path to the blastn.xml file, as in the example below.
<macro name="local_databases">
    <param name="database" type="select" multiple="true" label="Database">
        <option value="/home/galaxy/Tools/galaxy-tool-BLAST/utilities/silva/output/SILVA/18S.fa" label="18S">18S Genbank</option>
    </param>
</macro>
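With many sub-selections, writing the `<option>` lines of the example above by hand is tedious. The sketch below prints one line per folder in the pipeline's output folder and skips `taxonomy`. It assumes, hypothetically, that each folder `<name>` holds a BLAST database with the prefix `<name>` and that `value` should be that prefix as passed to `blastn -db`; check both against your actual output and blastn.sh. The function name `make_database_options` is also hypothetical.

```bash
# Hypothetical helper: print one Galaxy <option> line per sub-selection folder.
# Assumption: folder <output_dir>/<name> holds a BLAST database with prefix <name>.
make_database_options() {
  local output_dir="${1%/}" folder name
  for folder in "$output_dir"/*/; do
    [ -d "$folder" ] || continue          # glob matched nothing
    name=$(basename "$folder")
    [ "$name" = "taxonomy" ] && continue  # output/taxonomy holds the .dmp files, not a database
    printf '<option value="%s%s" label="%s">%s Genbank</option>\n' \
      "$folder" "$name" "$name" "$name"
  done
}

# Usage: paste the printed lines into the local_databases macro
# make_database_options /home/galaxy/Tools/galaxy-tool-BLAST/utilities/genbank/output
```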
The Galaxy BLAST tool can add taxonomy information to the BLAST hits. For the GenBank references it needs the files merged.dmp and rankedlineage.dmp. The Snakefile downloads these files into the output/taxonomy folder; you can move them to any location. Add the path of that location to the blastn.sh file, as in the example below.
$SCRIPTDIR"/blastn_add_taxonomy.py" \
    -i $outlocation'/files/' \
    -t /home/galaxy/Tools/galaxy-tool-BLAST/utilities/genbank/output/taxonomy/rankedlineage.dmp \
    -m /home/galaxy/Tools/galaxy-tool-BLAST/utilities/genbank/output/taxonomy/merged.dmp \
    -ts "${9}" \
    -taxonomy_db $outlocation"/taxonomy_db2" \
    -bold_db $outlocation"/bold_db"
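For reference, merged.dmp maps taxids that NCBI has merged into another taxid to their current taxid, and rankedlineage.dmp lists, per taxid, its name followed by the species, genus, family, order, class, phylum, kingdom and superkingdom (check the readme that ships with the dump, as the column set can change). Both files separate fields with `\t|\t` and end each line with `\t|`. The sketch below resolves one taxid through both files; it only illustrates what the two files are for and is not how `blastn_add_taxonomy.py` is implemented. The function name `lineage_for_taxid` is hypothetical.

```bash
# Hypothetical helper: print "superkingdom;kingdom;...;species;name" for one taxid,
# first following merged.dmp in case the taxid was merged into another one.
lineage_for_taxid() {
  local taxid="$1" merged="$2" rankedlineage="$3"
  awk -v id="$taxid" '
    BEGIN { FS = "\t[|]\t" }
    { sub(/\t[|]$/, "") }          # strip the trailing tab-pipe line terminator
    FILENAME == ARGV[1] {          # first file, merged.dmp: old taxid -> current taxid
      if ($1 == id) id = $2
      next
    }
    $1 == id {                     # second file, rankedlineage.dmp
      # fields: tax_id, tax_name, species, genus, family, order, class, phylum, kingdom, superkingdom
      print $10 ";" $9 ";" $8 ";" $7 ";" $6 ";" $5 ";" $4 ";" $3 ";" $2
      exit
    }
  ' "$merged" "$rankedlineage"
}

# Usage:
# lineage_for_taxid 9606 output/taxonomy/merged.dmp output/taxonomy/rankedlineage.dmp
```

Empty ranks stay as empty fields between semicolons, so every printed lineage has the same number of columns.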