Aligning Genes to Genomes
Sam Minot edited this page May 25, 2022 · 7 revisions
Now we get to the core functionality of gig-map: aligning a collection of genes against a collection of genomes. In a previous step, the user should have generated a set of deduplicated genes and a collection of microbial genomes. In this step, the user will align the genes against the genomes and create a set of output files which can be used to render gig-map displays.
To align a collection of genes and genomes, the input genomes must be present within one or more gzip-compressed FASTA files in a single folder. If you want to combine files which are located in different folders, simply create symlinks for those files into a single folder.
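The symlink approach described above can be sketched in Python. This is a minimal illustration, not part of gig-map: the function name `link_genomes` and the list of file suffixes are assumptions, and the suffixes should be adjusted to match how your genome files are actually named.

```python
import os
from pathlib import Path


def link_genomes(source_dirs, target_dir,
                 suffixes=(".fasta.gz", ".fna.gz", ".fa.gz")):
    """Symlink gzip-compressed FASTA files from several folders into one.

    `suffixes` is an assumed naming convention, not a gig-map requirement.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    linked = []
    for source in source_dirs:
        for path in sorted(Path(source).iterdir()):
            # Skip subfolders and anything that does not look like a genome
            if not path.is_file() or not path.name.endswith(suffixes):
                continue
            link = target / path.name
            # Refuse to overwrite: two genomes with the same file name
            # would otherwise silently collide in the combined folder
            if link.exists() or link.is_symlink():
                raise FileExistsError(f"{link} already exists")
            # Link to the absolute path so the link stays valid
            # regardless of the current working directory
            os.symlink(path.resolve(), link)
            linked.append(link)
    return linked
```

Calling `link_genomes(["run1/genomes", "run2/genomes"], "all_genomes")` (hypothetical folder names) would leave a single `all_genomes` folder that can then be passed as the genomes input. Raising an error on a name collision is a deliberate choice here, since a silently replaced genome would be hard to notice later.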
- genes: Single file containing all genes to be aligned, in amino acid FASTA format (gzip-compressed), e.g. centroids.faa.gz from the deduplicate outputs
- genomes: Folder containing all genomes to align against (gzip-compressed FASTA format)
- collect_results: After aligning genomes, perform all additional analyses needed for visualization (true/false)
- min_coverage: Minimum percentage of a gene which must align in order to retain the alignment [default: 90]
- min_identity: Minimum percent identity of the amino acid alignment required to retain the alignment [default: 90]
- max_evalue: Maximum E-value threshold used to filter all alignments [default: 0.001]
- aligner: Algorithm used for alignment [default: diamond; options: diamond, blast]
- max_overlap: Any alignment which overlaps a higher-scoring alignment by more than this will be filtered out [default: 50]
- query_gencode: Genetic code used for conceptual translation of genome sequences [default: 11]
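To make the three threshold parameters concrete, here is a rough sketch of how such filters combine. It is not gig-map's implementation: the `Alignment` record and its field names are hypothetical, and treating the thresholds as inclusive is an assumption. The `max_overlap` filter is not covered.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    # Hypothetical record; these field names are illustrative only
    gene: str
    genome: str
    coverage: float  # percent of the gene covered by the alignment
    identity: float  # percent amino acid identity
    evalue: float    # E-value reported by the aligner


def passes_filters(aln, min_coverage=90.0, min_identity=90.0,
                   max_evalue=0.001):
    """Return True if an alignment satisfies all three thresholds.

    Defaults mirror the documented defaults; inclusive comparison
    is an assumption made for this sketch.
    """
    return (
        aln.coverage >= min_coverage
        and aln.identity >= min_identity
        and aln.evalue <= max_evalue
    )


alignments = [
    Alignment("geneA", "genome1", coverage=98.0, identity=95.5, evalue=1e-50),
    Alignment("geneA", "genome2", coverage=60.0, identity=99.0, evalue=1e-20),
    Alignment("geneB", "genome1", coverage=95.0, identity=85.0, evalue=1e-30),
]

# Only the first alignment meets every threshold at the default settings
retained = [aln for aln in alignments if passes_filters(aln)]
```

An alignment must pass all three tests to be kept, so lowering a single threshold (for example `min_identity=80`) only recovers alignments that were failing on that criterion alone.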
The output from this step will include:
- genomes.aln.csv.gz: A table with all of the alignments which were found
- distances.csv.gz: A table of genome-genome similarity (ANI)
- genomes.gene_order.txt.gz: A table with the ordering of genes which resulted from this collection of alignments
- gigmap.*.html: A quick-and-dirty visualization of the gene-to-genome alignment
- gigmap.rdb: A complete archive of the aligned information which can be used in the interactive gig-map display tool
- genome.manifest.csv: A template genome annotation table which can be used to build out more complex visualizations
- gene.manifest.csv: A similar template annotation table for the genes used in the analysis
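The gzip-compressed tables listed above can be inspected with the Python standard library. The helper below is a minimal sketch (the name `preview_csv_gz` is an assumption). It makes no claim about which columns the files contain and simply returns whatever header and rows it finds.

```python
import csv
import gzip


def preview_csv_gz(path, n_rows=5):
    """Return the header and the first n_rows of a gzip-compressed CSV."""
    # Open in text mode; newline="" lets the csv module handle line endings
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        # zip stops after n_rows without reading the rest of the file
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `header, rows = preview_csv_gz("genomes.aln.csv.gz")` gives a quick look at the alignment table's column names before loading the full file into a larger analysis.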
Other useful references may be: