Rendering Displays
After performing a gig-map
alignment, it can be very helpful to render a visual
display of the results. One of the nice features of the gig-map
utility is the
ability to generate an HTML file which uses the Plotly library
to make an interactive map with features for zooming in and expanding specific
regions of genetic space. In a previous step, the user should have used the alignment
utility of gig-map
to generate a set of detailed outputs. In this step, those
alignment outputs will be transformed into an HTML file which can be opened and
explored by the user.
To start, create or identify a folder which will be used to run this analysis. Next, download these two template files to help you set up the rendering process:
The render.params.json
file allows you to specify which set of alignment results will be used,
and in what location the output files will be placed. The render.sh
file is a script
which will launch the appropriate utility within gig-map
using the parameters specified
in render.params.json.
To list the complete set of options available for the rendering utility, run the following command:
bash render.sh --help
By default, these files are set up to read the alignments saved to the raw binary
file alignments/alignments.rdb
and save an output file named gig-map.html
in
the output/
folder. Please modify any of the values in the render.params.json
file as appropriate for your use-case.
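Before launching the render, it can help to confirm that the parameters file points at an alignments file that actually exists. The following is a minimal sketch, not part of gig-map itself: the helper name check_render_params is hypothetical, it assumes the "rdb" key shown in the examples on this page, and it assumes that relative paths are resolved from the folder where the script is run.

```python
import json
from pathlib import Path


def check_render_params(params_path):
    """Return a list of problems found in a render params file (empty if none).

    Hypothetical helper: it only checks that the file parses as JSON and
    that the "rdb" entry points at an existing file.
    """
    params_file = Path(params_path)
    if not params_file.is_file():
        return [f"params file not found: {params_path}"]

    params = json.loads(params_file.read_text())
    problems = []

    # The alignment results must already exist before rendering
    rdb = params.get("rdb")
    if rdb is None:
        problems.append("missing 'rdb' entry")
    elif not Path(rdb).is_file():
        problems.append(f"alignments file not found: {rdb}")

    return problems
```

Calling check_render_params("render.params.json") from the analysis folder returns an empty list when nothing looks wrong, and a list of short messages otherwise.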
Once you are satisfied that the render.params.json
file is pointing to the
right set of inputs and outputs, start the rendering process by running:
bash render.sh
There are many parameters which can be modified when rendering a display. One of
the most useful additions to a gig-map
display is a human-readable annotation
of the genes and genomes that were used. To make it easy for the user to produce
a customized display, the gig-map
utility is set up to read in a CSV file which
contains the exact text which the user would like to use for each gene and/or genome.
Annotation tables for genes and genomes can be added to render.params.json
with
the following params:
{
"rdb": "alignments/alignments.rdb",
"output_prefix": "gig-map",
"output_folder": "output",
"gene_annotations": "gene_annotations.csv",
"genome_annotations": "genome_annotations.csv"
}
The gene annotation CSV must contain a column labeled gene_id
which matches
the name of the gene in the input, while the genome annotation CSV must contain
a column named genome_id
. Note that both of the gig-map
utilities for
downloading genes and genomes from NCBI will automatically create a suitable CSV
with this format.
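If an annotation table was prepared by hand, a quick check of its header row can catch a missing ID column before rendering. This is a small sketch using only the Python standard library; the function name has_id_column is hypothetical and is not part of gig-map.

```python
import csv


def has_id_column(csv_path, id_column):
    """Return True if the CSV header row contains the required ID column."""
    with open(csv_path, newline="") as handle:
        # The first row of the file is treated as the header
        header = next(csv.reader(handle), [])
    return id_column in header
```

For example, has_id_column("gene_annotations.csv", "gene_id") should be True for a gene annotation table, and has_id_column("genome_annotations.csv", "genome_id") should be True for a genome annotation table.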
When rendering the gig-map
display, there is a longer list of options
which can be used to control many different aspects of its formatting. For
reasons that are too tedious to mention, these options can be specified in
the render.params.json
file in a slightly more complex way. All of these
additional display options can be provided inside a single options
field
with the following syntax:
{
"rdb": "alignments/alignments.rdb",
"output_prefix": "gig-map",
"output_folder": "output",
"gene_annotations": "gene_annotations.csv",
"genome_annotations": "genome_annotations.csv",
"options": "--min-pctid 95 --min-cov 95 --tree-width 0.2 --label-genes-by 'Combined Name'"
}
Note that in the example above, the name used to label each gene will be
read from the column in gene_annotations.csv
with the header Combined Name
.
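Because the options field is a single string, values containing spaces (such as the column header Combined Name) need to be quoted. One way to avoid quoting mistakes is to assemble the string programmatically. The sketch below uses Python's shlex.join (Python 3.8+) to reproduce the options string from the example above; how gig-map parses that string internally is not documented here, so this only mirrors the quoting shown in the example.

```python
import json
import shlex

# Display options as a flat list of flags and values (all strings)
option_args = [
    "--min-pctid", "95",
    "--min-cov", "95",
    "--tree-width", "0.2",
    "--label-genes-by", "Combined Name",
]

params = {
    "rdb": "alignments/alignments.rdb",
    "output_prefix": "gig-map",
    "output_folder": "output",
    "gene_annotations": "gene_annotations.csv",
    "genome_annotations": "genome_annotations.csv",
    # shlex.join single-quotes any value that contains whitespace
    "options": shlex.join(option_args),
}

# Print the JSON so it can be reviewed before saving as render.params.json
params_json = json.dumps(params, indent=4)
print(params_json)
```

The printed JSON can be pasted into render.params.json after review.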
The complete list of options which can be used in the options
field is:
- --min-pctid: Minimum amino acid similarity threshold for displayed alignments (default: 90)
- --min-cov: Minimum alignment coverage threshold for displayed alignments (default: 90)
- --color-genes-by: Indicate a column from the gene annotation table to use for coloring genes
- --label-genomes-by: Indicate a column from the genome annotation table used for labeling
- --figure-height: Specify an integer number of pixels to set the total figure height
- --figure-width: Figure width in pixels (default: 1200)
- --max-genome-label-len: Maximum number of characters allowed for genome labels (default: 60)
- --max-gene-label-len: Maximum number of characters allowed for gene labels (default: 60)
- --label-genes-by: Indicate a column from the gene annotation table used for labeling
- --clustering-method: Method used to cluster genomes, either "ani" or the name of a specific marker (default: ani)
- --min-genes-per-genome: Do not display any genome found in fewer than this number of genes
- --min-genomes-per-gene: Do not display any gene found in fewer than this number of genomes
- --max-n-genomes: Set a maximum number of genomes to display, removing genomes with the fewest aligned genes
- --query: Filter the genes for display based on a string containing boolean logic to be applied to gene annotations. For example, if the gene annotation file contains a column of numeric values with a header of length, then the query string "length >= 100" would limit the set of genes which are ultimately displayed to only those genes for which the value in the length column is >= 100.
- --colorscale: Plotly colorscale used for heatmap
- --tree-width: Proportional size of tree used for plotting (default: 0.4)
- --skip-gene-resort: If specified, use the pre-computed gene order. This will prevent the time-consuming recalculation of linkage clustering
- --show_hovertext: If specified, include hovertext in the HTML display (which increases the file size considerably, and is rather buggy)
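To preview which genes the --query example "length >= 100" would keep, the same comparison can be applied to an annotation table by hand. The sketch below does not use gig-map's own query parser; it applies the one comparison from the example to a small, made-up table using the Python standard library.

```python
import csv
import io

# Hypothetical gene annotation table with a numeric `length` column
annotations_csv = """gene_id,length
geneA,250
geneB,80
geneC,100
"""

rows = list(csv.DictReader(io.StringIO(annotations_csv)))

# Hand-written equivalent of the query string "length >= 100"
kept = [row["gene_id"] for row in rows if float(row["length"]) >= 100]
print(kept)  # ['geneA', 'geneC']
```

To check a real table, replace the io.StringIO wrapper with an open file handle for the gene annotation CSV.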