Home
Feel free to get in touch if you find a bug or if you have a suggestion, a question or a request
The history of a species is closely related to the history of its genes. Connecting the evolution of a genome to the evolution of its genes is a way to describe this relationship. In this context, reconciliation of the genes with the species consists of mapping the nodes of a gene tree and the associated events (speciation, duplication, loss, transfer) to the nodes of the species tree. Reconciliation can also be used to map the history of a parasite onto the history of a host, or the history of a protein domain onto the history of a sequence.
In the remainder of this document, we adopt gene/species (for two levels) and gene/symbiont/host (for three levels) as a vocabulary, keeping in mind that everything we present applies to reconciliations in general.
Reconciliation is a complex task and many programs are dedicated to it. Recently the XML format recPhyloXML (Duchemin et al. 2018), inspired by the phyloXML format (Han and Zmasek, 2009), has been proposed as a standard to describe phylogenetic reconciliations.
Visualisation of phylogenetic reconciliations is offered by various programs and interfaces such as NOTUNG (Chen et al., 2000), SylvX (Chevenet et al., 2016), Treerecs (Comte et al., 2020), Jane (Conow et al., 2010), eMPRess (Santichaivekin et al., 2021) and Capybara (Wang et al., 2020). However, with the exception of SylvX, all are integrated in a specific reconciliation program and cannot visualise reconciliations produced by others. None of these programs handles recPhyloXML input files, and none of them is generic to any kind of reconciliation (for example, SylvX does not allow temporarily free-living symbionts, since genes are not allowed to live outside a genome), nor can they handle multiple horizontal transfers (i.e. several genes transferred between the same donor and recipient) or the consideration of numerous possible scenarios. DoubleRecViz (Kuitche et al., 2021) uses a derived version of recPhyloXML, adding a transcript level to the gene and species format but without support for horizontal transfers.
Finally, there is no software able to combine two nested reconciliations, i.e. to show the gene/symbiont reconciliation and the symbiont/host reconciliation in a single representation.
Here we present Thirdkind, a very simple command-line program that lets the user easily generate graphical output (svg) from one or several recPhyloXML files with a large choice of options (for example orientation, font size, branch length, multiple gene trees, multiple species trees, multiple files, redundant transfers handling, etc.) and that handles the display of 2 nested reconciliations.
We use recPhyloXML, a format which has recently been proposed to describe the reconciliation between a gene (or a symbiont or a domain) and a species (or a host or a sequence). Thirdkind is written in Rust and is thus very easy to install. Thirdkind uses a Rust API we developed to handle phylogenetic trees: light_phylogeny. This API may be used to write Rust code that reads newick, phyloXML and recPhyloXML files and builds, modifies and displays phylogenetic trees.
The program Thirdkind is available at the Rust community’s crate registry: https://crates.io/crates/thirdkind
Source code and input file examples are available here: https://github.com/simonpenel/thirdkind
A web server dedicated to Thirdkind is available here: http://thirdkind.univ-lyon1.fr/ It focuses on recPhyloXML files and reconciliations.
To install Thirdkind, you need to install cargo. For Linux and macOS systems, type:
curl https://sh.rustup.rs -sSf | sh
For Windows see: https://doc.rust-lang.org/cargo/getting-started/installation.html
Note: Since Rust does not include its own linker yet, building thirdkind requires a C compiler such as gcc to act as the linker. If that is not the case, install the build essentials needed by Rust:
sudo apt install build-essential
Once Cargo is installed, just open a new terminal and then type:
cargo install thirdkind
To check that Thirdkind is installed type:
thirdkind
Alternatively, it is possible to install Thirdkind from the sources (https://github.com/simonpenel/thirdkind) with the command cargo build
https://lbbe.univ-lyon1.fr/fr/annuaire-des-membres/penel-simon
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac062/6525213
Usage: thirdkind [OPTIONS] --input-file <INPUT_FILE>
Options:
-a, --output-transfer-analysis
Display transfers analysis (with -m and -t options)
-A, --starting-node <STARTING_NODE>
Display transfers starting from this node only
-b, --browser
Open svg in browser
-B, --display-br-length
With option -l, display branch length
-c, --conf-file <CONF_FILE>
Use configuration file
-C, --gene-colors <GENE_COLORS>
Define colors for gene trees. For example: "red,violet,#4A38C4,orange
-d, --gene-fontsize <GENE_FONTSIZE>
Set font size for gene trees
-D, --species-fontsize <SPECIES_FONTSIZE>
Set font size for species trees
-e, --free-living-sup
"free living" option : nodes associated to FREE_LIVING are drawned in
an external tree and superposed in case of multiple genes
-E, --free-living-shi
"free living" option : nodes associated to FREE_LIVING are drawned in
an external tree and shifted in case of multiple genes
-f, --input-file <INPUT_FILE>
Input tree file (accepted format: newick, phyloXML, recPhyloXML)
-F, --format <FORMAT>
Force format phyloXML/recPhyloXML
-g, --nested <NESTED>
1st level input file (for example a gene-symbiote file with -f
defining a 2nd level symbiote-host file)
-G, --gene-phylo <GENE_PHYLO>
Display the gene number <GENE_PHYLO> in phyloxml style (no species
tree)
-H, --height <HEIGHT>
Height: multiply the tree height by factor <HEIGHT>
-i, --internal-gene-node
Display internal gene node names
-I, --internal-species-node
Display internal species node names
-j, --merge <MERGE>
List of nodes to merge For example:
"PBIAU:PTETRD4,species_5:species_6"
-J, --display-transfers-abundance
With option -t, display the abundance of redudant transfers
-k, --symbol-size <SYMBOL_SIZE>
Size of the circles, crosses, squares, etc
-K, --bezier <BEZIER>
Bezier parameter: curvature of the transfers and branches leading to
free living organisms
-l, --branch-length <BRANCH_LENGTH>
Use branch length, multiplied by the given factor
-L, --landscape
Display as landscape
-m, --multiple
The input file (-f) is a list of recphyloxml files
-M, --midway
Display duplication node at midway in the branch
-n, --gene-tree-list <GENE_TREE_LIST>
List of the indexes of the gene trees to be displayed. For example:
1,2,6,9. If 0, only the species ('upper') tree is displayed
-N, --ending-node <ENDING_NODE>
Display transfers ending to this node only
-o, --output <OUTPUT>
Set the name of the output file or the prefix of the output files
-O, --optimise
Switching nodes in order to minimise transfer crossings (under
development)
-p, --uniform
Species tree uniformisation. All the branches of species have the same
width
-P, --fill-species
Fill the species tree
-q, --node-colors <NODE_COLORS>
Nodes to be coloured : the descendants of each nodes will be drawn
with a different colour. For example: "m3,m25,m36" (Nodes should be
sorted from the top of the tree down to the leaves)
-Q, --background <BACKGROUND>
Background colour
-r, --ratio <RATIO>
Set the ratio between width of species and gene tree. Default is 1.0,
you usualy do not need to change it
-s, --species-only
Display species tree only in phyloxml style
-S, --node-support
Display node support
-t, --threshold <THRESHOLD>
Redudant transfers are displayed as one, with opacity according to
abundance and only if abundance is higher than <THRESHOLD>. Only one
gene is displayed
-T, --threshold-select <THRESHOLD_SELECT>
With option -t, select the index of the gene to display. If set to 0,
no gene is displayed
-u, --threshold-nested <THRESHOLD_NESTED>
With -g, same as -t, but apply to the '-f' input file, and -t will
apply to the '-g' file
-U, --threshold-nested-select <THRESHOLD_NESTED_SELECT>
Same as -T with -t, but for -u
-v, --verbose
Verbose mode
-w, --switch <SWITCH>
List of nodes whose left and right children will be switched For
example: "species_13", "species_14"
-W, --width <WIDTH>
Width: multiply the tree height by factor <WIDTH>
-x, --tidy
Tidy mode (non-layered tidy tree layout)
-X, --tidy-clean
Tidy mode, avoiding leave names superposition
-z, --gene-thickness <GENE_THICKNESS>
Thickness of the gene tree
-Z, --species-thickness <SPECIES_THICKNESS>
Thickness of the species tree
-h, --help
Print help
-V, --version
Print version
Note on -b option : you must set a browser as default application for opening
svg file
Note on -g option : this will generate 3-levels reconciliation svg files.
For example you may input a gene-symbiote recphyloxml file with -g and
symbiote-host recphyloxml file with -f
The -t/-u options are not totally implemented for the 3-levels reconciliation
svg output files.
Note on -x/-X options : the non-layered tidy tree layout is described in :
'van der Ploeg, A. 2014. Drawing non-layered tidy trees in linear time.
Software: Practice and Experience, 44(12): 1467–1484.'
Input format is guessed according to the file name extension:
.phyloxml => phyloXML
.xml => recPhyloxml
.recphyloxml => recPhyloXML
.recPhyloXML => recPhyloXML
.recphylo => recPhyloXML
any other => newick
All examples are in the git repository https://github.com/simonpenel/thirdkind
thirdkind -f recphylo_examples/FAM000297_reconciliated.recphylo -b
thirdkind -f recphylo_examples/concat.xml -b -t 0
thirdkind -f recphylo_examples/hote_parasite_page4_BL.recphylo -b -l 1
thirdkind -f recphylo_examples/free_living_reconciliated.recphylo -b -e -L
thirdkind -f recphylo_examples/testfiles -m -b -t 3 -J
thirdkind -f paramecium_data/liste.txt -m -b -t 25 -J
thirdkind -f recphylo_examples/test2/hote_parasite_page2.recphylo -g recphylo_examples/test2/gene_parasite_page2.recphylo -b
thirdkind -f recphylo_examples/test1_mult_parasite/rechp_dtl.recphyloxml -g recphylo_examples/test1_mult_parasite/recgs_mult_host_dtl.recphyloxml -b
thirdkind -f recphylo_examples/recgs_dtl.recphyloxml -b -n 3 -C "#17387A,#8E3B8B,green" -k12 -q "5,6" -Q "black" -Z 1
thirdkind -f newick_examples/virus.nhx -l 4 -b
thirdkind -f newick_examples/virus.nhx -l 4 -x -b
thirdkind -f newick_examples/virus.nhx -l 4 -X -b
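When many figures have to be produced, the command lines above can be assembled from a script. The following is a minimal sketch, not part of Thirdkind: the helper name `build_command` and its parameters are hypothetical, and it only uses flags that appear in the help message (-f, -g, -m, -t, -L, -o). It builds the argument list without running the program.

```python
import shlex


def build_command(input_file, nested_file=None, threshold=None,
                  multiple=False, landscape=False, output=None):
    """Assemble a thirdkind command line as a list of arguments.

    Hypothetical helper: only flags documented in the help message are used.
    """
    cmd = ["thirdkind", "-f", input_file]
    if nested_file is not None:
        cmd += ["-g", nested_file]        # 1st level file of a nested reconciliation
    if multiple:
        cmd.append("-m")                  # -f is a list of recPhyloXML files
    if threshold is not None:
        cmd += ["-t", str(threshold)]     # redundant transfers threshold
    if landscape:
        cmd.append("-L")                  # landscape display
    if output is not None:
        cmd += ["-o", output]             # output file name or prefix
    return cmd


cmd = build_command("paramecium_data/liste.txt", multiple=True, threshold=25)
print(shlex.join(cmd))  # shell-quoted form of the command
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Thirdkind is installed and the example files are present.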
It is possible to specify font and symbol sizes with the -d, -D and -k optional arguments. Thickness of trees can be defined with the -z and -Z optional arguments. Curvature of Bezier curves can be defined with the -K optional argument.
Default colours, opacities and Bezier parameters can be set in a configuration file (default is config_default.txt). Default font sizes are defined in the configuration file too.
Although Thirdkind is mainly dedicated to recPhyloXML files, it can handle several types of file and format:
- One newick file.
- One phyloXML file.
- One recPhyloXML file
- One file describing a set of recPhyloXML files
- Two “nested” recPhyloXML files
Outputs are svg files which can be visualised with any web browser. You may need to define the default program associated with these files, which can usually be done with a right click on the file.
All the examples given in the following chapters are available here: wiki examples, recphyloxml examples, paramecium generax examples, phyloxml examples and newick examples or in the thirdkind directory if you have cloned/downloaded the repository (https://github.com/simonpenel/thirdkind).
Warning: if you are using your browser to download examples, note that XML files should be saved as raw text and not html!
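The help message lists how the input format is guessed from the file name extension. As an illustration, that table can be written as a small lookup. This is a sketch of the documented table only, not Thirdkind's actual code; the function name `guess_format` is hypothetical, and treating the match as case-sensitive is an assumption based on the table listing both .recphyloxml and .recPhyloXML.

```python
from pathlib import Path

# Extension table copied from the help message; any other extension means newick.
EXTENSION_TO_FORMAT = {
    ".phyloxml": "phyloXML",
    ".xml": "recPhyloXML",
    ".recphyloxml": "recPhyloXML",
    ".recPhyloXML": "recPhyloXML",
    ".recphylo": "recPhyloXML",
}


def guess_format(file_name):
    """Return the format that the documented table associates with file_name."""
    # Assumption: the comparison is case-sensitive, as in the table.
    return EXTENSION_TO_FORMAT.get(Path(file_name).suffix, "newick")


print(guess_format("newick_examples/virus.nhx"))
print(guess_format("xml_examples/FAM036542_gene.xml"))
```

According to this table, a phyloXML file saved with the .xml extension is taken for recPhyloXML, which is why the phyloXML examples in this document pass -F phyloxml.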
Newick is a parenthesised format used for phylogenetic trees. Thirdkind will handle a tree stored in newick format only if the tree is rooted, and NHX tags will not be considered. When using a Newick file, the svg output file will display a single tree.
PhyloXML is an XML format dedicated to phylogenetic trees that can describe evolutionary events. The svg output file will display a single tree, with a symbol at each node for each evolutionary event (a circle for a speciation, a square for a duplication, a cross for a loss, a diamond for a transfer), and the branch between the 2 nodes involved in a transfer is drawn as a dotted line. This style will be called “phyloxml svg style” throughout this document.
phyloxml svg style:
thirdkind -f xml_examples/FAM036542_gene.xml -b -F phyloxml -k 15
RecPhyloXML is an XML format inspired by phyloXML and dedicated to reconciled phylogenetic trees. A recPhyloXML file contains at least one species tree and one reconciled gene tree mapped to (one of) the species tree(s). In recPhyloXML, a clade (i.e. a node or leaf in the tree) carries several tags, among which a name, a location, a type of event, etc. Each node of the gene tree(s) should carry a “location” tag whose value is the same as the value of the “name” tag of one of the clades in the species tree(s). It is possible to have multiple gene trees and multiple species trees in a single file. The svg consists of one or several reconciled gene trees mapped inside one or several species trees. In this document, the trees which are mapped (here the gene trees) will be called 'lower' trees and the trees onto which the 'lower' trees are mapped (here the species trees) will be called 'upper' trees. The svg output file will thus display the species tree(s) as 'upper' trees containing the 'lower' gene trees with symbols at their nodes. Duplication nodes are represented as squares, speciation nodes as circles and losses as crosses. Leaves are red squares. Transfers are dotted Bezier curves ending with an arrow. If there is more than 1 gene tree, the 'lower' trees will have different colours. This style will be called “recphyloxml svg style” throughout this document.
recphyloxml svg style:
thirdkind -f recphylo_examples/FAM001051_FAM000799_reconciliated.xml -k 8 -b
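The requirement that every gene tree location matches the name of a species tree clade can be checked before drawing. The sketch below is not part of Thirdkind. It assumes the recPhyloXML layout in which the species tree sits under `spTree`, gene trees under `recGeneTree`, and the location is carried by a `speciesLocation` attribute on the event elements inside `eventsRec`; these names should be verified against real files. The sample document is hypothetical and has no XML namespace, whereas real files may declare one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal recPhyloXML-like document (assumed layout, no namespace).
SAMPLE = """
<recPhylo>
  <spTree>
    <phylogeny>
      <clade>
        <name>AB</name>
        <clade><name>A</name></clade>
        <clade><name>B</name></clade>
      </clade>
    </phylogeny>
  </spTree>
  <recGeneTree>
    <phylogeny rooted="true">
      <clade>
        <name>g_root</name>
        <eventsRec><speciation speciesLocation="AB"/></eventsRec>
        <clade>
          <name>g_a</name>
          <eventsRec><leaf speciesLocation="A"/></eventsRec>
        </clade>
        <clade>
          <name>g_x</name>
          <eventsRec><leaf speciesLocation="X"/></eventsRec>
        </clade>
      </clade>
    </phylogeny>
  </recGeneTree>
</recPhylo>
"""


def unmatched_locations(xml_text):
    """Return gene tree locations that name no clade of the species tree."""
    root = ET.fromstring(xml_text.strip())
    species_names = {name.text for name in root.findall("./spTree//clade/name")}
    locations = {
        event.get("speciesLocation")
        for event in root.findall("./recGeneTree//eventsRec/*")
        if event.get("speciesLocation") is not None
    }
    # FREE_LIVING is the special location used for free living organisms.
    return sorted(locations - species_names - {"FREE_LIVING"})


print(unmatched_locations(SAMPLE))
```

In the sample, the leaf located in "X" is reported because no species clade carries that name.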
You can choose colours for gene trees and/or select the gene trees to be displayed: for example, select genes 2 and 3 with colours #17387A and #8E3B8B:
thirdkind -f recphylo_examples/recgs_dtl.recphyloxml -b -n 2,3 -C "#17387A,#8E3B8B" -k8
You can also choose colours for parts of a gene tree: for example, select gene 3 with colours #17387A, #8E3B8B and green to highlight the duplication leading to the nodes named "5" and "6":
thirdkind -f recphylo_examples/recgs_dtl.recphyloxml -b -n 3 -C "#17387A,#8E3B8B,green" -k8 -q "5,6"
You can choose the background colour and the thickness of the species tree
thirdkind -f recphylo_examples/recgs_dtl.recphyloxml -b -n 3 -C "#17387A,#8E3B8B,green" -k12 -q "5,6" -Q "black" -Z 1
You can choose to fill the species tree
thirdkind -f recphylo_examples/recgs_dtl.recphyloxml -b -n 3 -C "#17387A,#8E3B8B,green" -k12 -q "5,6" -P
thirdkind -f example_wiki/recgs_dtl.recphyloxml -b -L
You may need to switch species nodes, for example to prevent transfers from crossing the whole tree:
thirdkind -f example_wiki/recgs_dtl.recphyloxml -I -L -w p0_9,p0_10,p0_2,p0_41,p0_44 -b
thirdkind -f recphylo_examples/ex7comp.recphyloxml -b
Use real branch lengths, scaled by the factor given with -l:
thirdkind -f xml_examples/apaf.xml -b -F phyloxml -l 5 -W 0.5
Branch lengths in the svg are the real branch lengths of the phyloxml tree multiplied by 5:
thirdkind -f recphylo_examples/hote_parasite_page4_BL.recphylo -b -l 1
Branch lengths of the pipe tree in the svg are the real branch lengths of the 'upper' tree in recphyloxml multiplied by 1
The "tidy" mode uses a non-layered tidy tree layout, as described here https://onlinelibrary.wiley.com/doi/10.1002/spe.2213 (van der Ploeg, 2014) and here 10.1093/molbev/msac204 (Penel and de Vienne, 2022).
The [-x/-X] option compresses the tree following a non-layered tidy tree layout and halves the space between nodes in order to increase the compression:
thirdkind -f newick_examples/virus.nhx -l 4 -b -X
You can visualise the specific effect of non-layered tidy tree compression by comparing with the tree obtained without the [-x/-X] option and a scaling of 0.5
thirdkind -f newick_examples/virus.nhx -l 4 -b -W 0.5
The -x option compresses the tree regardless of the length of the leaf names, whereas the -X option takes the length of the leaf names into account in order to avoid overlapping names.
In a "recphyloxml svg style" context, only the -x option is available.
By convention, duplication nodes of the lower tree are located inside the associated node of the upper tree. The option -M places the duplication node in the middle of the branch leading to the associated node.
thirdkind -f recphylo_examples/example_dupli.recphylo -b -H 3
thirdkind -f recphylo_examples/example_dupli.recphylo -b -H 3 -M
It is possible to use a list of recPhyloXML files instead of a single recPhyloXML file. This gives the same result as a single file containing the species trees and gene trees of the first file and all the gene trees of the other files. This option is useful to handle large sets of reconciliations, in combination with the -t option. Thirdkind is able to handle more than 10,000 gene families.
In the case of multiple gene histories, it may be interesting to focus on the gene transfers, especially on redundant transfers. Typically, it is useful to highlight frequent transfers. Option -t will draw only 1 gene history and will draw all the transfers in red according to their abundance, i.e. the number of times the transfer is present in the gene histories: only the transfers with an abundance higher than the threshold given by option -t will be drawn, and the opacity of a transfer reflects its abundance. The option -T selects the gene to display. The option -J displays the abundance of each transfer. This may be useful to deal with GeneRax output, for example.
thirdkind -f paramecium_data/liste.txt -t 1 -m -b
Transfer redundancy in 1000 gene histories:
thirdkind -f paramecium_data/liste.txt -t 25 -m -b -J
Transfer redundancy in 1000 gene histories, display only transfers with an abundance higher than 25:
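The abundance filtering described for the -t option can be illustrated with a few lines of Python. This is a sketch of the idea only, not Thirdkind's implementation: the function name, the input data and the way opacity is scaled (abundance divided by the largest retained abundance) are assumptions made for the example.

```python
from collections import Counter


def redundant_transfers(gene_histories, threshold):
    """Map (donor, recipient) to (abundance, opacity) for abundance > threshold.

    gene_histories: one list of (donor, recipient) transfers per gene history.
    """
    # Abundance: number of times a transfer is present in the gene histories.
    abundance = Counter()
    for transfers in gene_histories:
        abundance.update(transfers)
    kept = {pair: n for pair, n in abundance.items() if n > threshold}
    if not kept:
        return {}
    # Assumption: opacity grows linearly with abundance, 1.0 for the most frequent.
    top = max(kept.values())
    return {pair: (n, n / top) for pair, n in kept.items()}


# Hypothetical transfers observed in four gene histories.
histories = [
    [("A", "B"), ("C", "D")],
    [("A", "B")],
    [("A", "B"), ("C", "D")],
    [("E", "F")],
]
result = redundant_transfers(histories, threshold=1)
print(result)
```

With a threshold of 1, the transfer from "E" to "F" is dropped because it occurs only once, while the two others are kept with an opacity reflecting how often they occur.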
When reconciling a symbiont with its host, part of the symbiont tree may not be mapped to the host tree. For example, in the history of an organism, some taxa may be free-living species and others may have evolved to be symbionts of a host. In this case, free-living organisms should have a “Location” tag indicating “FREE_LIVING” instead of the name of a host. Thirdkind will draw the free-living part of the symbiont 'lower' tree outside the host 'upper' tree.
thirdkind -f example_wiki/free_living_reconciliated.recphylo -e -b
Free living organisms:
When there are several symbiont histories the -e option will superpose the free-living parts and the option -E will separate them.
thirdkind -f example_wiki/free_living_reconciliated.recphylo -e -K 4 -b
thirdkind -f example_wiki/free_living_reconciliated.recphylo -e -K 8 -b
It is possible to combine 2 reconciliations, for example a gene/species reconciliation and a symbiont/host reconciliation in which the symbiont of the second reconciliation is the species of the first one. This is done with the option -g, which indicates the gene/species file, while -f indicates the symbiont/host file. The software will generate several svg files: one “recphyloxml svg style” file for each of the two inputs, a “phyloxml svg style” file of the reconciled symbiont tree (from the -f file), a “phyloxml svg style” file for each reconciled gene tree (from the -g file), a simple tree of the host, and 3 “mapped” svg files describing the gene/symbiont/host reconciliation.
The first “mapped” svg file is a modified version of the recphyloxml style svg of the gene/symbiont reconciliation: the 'upper' tree of the symbiont presents features describing its reconciliation with the host: a big square for a duplication node, an additional branch coloured in black for a loss, and the segments between the start and end of a transfer coloured in green.
The second “mapped” svg file is a modified version of the recphyloxml style svg of the symbiont/host reconciliation in which gene transfers are mapped to the host nodes and displayed in red: For example if there is a gene transfer between the symbiont “C” present in host “3” and the symbiont “E” present in host “4” you will get a red bezier path between node “3” and “4” in the 'upper' host tree.
The third “mapped” svg file is a mapping of the gene trees over the host tree through the symbiont: for example, if genes “B1” and “B2” are associated with the symbiont “B”, and the symbiont “B” is associated with host “4”, the genes B1 and B2 are associated with host “4” in the svg. If a gene is transferred between hosts via a symbiont transfer, the transfer starts with a yellow diamond and the stippling is different. A gene transfer across symbionts which is not affected by a transfer of the symbiont across hosts is displayed as a classic gene transfer.
Host is violet, symbiont is pink and genes are blue.
thirdkind -f example_wiki/publi/parasite_hote.recphylo -g example_wiki/publi/gene_parasite.recphylo -e -b
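The second and third “mapped” files rest on composing the two reconciliations: a gene is placed in a host through its symbiont, and a transfer between symbionts becomes a path between their hosts. The sketch below illustrates this composition with plain dictionaries. It is not Thirdkind's code; the function names are hypothetical, and the genes "C1" and "E1" are invented to complete the example taken from the text.

```python
# Associations taken from the examples in the text; "C1" and "E1" are hypothetical.
gene_to_symbiont = {"B1": "B", "B2": "B", "C1": "C", "E1": "E"}
symbiont_to_host = {"B": "4", "C": "3", "E": "4"}


def compose(lower_to_middle, middle_to_upper):
    """Map each lower-level node to the upper-level node hosting its middle node."""
    return {
        lower: middle_to_upper[middle]
        for lower, middle in lower_to_middle.items()
    }


def map_transfer(transfer, middle_to_upper):
    """Translate a (donor, recipient) transfer between symbionts into hosts."""
    donor, recipient = transfer
    return (middle_to_upper[donor], middle_to_upper[recipient])


gene_to_host = compose(gene_to_symbiont, symbiont_to_host)
print(gene_to_host)
print(map_transfer(("C", "E"), symbiont_to_host))
```

Here the genes B1 and B2 end up in host "4", and the gene transfer from symbiont "C" to symbiont "E" is mapped to a path from host "3" to host "4", as in the description of the second “mapped” file.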
Real example:
thirdkind -f example_wiki/thirdlevel/hote_parasite_page2.recphylo -g example_wiki/thirdlevel/gene_parasite_page2.recphylo -b
(Origin of data : https://doi.org/10.1038/s41396-019-0533-6)
Another real example: multiple symbionts
thirdkind -f example_wiki/multi_symbiotes/rechp_dtl.recphyloxml -g example_wiki/multi_symbiotes/recgs_mult_host_dtl.recphyloxml -b
It is possible to describe hybridisation in the recphyloxml file.
An example of hybridisation and how to describe it can be found here: https://raw.githubusercontent.com/simonpenel/thirdkind/refs/heads/master/recphylo_examples/hybrid6.recphylo
thirdkind -f recphylo_examples/hybrid6.recphylo -b
thirdkind -f recphylo_examples/hybrid.recphylo -b
The software has many options described in the help message. Among them, in addition to the previously described options, the following are the most useful:
- -c configfile: use a configuration file
- -d fontsize: set font size for gene trees
- -D fontsize: set font size for species trees
- -k size: set the size of event symbols (crosses, circles, squares, etc.)
- -G : (recphyloXML format only) draw only the gene #n in phyloxml svg style
- -i : display internal gene node names
- -I : display internal species node names
- -p : (recphyloXML format only) 'upper' species tree uniformisation
- -P : (recphyloXML format only) fill the 'upper' species tree
- -s : (recphyloXML format only) drawing only the species tree
It is possible to configure some default features with a configuration file.
thirdkind -f recphylo_examples/FAM000600_reconciliated_big.recphylo -c my_config.txt -b
Contents of the default configuration file: config_default.txt
The XML format “recPhyloXML” has been proposed as a standard to describe phylogenetic reconciliations and is now produced directly or via translation scripts by a majority of reconciliation software. Translation scripts are available here: https://github.com/WandrilleD/recPhyloXML
Thirdkind was able to process 5,000 reconciled trees of 50 nodes in 2 seconds and to process a tree of 7,000 nodes in 1 second.