Polymorphism Aware Models
Polymorphism-aware phylogenetic Models (PoMo) related documentation.
Table of Contents
- Counts files
- First running example
- Substitution models
- Virtual population size
- Sampling method
- Bootstrap branch support
- Interpretation of branch lengths
The IQ-TREE binary with PoMo can be downloaded or built from source. Please confirm that your version of IQ-TREE supports PoMo by running it and checking the version line:
iqtree
>> IQ-TREE PoMo version 1.5.0 for Linux 64-bit built Jul 6 2016
>> ...
TIP: For a quick overview of all PoMo related options in IQ-TREE, run the command
iqtree -h
and scroll to the heading POLYMORPHISM AWARE MODELS (PoMo).
If you use PoMo, please cite
Dominik Schrempf, Bui Quang Minh, Nicola De Maio, Arndt von Haeseler,
and Carolin Kosiol (2016) Reversible polymorphism-aware phylogenetic
models and their application to tree inference. J. Theor. Biol., in
press.
A preprint is available on bioRxiv: Schrempf et al., 2016.
PoMo takes allele frequency data as input. Especially when populations have many individuals, it is preferable to count the number of bases at each position. This decreases file size and speeds up the parser.
Counts files contain:
- One header line that identifies the file as a counts file and states the number of populations as well as the number of sites (separated by white space).
- A second header line with white space separated headers: CHROM (chromosome), POS (position) and sequence names.
- Many lines with counts of A, C, G and T bases and their respective positions.
Comments:
- Lines starting with # before the first header line are treated as comments.
An example:
COUNTSFILE NPOP 5 NSITES N
CHROM POS Sheep BlackSheep RedSheep Wolf RedWolf
1 1 0,0,1,0 0,0,1,0 0,0,1,0 0,0,5,0 0,0,0,1
1 2 0,0,0,1 0,0,0,1 0,0,0,1 0,0,0,5 0,0,0,1
.
.
.
9 8373 0,0,0,1 1,0,0,0 0,1,0,0 0,1,4,0 0,0,1,0
.
.
.
Y 9999 0,0,0,1 0,1,0,0 0,1,0,0 0,5,0,0 0,0,1,0
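The layout above can be produced with a few lines of Python. This is a minimal sketch, not part of IQ-TREE: the function name format_counts_file and the input layout (a list of population names plus per-site dictionaries) are hypothetical, and it assumes that each comma-separated quadruple is ordered A, C, G, T and that single spaces are acceptable as white space.

```python
BASES = "ACGT"  # assumed column order of each count quadruple


def format_counts_file(populations, sites):
    """Return the text of a counts file.

    populations: list of population names (hypothetical input layout).
    sites: list of (chrom, pos, counts) tuples, where counts maps each
           population name to a dict of base -> count (missing bases = 0).
    """
    lines = [
        f"COUNTSFILE NPOP {len(populations)} NSITES {len(sites)}",
        "CHROM POS " + " ".join(populations),
    ]
    for chrom, pos, counts in sites:
        # One "A,C,G,T" quadruple per population, in header order.
        fields = [
            ",".join(str(counts[pop].get(base, 0)) for base in BASES)
            for pop in populations
        ]
        lines.append(" ".join([str(chrom), str(pos)] + fields))
    return "\n".join(lines) + "\n"


example = format_counts_file(
    ["Sheep", "Wolf"],
    [
        ("1", 1, {"Sheep": {"G": 1}, "Wolf": {"G": 5}}),
        ("1", 2, {"Sheep": {"T": 1}, "Wolf": {"T": 5}}),
    ],
)
print(example)
```

Writing the returned string to a file such as example.cf gives an input that follows the format described in this section.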
The download includes an example alignment called example.cf in counts file format (so far, PoMo only supports counts files). You can now start to reconstruct a maximum-likelihood tree from this alignment by entering (assuming that you are now in the same folder as example.cf):
iqtree -s example.cf
-s is the option to specify the name of the alignment file. At the end of the run, IQ-TREE writes the same output files as in the standard version (see tutorial).
- example.cf.iqtree: the main report file, which is human-readable. You should look at this file to see the computational results. It also contains a textual representation of the final tree.
- example.cf.treefile: the ML tree in NEWICK format, which can be visualized by any tree viewer program that supports it, such as FigTree or iTOL.
- example.cf.log: log file of the entire run (also printed on the screen). To report bugs, please send this log file and the original alignment file to the authors.
The default prefix of all output files is the alignment file name. However, you can always change the prefix using the -pre option, e.g.:
iqtree -s example.cf -pre myprefix
This prevents output files from being overwritten when you perform multiple analyses on the same alignment within the same folder.
By default, PoMo runs with the HKY model. Different DNA substitution models can be selected with the -m option. E.g., to select the GTR model, run IQ-TREE with:
iqtree -s example.cf -m GTR
If a counts file is given as input, the PoMo model is chosen automatically. You can also explicitly request the (reversible) PoMo model with:
iqtree -s example.cf -m GTR+rP
The frequency type can also be selected with -m. The default is to empirically estimate allele frequencies. To estimate the allele frequencies together with the rate parameters, use:
iqtree -s example.cf -m GTR+rP+FO
TIP: For a quick overview of all available models in IQ-TREE, run the command
iqtree -h
and scroll to the heading POLYMORPHISM AWARE MODELS (PoMo).
PoMo models the evolution of populations by means of a virtual population of constant size N, which defaults to nine (for details, see Schrempf et al., 2016). The optimal choice of N depends on the data. If only very few chromosomes have been sequenced per population (e.g., two to four), N should be lowered to five. If enough data is available and calculations are not too time consuming, we advise increasing N up to a maximum of 19. This can be done with the sequence type option -st. You can choose odd values from three to 19 as well as two and ten. E.g., to set N to 19:
iqtree -s example.cf -st CF19
Odd values of N allow the use of the fast AVX instruction set, which considerably decreases runtime.
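The rule for permitted values of N can be checked before launching a run. The following is a small sketch under the constraints stated above (odd values from three to 19, plus two and ten); the helper pomo_sequence_type is hypothetical and only builds the CF<N> string shown in the example command.

```python
# Permitted virtual population sizes as stated in this section:
# odd values from 3 to 19, plus 2 and 10.
ALLOWED_N = sorted(set(range(3, 20, 2)) | {2, 10})


def pomo_sequence_type(n):
    """Return the value for -st (e.g. 'CF19'); hypothetical helper."""
    if n not in ALLOWED_N:
        raise ValueError(f"N={n} is not supported; choose one of {ALLOWED_N}")
    return f"CF{n}"


print("iqtree -s example.cf -st", pomo_sequence_type(19))
```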
For advanced users: PoMo offers two different methods to read in the data (Schrempf et al., 2016). Briefly, each species and site are treated as follows:
- Weighted (default): set the likelihood of each PoMo state to the probability that it leads to the observed data, assuming the data are binomially sampled.
- Sampled: randomly draw N samples with replacement from the given data and set the PoMo state to the chosen one.
Again, the sequence type option -st can be used to change the input method.
- To use the sampled input method (R for random):
iqtree -s example.cf -st CR
- To use the weighted input method (default behavior; CF for counts file):
iqtree -s example.cf -st CF
To overcome the computational burden required by the non-parametric bootstrap, IQ-TREE introduces an ultrafast bootstrap approximation (UFBoot) that is orders of magnitude faster than the standard procedure and provides relatively unbiased branch support values. To run UFBoot, use the option -bb, e.g., for 1000 replicates:
iqtree -s example.cf -bb 1000
The standard non-parametric bootstrap is invoked by the -b option, e.g., for 100 replicates:
iqtree -s example.cf -b 100
For a detailed description, please refer to the bootstrap tutorial.
PoMo estimates branch lengths in number of mutations and frequency shifts (drift) per site. The number of drift events relative to the number of mutations becomes higher if the virtual population size is increased. To obtain a branch length measured in number of substitutions per site, which enables a comparison to branch lengths estimated by standard DNA substitution models, the PoMo branch length has to be divided by N^2. PoMo also outputs the total tree length measured in number of substitutions per site in example.cf.iqtree. An example of the relevant section:
NOTE: The branch lengths of PoMo measure mutations and frequency shifts.
To compare PoMo branch lengths to DNA substitution models use the tree length
measured in substitutions per site.
Total tree length (sum of branch lengths)
- measured in number of mutations and frequency shifts per site: 0.71200751
- measured in number of substitutions per site (divided by N^2): 0.00879022
Sum of internal branch lengths
- measured in mutations and frequency shifts per site: 0.01767814 (2.48285810% of tree length)
- measured in substitutions per site: 0.01767814 (2.48285810% of tree length)
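The conversion can be reproduced by hand. The short sketch below assumes the default virtual population size N = 9, which is consistent with the two total tree lengths reported in the example output above; the function name is hypothetical.

```python
def to_substitutions_per_site(pomo_length, n=9):
    """Convert a PoMo branch or tree length (mutations and frequency
    shifts per site) to substitutions per site by dividing by N^2."""
    return pomo_length / n**2


total = to_substitutions_per_site(0.71200751)  # total tree length from the report
print(f"{total:.8f}")  # prints 0.00879022
```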
Copyright (c) 2010-2016 IQ-TREE development team.