Skip to content
This repository has been archived by the owner on Feb 6, 2024. It is now read-only.

Advanced Tutorial

minh edited this page Apr 6, 2017 · 30 revisions

Advanced tutorial

Recommended for experienced users to explore more features.

Table of Contents

To get started, please read the Beginner's Tutorial first if not done so yet.

Partitioned analysis for multi-gene alignments

In the partition model, you can specify a substitution model for each gene/character set. IQ-TREE will then estimate the model parameters separately for every partition. Moreover, IQ-TREE provides edge-linked or edge-unlinked branch lengths between partitions:

  • -q partition_file: all partitions share the same set of branch lengths (like -q option of RAxML).
  • -spp partition_file: like above but allowing each partition to have its own evolution rate.
  • -sp partition_file: each partition has its own set of branch lengths (like combination of -q and -M options in RAxML) to account for, e.g. heterotachy (Lopez et al., 2002).

TIP: -spp is recommended for typical analysis. -q is unrealistic and -sp is very parameter-rich. One can also perform all three analyses and compare e.g. the BIC scores to determine the best-fit partition model.

IQ-TREE supports RAxML-style and NEXUS partition input file. The RAxML-style partition file may look like:

DNA, part1 = 1-100
DNA, part2 = 101-384

If your partition file is called example.partitions, the partition analysis can be run with:

iqtree -s example.phy -q example.partitions -m GTR+I+G

Note that using RAxML-style partition file, all partitions will use the same rate heterogeneity model given in -m option (+I+G in this example). If you want to specify, say, +G for the first partition and +I+G for the second partition, then you need to create the more flexible NEXUS partition file. This file contains a SETS block with CharSet and CharPartition commands to specify individual genes and the partition, respectively. For example:

#nexus
begin sets;
    charset part1 = 1-100;
    charset part2 = 101-384;
    charpartition mine = HKY+G:part1, GTR+I+G:part2;
end;

If your NEXUS file is called example.nex, then you can use the option -spp to input the file as following:

iqtree -s example.phy -spp example.nex

Here, IQ-TREE partitions the alignment example.phy into 2 sub-alignments named part1 and part2 containing sites (columns) 1-100 and 101-384, respectively. Moreover, IQ-TREE applies the subtitution models HKY+G and GTR+I+G to part1 and part2, respectively. Substitution model parameters and trees with branch lengths can be found in the result file example.nex.iqtree.

Moreover, the CharSet command allows to specify non-consecutive sites with e.g.:

charset part1 = 1-100 200-384;

That means, part1 contains sites 1-100 and 200-384 of the alignment. Another example is:

charset part1 = 1-100\3;

for extracting sites 1,4,7,...,100 from the alignment. This is useful for getting codon positions from the protein-coding alignment.

Partitioned analysis with mixed data

IQ-TREE also allows combining sub-alignments from different alignment files, which is helpful if you want to combine mixed data (e.g. DNA and protein) in a single analysis. Here is an example for mixing DNA, protein and codon models:

#nexus
begin sets;
    charset part1 = dna.phy: 1-100 201-300;
    charset part2 = dna.phy: 101-200;
    charset part3 = prot.phy: 1-400;
    charset part4 = prot.phy: 401-600;
    charset part5 = codon.phy:CODON, *;
    charpartition mine = HKY:part1, GTR+G:part2, LG+G:part3, WAG+I+G:part4, GY:part5;
end;

Here, part1 and part2 contain sub-alignments from alignment file dna.phy, whereas part3 and part4 are loaded from alignment file prot.phy and part5 from codon.phy. The : is needed to separate the alignment file name and site specification. Note that, for convenience * in part5 specification means that part5 corresponds to the entire alignment codon.phy.

NOTE: For part5 the CODON keyword is specified so that IQ-TREE will apply a codon model. Moreover, this implicitly assumes the standard genetic code. If you want to use another genetic code, append CODON with the code ID described here

Because the alignment file names are now specified in this NEXUS file, you can omit the -s option:

iqtree -sp example.nex

Note that aln.phy and prot.phy does not need to contain the same set of sequences. For instance, if some sequence occurs in aln.phy but not in prot.phy, IQ-TREE will treat the corresponding parts of sequence in prot.phy as missing data. For your convenience IQ-TREE writes the concatenated alignment into the file example.nex.conaln.

Choosing the right partitioning scheme

ModelFinder implements a greedy strategy (Lanfear et al., 2012) that starts with the full partition model and subsequentially merges two genes until the model fit does not increase any further:

# for IQ-TREE version >= 1.5.4:
iqtree -s example.phy -spp example.nex -m MFP+MERGE

# for IQ-TREE version <= 1.5.3:
iqtree -s example.phy -spp example.nex -m TESTNEWMERGE

Note that this command considers the FreeRate heterogeneity model (see model selection tutorial). If you want to resemble PartitionFinder by just considering the invariable site and Gamma rate heterogeneity (thus saving computation times), then run:

iqtree -s example.phy -spp example.nex -m TESTMERGE

After ModelFinder found the best partition, IQ-TREE will immediately start the tree reconstruction under the best-fit partition model. Sometimes you only want to find the best-fit partition model without doing tree reconstruction, then run:

# for IQ-TREE version >= 1.5.4:
iqtree -s example.phy -spp example.nex -m MF+MERGE

# for IQ-TREE version <= 1.5.3:
iqtree -s example.phy -spp example.nex -m TESTNEWMERGEONLY

# to resemble PartitionFinder and save time:
iqtree -s example.phy -spp example.nex -m TESTMERGEONLY

To reduce the computational burden IQ-TREE implements the relaxed hierarchical clustering algorithm (Lanfear et al., 2014), which is invoked via -rcluster option:

# for IQ-TREE version >= 1.5.4:
iqtree -s example.phy -spp example.nex -m MF+MERGE -rcluster 10

# for IQ-TREE version <= 1.5.3:
iqtree -s example.phy -spp example.nex -m TESTNEWMERGEONLY -rcluster 10

to only examine the top 10% partition merging schemes (similar to the --rcluster-percent 10 option in PartitionFinder).

Ultrafast bootstrapping with partition model

IQ-TREE can perform the ultrafast bootstrap with partition models by e.g.,

iqtree -s example.phy -spp example.nex -bb 1000

Here, IQ-TREE will resample the sites within subsets of the partitions (i.e., the bootstrap replicates are generated per subset separately and then concatenated together). The same holds true if you do the standard nonparametric bootstrap.

IQ-TREE supports the gene-resampling strategy:

iqtree -s example.phy -spp example.nex -bb 1000 -bspec GENE

to resample genes instead of sites. Moreover, IQ-TREE allows an even more complicated strategy: resampling genes and sites within resampled genes, which may reduce false positives of the standard bootstrap resampling (Gadagkar et al., 2005):

iqtree -s example.phy -spp example.nex -bb 1000 -bspec GENESITE

Constrained tree search

Starting with version 1.5.0, IQ-TREE supports constrained tree search via -g option, so that the resulting tree must obey a constraint tree topology. The constraint tree can be multifurcating and need not to contain all species. To illustrate, let's return to the first running example, where we want to force Human grouping with Seal whereas Cow with Whale. If you use the following constraint tree (NEWICK format):

((Human,Seal),(Cow,Whale));

Save this to a file example.constr0 and run:

iqtree -s example.phy -m TIM2+I+G -g example.constr0 -pre example.constr0

(We use a prefix in order not to overwrite output files of the previous run). The resulting part of the tree extracted from example.constr0.iqtree looks exactly like a normal unconstrained tree search:

        +--------------Human
     +--|
     |  |  +------Seal
     |  +--|
     |     |  +-----Cow
     |     +--|
     |        +-------Whale
+----|
|    |         +---Mouse
|    +---------|
|              +------Rat

This is the correct behavior: although Human and Seal are not monophyletic, this tree indeed satisfies the constraint, because the induced subtree separates (Human,Seal) from (Cow,Whale). This comes from the fact that the tree is unrooted. If you want them to be sister groups, then you need to include outgroup taxa into the constraint tree. For example:

((Human,Seal),(Cow,Whale),Mouse);

Save this to example.constr1 and run:

iqtree -s example.phy -m TIM2+I+G -g example.constr1 -pre example.constr1

The resulting part of the tree is then:

           +---------------Human
        +--|
        |  +------Seal
     +--|
     |  |  +-----Cow
     |  +--|
     |     +-------Whale
+----|
|    |         +---Mouse
|    +---------|
|              +------Rat

which shows the desired effect.

Tree topology tests

IQ-TREE can compute log-likelihoods of a set of trees passed via the -z option:

iqtree -s example.phy -z example.treels -m GTR+G

assuming that example.treels is an existing file containing a set of trees in NEWICK format. IQ-TREE first reconstructs an ML tree. Then, it will compute the log-likelihood of the trees in example.treels based on the estimated parameters done for the ML tree. example.phy.iqtree will have a section called USER TREES that lists the tree IDs and the corresponding log-likelihoods. The trees with optimized branch lengths can be found in example.phy.treels.trees If you only want to evaluate the trees without reconstructing the ML tree, you can run:

iqtree -s example.phy -z example.treels -n 0

Here, IQ-TREE performs a very quick tree reconstruction using only 1 iteration and uses that tree to estimate the model parameters, which are normally accurate enough for our purpose.

IQ-TREE also supports several tree topology tests using the RELL approximation (Kishino et al., 1990). This includes bootstrap proportion (BP), Kishino-Hasegawa test (Kishino and Hasegawa, 1989), Shimodaira-Hasegawa test (Shimodaira and Hasegawa, 1999), expected likelihood weights (Strimmer and Rambaut, 2002):

iqtree -s example.phy -z example.treels -n 1 -zb 1000

Here, -zb specifies the number of RELL replicates, where 1000 is the recommended minimum number. The USER TREES section of example.phy.iqtree will list the results of BP, KH, SH, and ELW methods.

If you also want to perform the weighted KH and weighted SH tests, simply add -zw option:

iqtree -s example.phy -z example.treels -n 1 -zb 1000 -zw

Starting with version 1.4.0 IQ-TREE supports approximately unbiased (AU) test (Shimodaira, 2002) via -au option:

iqtree -s example.phy -z example.treels -n 1 -zb 1000 -zw -au

This will perform all above tests plus the AU test.

Finally, note that IQ-TREE will automatically detect duplicated tree topologies and omit them during the evaluation.

Testing constrained tree

We now illustrate an example to use the AU test (see above) to test trees from unconstrained versus constrained search, which is helpful to know if a constrained search is sensible or not. Thus:

  1. Perform an unconstrained search:

     iqtree -s example.phy -m TIM2+I+G -pre example.unconstr
    
  2. Perform a constrained search, where example.constr1 file contains: ((Human,Seal),(Cow,Whale),Mouse);:

     iqtree -s example.phy -m TIM2+I+G -g example.constr1 -pre example.constr1
    
  3. Perform another constrained search, where example.constr2 file contains ((Human,Cow,Whale),Seal,Mouse);:

     iqtree -s example.phy -m TIM2+I+G -g example.constr2 -pre example.constr2
    
  4. Perform the last constrained search, where example.constr3 file contains ((Human,Mouse),(Cow,Rat),Opossum);:

     iqtree -s example.phy -m TIM2+I+G -g example.constr3 -pre example.constr3
    
  5. Concatenate all trees into a file:

     cat example.unconstr.treefile example.constr1.treefile example.constr2.treefile example.constr3.treefile > example.treels
    
  6. Test the set of trees:

     iqtree -s example.phy -m TIM2+I+G -z example.treels -n 0 -zb 1000 -au
    

Now look at the resulting .iqtree file:

USER TREES
----------

See example.phy.trees for trees with branch lengths.

Tree      logL    deltaL  bp-RELL    p-KH     p-SH    c-ELW     p-AU
-------------------------------------------------------------------------
  1   -21152.617   0.000  0.7110 + 0.7400 + 1.0000 + 0.6954 + 0.7939 + 
  2   -21156.802   4.185  0.2220 + 0.2600 + 0.5910 + 0.2288 + 0.3079 + 
  3   -21158.579   5.962  0.0670 + 0.1330 + 0.5130 + 0.0758 + 0.1452 + 
  4   -21339.596 186.980  0.0000 - 0.0000 - 0.0000 - 0.0000 - 0.0000 - 

deltaL  : logL difference from the maximal logl in the set.
bp-RELL : bootstrap proportion using RELL method (Kishino et al. 1990).
p-KH    : p-value of one sided Kishino-Hasegawa test (1989).
p-SH    : p-value of Shimodaira-Hasegawa test (2000).
c-ELW   : Expected Likelihood Weight (Strimmer & Rambaut 2002).
p-AU    : p-value of approximately unbiased (AU) test (Shimodaira, 2002).

Plus signs denote the 95% confidence sets.
Minus signs denote significant exclusion.
All tests performed 1000 resamplings using the RELL method.

One sees that the AU test does not reject the first 3 trees (denoted by + sign below the p-AU column), whereas the last tree is significantly excluded (- sign). All other tests also agree with this. Therefore, groupings of (Human,Mouse) and (Cow,Rat) do not make sense. Whereas the phylogenetic position of Seal based on 3 first trees is still undecidable. This is in agreement with the SH-aLRT and ultrafast bootstrap supports done in the Tutorial.

Consensus construction and bootstrap value assignment

IQ-TREE can construct an extended majority-rule consensus tree from a set of trees written in NEWICK or NEXUS format (e.g., produced by MrBayes):

iqtree -con mytrees

To build a majority-rule consensus tree, simply set the minimum support threshold to 0.5:

iqtree -con mytrees -minsup 0.5

If you want to specify a burn-in (the number of beginning trees to ignore from the trees file), use -bi option:

iqtree -con mytrees -minsup 0.5 -bi 100

to skip the first 100 trees in the file.

IQ-TREE can also compute a consensus network and print it into a NEXUS file by:

iqtree -net mytrees

Finally, a useful feature is to read in an input tree and a set of trees, then IQ-TREE can assign the support value onto the input tree (number of times each branch in the input tree occurs in the set of trees). This option is useful if you want to compute the support values for an ML tree based on alternative topologies.

iqtree -sup input_tree set_of_trees

User-defined substitution models

Users can specify any DNA model using a 6-letter code that defines which rates should be equal. For example, 010010 corresponds to the HKY model and 012345 to the GTR model. In fact, IQ-TREE uses this specification internally to simplify the coding. The 6-letter code is specified via the -m option, e.g.:

iqtree -s example.phy -m 010010+G

Moreover, with the -m option one can input a file which contains the 6 rates (A-C, A-G, A-T, C-G, C-T, G-T) and 4 base frequencies (A, C, G, T). For example:

iqtree -s example.phy -m mymodel+G

where mymodel is a file containing the 10 entries described above, in the correct order. The entries can be seperated by either empty space(s) or newline character. One can even specify the rates within -m option by e.g.:

iqtree -s example.phy -m 'TN{2.0,3.0}+G8{0.5}+I{0.15}'

That means, we use Tamura-Nei model with fixed transition-transversion rate ratio of 2.0 and purine/pyrimidine rate ratio of 3.0. Moreover, we use 8-category Gamma-distributed site rates with the shape parameter (alpha) equal to 0.5 and a proportion of invariable sites p-inv=0.15.

By default IQ-TREE computes empirical state frequencies from the alignment by counting, but one can also estimate the frequencies by maximum-likelihood with +Fo in the model name:

iqtree -s example.phy -m GTR+G+Fo

For amino-acid alignments, IQ-TREE use the empirical frequencies specified in the model. If you want frequencies as counted from the alignment, use +F, for example:

iqtree -s myprotein_alignment -m WAG+G+F

Note that all model specifications above can be used in the partition model NEXUS file.

Where to go from here?

See Command Reference for a complete list of all options available in IQ-TREE.

Clone this wiki locally