5.7.1: bug fixed on reported quartet score for multiind

smirarab · Dec 18, 2019 · 658ff3f
1 parent 38fde4b

Showing 6 changed files with 53 additions and 41 deletions.
diff --git a/Astral.5.7.0.zip → Astral.5.7.1.zip b/Astral.5.7.0.zip → Astral.5.7.1.zip
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,6 @@
+- version 5.7.1:
+	- Bug fix: `-t 0`, `-t 2`,  `-t 4`, and  `-t 8` reported wrong quartet score for multi-ind datasets. Fixed to give a warning instead of wrong score. 
+
 - version 5.7.0:
 	- **Important Bug Fix**: The normalized quartet score was incorrect for multi-individual gene trees with polytomies. Absolute quartet score was correct but normalizing factor was not
 

diff --git a/README.md b/README.md
@@ -77,52 +77,52 @@ INSTALLATION:
 -----------
 * There is no installation required to run ASTRAL.
 * Download using one of two approaches:
-    * You simply need to download the [zip file](https://github.com/smirarab/ASTRAL/raw/master/Astral.5.7.0.zip) and extract the contents to a folder of your choice. 
+    * You simply need to download the [zip file](https://github.com/smirarab/ASTRAL/raw/master/Astral.5.7.1.zip) and extract the contents to a folder of your choice. 
     * Alternatively, you can clone the [github repository](https://github.com/smirarab/ASTRAL/). You then run `make.sh` to build the project or simply uncompress the zip file that is included with the repository.
 * ASTRAL is a java-based application, and should run in any environment (Windows, Linux, Mac, etc.) as long as java is installed. 
   Java 1.5 or later is required. We have tested ASTRAL only on Linux and MAC.
 * To test your installation, go to the place where you put the uncompressed ASTRAL, and run:
 
   ``` bash
-   java -jar astral.5.7.0.jar -i test_data/song_primates.424.gene.tre
+   java -jar astral.5.7.1.jar -i test_data/song_primates.424.gene.tre
    ```
 
   This should quickly finish. There are also other sample input files under `test_data/` that can be used.
 
 * ASTRAL can be run from any directory (e.g., `/path/to/astral/`). Then, you just need to run:
 
   ``` bash
-  java -jar /path/to/astral/astral.5.7.0.jar
+  java -jar /path/to/astral/astral.5.7.1.jar
   ```
 
-* Also, you can move `astral.5.7.0.jar` to any location you like and run it from there, but note that you need to move the `lib` directory with it as well.
+* Also, you can move `astral.5.7.1.jar` to any location you like and run it from there, but note that you need to move the `lib` directory with it as well.
 
 
 EXECUTION:
 -----------
 ASTRAL currently has no GUI. You need to run it through the command-line. In a terminal, go the location where you have downloaded the software, and issue the following command:
 
 ```
-  java -jar astral.5.7.0.jar
+  java -jar astral.5.7.1.jar
 ```
 
 This will give you a list of options available in ASTRAL.
 
 To find the species tree given a set of gene trees in a file called `in.tree`, use:
 
 ```
-java -jar astral.5.7.0.jar -i in.tree
+java -jar astral.5.7.1.jar -i in.tree
 ```
 
 The results will be outputted to the standard output. To save the results in a file use the `-o` option (**Strongly recommended**):
 
 ```
-java -jar astral.5.7.0.jar -i in.tree -o out.tre
+java -jar astral.5.7.1.jar -i in.tree -o out.tre
 ```
 To save the logs (**also recommended**), run:
 
 ```
-java -jar astral.5.7.0.jar -i in.tree -o out.tre 2>out.log
+java -jar astral.5.7.1.jar -i in.tree -o out.tre 2>out.log
 ```
 
 ###### Input: 
@@ -155,7 +155,7 @@ Please refer to the [tutorial](astral-tutorial.md) for all other features, inclu
 For big datasets (say more than 5000 taxa), increasing the memory available to Java can result in speedups. Note that you should give Java only as much free memory as you have available on your machine. So, for example, if you have 3GB of free memory, you can invoke ASTRAL using the following command to make all the 3GB available to Java:
 
 ```
-java -Xmx3000M -jar astral.5.7.0.jar -i in.tree
+java -Xmx3000M -jar astral.5.7.1.jar -i in.tree
 ```
 
 Acknowledgment

diff --git a/astral-tutorial.md b/astral-tutorial.md
@@ -50,7 +50,7 @@ ASTRAL currently has no GUI. You need to run it through command-line.
  To see the help, issue the following command:
 
 ```
-  java -jar astral.5.7.0.jar
+  java -jar astral.5.7.1.jar
 ```
 
 This will print the list of options available in ASTRAL. If no errors are printed, your ASTRAL installation is fine and you can proceed to the next sections. 
@@ -60,13 +60,13 @@ This will print the list of options available in ASTRAL. If no errors are printe
 We will next run ASTRAL on an input dataset. From the ASTRAL directory, run:
 
 ```
-java -jar astral.5.7.0.jar -i test_data/song_mammals.424.gene.tre
+java -jar astral.5.7.1.jar -i test_data/song_mammals.424.gene.tre
 ```
 
 The results will be outputted to the standard output. To save the results in an output file use the `-o` option:
 
 ```
-java -jar astral.5.7.0.jar -i test_data/song_mammals.424.gene.tre -o test_data/song_mammals.tre
+java -jar astral.5.7.1.jar -i test_data/song_mammals.424.gene.tre -o test_data/song_mammals.tre
 ```
 
 Here, the main input is just a file that contains all the input gene trees in Newick format. The input gene trees are treated as unrooted, whether or not they have a root. Note that the **output of ASTRAL should also be treated as an unrooted tree**. 
@@ -79,7 +79,7 @@ The input gene trees can have polytomies (unresolved branches) since [version 4.
 We will now run ASTRAL on a larger dataset. Run:
 
 ```
-java -jar astral.5.7.0.jar -i test_data/100-simulated-boot
+java -jar astral.5.7.1.jar -i test_data/100-simulated-boot
 ```
 
 The input file here is a simulated dataset with 100 sequences and 100 replicates of bootstrapped gene trees for 25 loci (thus 2,500 input trees). Note that ASTRAL finishes on this dataset in a matter of seconds. 
@@ -88,7 +88,7 @@ A larger real dataset from the [1kp](http://www.pnas.org/content/early/2014/10/2
 424 genes from 103 species. Run:
 
 ```
-java -jar astral.5.7.0.jar -i test_data/1KP-genetrees.tre -o test_data/1kp.tre 2> test_data/1kp.log
+java -jar astral.5.7.1.jar -i test_data/1KP-genetrees.tre -o test_data/1kp.tre 2> test_data/1kp.log
 ```
 
 This takes about a minute to run on a laptop. On this dataset, notice in the ASTRAL log information that it originally starts with 11043 clusters in its search space, and using heuristics implemented in ASTRAL-II, it increases the search space slightly to 11085 clusters. For more challenging datasets (i.e., more discordance or fewer genes) this number might increase a lot. 
@@ -107,7 +107,7 @@ nw_ed  1KP-genetrees.tre 'i & b<=10' o > 1KP-genetrees-BS10.tre
 To create a file `1KP-genetrees-BS10.tre` that includes the 1KP dataset with branches of 10% support or lower contracted. If you don't have newick utilities, don't worry. The contracted file is part of the ASTRAL distribution. 
 
 ```
-java -jar astral.5.7.0.jar -i test_data/1KP-genetrees-BS10.tre -o test_data/1kp-BS10.tre 2> test_data/1kp-bs10.log
+java -jar astral.5.7.1.jar -i test_data/1KP-genetrees-BS10.tre -o test_data/1kp-BS10.tre 2> test_data/1kp-bs10.log
 ```
 
 Compare the species tree generated here with that generated with the fully resolved gene trees. You can confirm that the tree topology has not changed in this case, but the branch lengths and the branch support have all changed (and that they tend to both increase). By comparing the log files you can also see that after contracting low support branches, the normalized quartet score increases to 0.92321 (from 0.89467 with no contraction). This is expected as low support branches tend to increase not decrease discordance. 
@@ -166,7 +166,7 @@ ASTRAL outputs lots of useful information to your screen ([stderr](https://en.wi
 by directing your stderr to a file. Capturing the log is highly recommended. Here is how you would capture stderr:
 
 ```
-java -jar astral.5.7.0.jar -i test_data/song_mammals.424.gene.tre -o test_data/song_mammals.tre 2> song_mammals.log
+java -jar astral.5.7.1.jar -i test_data/song_mammals.424.gene.tre -o test_data/song_mammals.tre 2> song_mammals.log
 ```
 
 Here are some of the important information captured in the log:
@@ -187,7 +187,7 @@ You can use the `-q` option in ASTRAL to score an existing species tree to produ
 To score a tree using ASTRAL, run:
 
 ```
-java -jar astral.5.7.0.jar -q test_data/simulated_14taxon.default.tre -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_scored.tre 2> test_data/simulated_scored.log
+java -jar astral.5.7.1.jar -q test_data/simulated_14taxon.default.tre -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_scored.tre 2> test_data/simulated_scored.log
 ```
 
 This will score the species tree given in `test_data/simulated_14taxon.default.tre` with respect to the gene trees given in `test_data/simulated_14taxon.gene.tre`. It will output the following in the log:
@@ -237,16 +237,16 @@ Here is a description of various information that can be turned on by using `-t`
 Run:
 
 ```
-java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 2 -o test_data/1kp-scored-t2.tre
+java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 2 -o test_data/1kp-scored-t2.tre
 ```
 ```
-java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 4 -o test_data/1kp-scored-t4.tre
+java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 4 -o test_data/1kp-scored-t4.tre
 ```
 ```
-java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 8 -o test_data/1kp-scored-t8.tre
+java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 8 -o test_data/1kp-scored-t8.tre
 ```
 ```
-java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 10 -o test_data/1kp-scored-t8.tre
+java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 10 -o test_data/1kp-scored-t8.tre
 ```
 read all the values given for a couple of branches and try to make sense of them. 
 
@@ -258,11 +258,11 @@ Our calculations of the local posterior probabilities and branch lengths use a Y
 Run the following two commands and compare the lengths of the longest branches:
 
 ```
-java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -c 2 -o test_data/1kp-scored-c2.tre
+java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -c 2 -o test_data/1kp-scored-c2.tre
 ```
 
 ```
-java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -c 0.001 -o test_data/1kp-scored-cs.tre
+java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -c 0.001 -o test_data/1kp-scored-cs.tre
 ``` 
 
 Note that setting lambda to 0 results in reporting ML estimates of the branch lengths instead of MAP. However, for branches with no discordance, we cannot compute a branch lengths. For these, we currently arbitrarily set ML to 10 coalescent units (we might change this in future versions).
@@ -284,7 +284,7 @@ To start multi-locus bootstrapping using ASTRAL, you need to provide the locatio
 * Now run:
 
 ```
-java -jar ../astral.5.7.0.jar -i song_mammals.424.gene.tre -b bs-files -o song_mammals.bootstrapped.astral.tre
+java -jar ../astral.5.7.1.jar -i song_mammals.424.gene.tre -b bs-files -o song_mammals.bootstrapped.astral.tre
 ```
 
 This will run 100 replicates of bootstrapping in addition to one run of ASTRAL on the main trees. 
@@ -318,7 +318,7 @@ By default, ASTRAL performs 100 bootstrap replicates, but the `-r` option can be
 For example, 
 
 ```
-java -jar ../astral.5.7.0.jar -i song_mammals.424.gene.tre -b bs-files -r 150 -o song_mammals.bootstrapped.150.astral.tre
+java -jar ../astral.5.7.1.jar -i song_mammals.424.gene.tre -b bs-files -r 150 -o song_mammals.bootstrapped.150.astral.tre
 ```
 
 will do 150 replicates. Note that your input gene tree bootstrap files need to have enough bootstrap replicates for the number of replicates requested using `-r`. For example, if you have `-r 150`, each file listed in `bs-files` should contain at least 150 bootstrap replicates.
@@ -329,7 +329,7 @@ will do 150 replicates. Note that your input gene tree bootstrap files need to h
 ASTRAL performs site-only resampling by default (see [Seo, 2008](http://www.ncbi.nlm.nih.gov/pubmed/18281270)). ASTRAL can also perform gene+site resampling, which can be activated with the `-g` option:
 
 ```
-java -jar ../astral.5.7.0.jar -i song_mammals.424.gene.tre -b bs-files -g -r 100 -o song_mammals.bootstrapped.gs.astral.tre
+java -jar ../astral.5.7.1.jar -i song_mammals.424.gene.tre -b bs-files -g -r 100 -o song_mammals.bootstrapped.gs.astral.tre
 ```
 
 Note that when you perform gene/site resampling, you need more gene tree replicates than the number of multi-locus bootstrapping replicates you requested using `-r`. For example, if you have `-g -r 100`, you might need 150 replicates for some genes (and less than 100 replicates for other genes). This is because when genes are resampled, some genes will be sampled more often than others by chance.
@@ -338,7 +338,7 @@ Note that when you perform gene/site resampling, you need more gene tree replica
 ASTRAL can also perform gene-only bootstrapping using the `--gene-only` option. This form of bootstrapping requires only one input file, which is given using `-i`. Thus, for this, you don't need to use `-b`. The following performs bootstrapping by resampling genes in the input file:
 
 ```
-java -jar ../astral.5.7.0.jar -i song_mammals.424.gene.tre --gene-only -o song_mammals.bootstrapped.go.astral.tre
+java -jar ../astral.5.7.1.jar -i song_mammals.424.gene.tre --gene-only -o song_mammals.bootstrapped.go.astral.tre
 ```
 
 
@@ -354,13 +354,13 @@ ASTRAL has an exact and a heuristic version. The heuristic version solves the op
 Since the mammalian dataset we have used so far has 37 taxa, the exact version cannot run on it. However, we have created a subset of this dataset that has all 9 primates, tree shrew, rat, rabbit, horse, and the sloth (a total of 14 taxa). We can run the exact version of ASTRAL on this reduced dataset. Run:
 
 ```
-java -jar astral.5.7.0.jar -i test_data/song_primates.424.gene.tre -o test_data/song_primates.424.exact.tre -x
+java -jar astral.5.7.1.jar -i test_data/song_primates.424.gene.tre -o test_data/song_primates.424.exact.tre -x
 ```
 
 Using the `-x` option results in running the exact version of the ASTRAL algorithm. This run should finish in about 30 seconds. Now, we will run ASTRAL on the same input using the default heuristic algorithm:
 
 ```
-java -jar astral.5.7.0.jar -i test_data/song_primates.424.gene.tre -o test_data/song_primates.424.default.tre
+java -jar astral.5.7.1.jar -i test_data/song_primates.424.gene.tre -o test_data/song_primates.424.default.tre
 ```
 This time, ASTRAL finished in under a second. So, is there a difference between the output of the exact and the heuristic version? Open up the two trees in your tree viewer tool and compare them. You will notice they are identical. You could also compare the scores outputted by ASTRAL and notice that they are identical. 
 
@@ -376,13 +376,13 @@ We tried hard to find a subset of genes in the biological primates dataset where
 Run
 
 ```
-java -jar astral.5.7.0.jar -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_14taxon.default.tre
+java -jar astral.5.7.1.jar -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_14taxon.default.tre
 ```
 
 and then
 
 ```
-java -jar astral.5.7.0.jar -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_14taxon.exact.tre -x
+java -jar astral.5.7.1.jar -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_14taxon.exact.tre -x
 ```
 
 Now you see that the tree outputted by the exact version has a slightly higher score (4812=48.07% versus 4803=47.98%), and a slightly different topology compared to the heuristic version. Thus, in extreme cases (i.e., lots of ILS and/or gene tree estimation error and few available gene trees compared to the number of taxa), one could observe differences between the exact and heuristic versions. Note that how many genes should be considered few depends on the number of taxa you have, and also how much missing data there is. 
@@ -401,7 +401,7 @@ impact on running time.
 To expand the search space, you can run:
 
 ```
-java -jar astral.5.7.0.jar -i test_data/simulated_primates_5X.10.gene.tre -o test_data/simulated_primates_5X.10.species.tre -e test_data/simulated_primates_5X.10.bootstrap.gene.tre
+java -jar astral.5.7.1.jar -i test_data/simulated_primates_5X.10.gene.tre -o test_data/simulated_primates_5X.10.species.tre -e test_data/simulated_primates_5X.10.bootstrap.gene.tre
 ```
 Here, the `-e` option is used to input a set of extra trees that ASTRAL uses to expand its search space. The file provided simply has 200 bootstrap replicates for each of the these 10 simulated genes.
 A similar option `-f` can be used when input trees have species labels instead of gene labels (only consequential when for multi-individual datasets).
@@ -415,7 +415,7 @@ Miscellaneous
 For big datasets (say more than 500 taxa) increasing the memory available to java might be necessary. Note that you should never give java more memory than what you have available on your machine. So, for example, if you have 4GB of free memory, you can invoke ASTRAL using the following command to make 3GB available to java:
 
 ```
-java -Xmx3000M -jar astral.5.7.0.jar -i in.tree
+java -Xmx3000M -jar astral.5.7.1.jar -i in.tree
 ```
 
 ### Other options

diff --git a/main/phylonet/coalescent/CommandLine.java b/main/phylonet/coalescent/CommandLine.java
@@ -40,7 +40,7 @@
 import com.martiansoftware.jsap.stringparsers.FileStringParser;
 
 public class CommandLine{
-    protected static String _versinon = "5.7.0";
+    protected static String _versinon = "5.7.1";
 
     protected static SimpleJSAP jsap;
 

diff --git a/main/phylonet/coalescent/WQInference.java b/main/phylonet/coalescent/WQInference.java
@@ -211,7 +211,7 @@ public double scoreSpeciesTreeWithGTLabels(Tree st, boolean initialize) {
 
 		Stack<STITreeCluster> stack = new Stack<STITreeCluster>();
 		long sum = 0l;
-
+		boolean poly = false;
 		for (TNode node: st.postTraverse()) {
 			if (node.isLeaf()) {
 				String nodeName = node.getName(); //GlobalMaps.TaxonNameMap.getSpeciesName(node.getName());
@@ -248,26 +248,35 @@ public double scoreSpeciesTreeWithGTLabels(Tree st, boolean initialize) {
 					}
 					System.err.println(" (polytomy)");*/
 					if (this.getBranchAnnotation() % 2 == 0) {
+						poly = true;
 						continue;
 					}
 				}
 
 				for (int i = 0; i < childbslist.size(); i++) {
 					for (int j = i+1; j < childbslist.size(); j++) {
 						for (int k = j+1; k < childbslist.size(); k++) {
-							Tripartition trip = new Tripartition(childbslist.get(i),  childbslist.get(j), childbslist.get(k));
-							Long s = weightCalculator.getWeight(trip, null);
-							sum += s;
+							sum += weightCalculator.getWeight(
+									new Tripartition(childbslist.get(i),  childbslist.get(j), childbslist.get(k)), 
+									null);
 						}
 					}					       
 				}
 			}
 		}
 
 
-		System.err.println("Final quartet score is: " + sum/4l);
-		System.err.println("Final normalized quartet score is: "+ (sum/4l+0.)/this.maxpossible);
-		//System.out.println(st.toNewickWD());
+		if (poly) {
+			System.err.println("Final quartet score is: won't report because of the existense of polytomies and to save time. "
+					+ "To get the score run with -t 1 and you can score the tree below using -q. ");
+			System.err.println("Final normalized quartet score is: won't report because of the existense of polytomies and to save time. "
+					+ "To get the score run with -t 1 and you can score the tree below using -q. ");
+		} else {
+
+			System.err.println("Final quartet score is: " + sum/4l);
+			System.err.println("Final normalized quartet score is: "+ (sum/4l+0.)/this.maxpossible);
+			//System.out.println(st.toNewickWD());
+		}
 
 		if (this.getBranchAnnotation() == 0){
 			for (TNode n: st.postTraverse()) {