To view the results of the experiments, see the Jupyter notebooks `plot_generalized_pruning_results.ipynb` and `plot_top_pruning_results.ipynb`.
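These can be opened with Jupyter in the usual way, for example (assuming Jupyter is available in your environment):

```bash
jupyter notebook plot_top_pruning_results.ipynb
```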
There are several dependencies and requirements to satisfy before running an nni-search.
First, install bito.
Second, update your bito environment:
```bash
conda activate bito
pip install -e .
conda env update --file environment.yml
conda activate bito
```
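As a quick sanity check (a minimal sketch, assuming the `bito` Python module is importable once the environment is set up correctly), you can verify the installation before starting a search:

```bash
# Assumed check: the bito module should import cleanly in the activated environment.
python -c "import bito; print('bito imported successfully')"
```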
For your first try, follow the instructions below and restrict to a top-pruning run on ds3 with the uniform branch prior and 50 iterations; that run completes in a short amount of time.
Various data files are required to run an nni-search. These files are prepared for the ds-datasets, assuming you have access to `/fh/fast/matsen_e/shared/vip/ds-golden/`, by running

```bash
./prep-ds-data.sh
```

This script prepares files based on MrBayes runs with both uniform and exponential branch priors on ds1, ds3, ds4, ds5, ds6, ds7, and ds8. You may want to edit the first few lines of the shell script to restrict to a subset of these runs; prepping all datasets takes a few hours.
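For the suggested first try (ds3 with the uniform prior), restricting the script might look like the following. The variable names are illustrative only; check the top of `prep-ds-data.sh` for the actual names it uses:

```bash
# Hypothetical edit to the first few lines of prep-ds-data.sh,
# restricting preparation to a single dataset and prior.
datasets="ds3"
priors="uniform"
```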
The files for the flu100 dataset are copied from vbpi-torch and are already present in `data/flu100`.
You can run the nni-search on these datasets and MrBayes posteriors with the script

```bash
./run-nni-search.sh
```

This script requires that you have prepared the data files with `prep-ds-data.sh`. Again, you may want to edit the first few lines of the script to restrict to a subset.
Scripts are available to run a single search on a single dataset, such as `tpds1.sh` and `gpds1.sh`.
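For example, the suggested first try (a top-pruning run on ds3) would use the corresponding per-dataset script. The name `tpds3.sh` below is an assumption based on the `tpds1.sh`/`gpds1.sh` naming pattern; check the repository for the exact script names:

```bash
# Assumed naming pattern: tp<dataset>.sh for top-pruning, gp<dataset>.sh for generalized pruning.
./tpds3.sh
```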
If MrBayes is not installed, first install it. Comparison stats for short MCMC runs are generated by running the script

```bash
./prep-mb-data.sh
```

This script requires that you have already prepared the data files with `prep-ds-data.sh`.
Additional stats (used for the credible subsplits plots) are generated by running the Python script:

```bash
python posterior_sdag_stats.py
```
To use the first few highest posterior density trees on the ds-datasets, run `multiple_starts_all_ds.sh`. For the flu100 dataset, run `multiple_starts_flu100.sh`.
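That is, to run both (assuming the scripts are executable, as with the others above):

```bash
./multiple_starts_all_ds.sh
./multiple_starts_flu100.sh
```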
Install RAxML, if not already installed. Get the trees from RAxML by executing each of the `run.sh` scripts for the ds-datasets and flu100 in the directory `multiple_trees_data/raxml_tests/`, process the trees (for all data sets at once) by calling `multiple_trees_data/count_distinct.py`, and perform a search with one of the `multiple_raxml_starts_ds#.sh` scripts.
That is, run
```bash
cd multiple_trees_data/raxml_tests/ds1
./run.sh
cd ../ds3/
./run.sh
cd ../ds4/
./run.sh
cd ../ds5/
./run.sh
cd ../ds6/
./run.sh
cd ../ds7/
./run.sh
cd ../ds8/
./run.sh
cd ../flu100/
./run.sh
cd ..
python count_distinct.py
cd ../..
./multiple_raxml_starts_ds1.sh
./multiple_raxml_starts_ds3.sh
./multiple_raxml_starts_ds4.sh
./multiple_raxml_starts_ds5.sh
./multiple_raxml_starts_ds6.sh
./multiple_raxml_starts_ds7.sh
./multiple_raxml_starts_ds8.sh
./multiple_raxml_starts_flu100.sh
```
To run a search with stats collection off, computing only run-time, use a script named something like `time_tpds1.sh` or `time_gpds8.sh`.
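For example, assuming the timing scripts follow the naming pattern above:

```bash
./time_tpds1.sh
```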