To view the results of the experiments, see the Jupyter notebooks `plot_generalized_pruning_results.ipynb` and `plot_top_pruning_results.ipynb`.
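These can be opened with Jupyter in the usual way, for example (assuming Jupyter is available in your environment):

```bash
jupyter notebook plot_top_pruning_results.ipynb
```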
There are several dependencies and requirements to satisfy before running an nni-search.
First, install bito.
Second, update your bito environment:
```bash
conda activate bito
pip install -e .
conda env update --file environment.yml
conda activate bito
```
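As a quick sanity check (a minimal sketch, assuming the `bito` Python module is importable once the environment is set up correctly), you can verify the installation before starting a search:

```bash
# Assumed check: the bito module should import cleanly in the activated environment.
python -c "import bito; print('bito imported successfully')"
```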
For your first try, follow the instructions below and restrict to a top-pruning run on ds3 with the uniform branch prior and 50 iterations; that run completes in a short amount of time.
Various data files are required to run an nni-search. These files are prepared for the ds-datasets, assuming you have access to `/fh/fast/matsen_e/shared/vip/ds-golden/`, by running

```bash
./prep-ds-data.sh
```

This script prepares files based on MrBayes runs with both uniform and exponential branch priors on ds1, ds3, ds4, ds5, ds6, ds7, and ds8. You may want to edit the first few lines of the shell script to restrict to a subset of these runs; prepping all datasets takes a few hours.
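For the suggested first try (ds3 with the uniform prior), restricting the script might look like the following. The variable names are illustrative only; check the top of `prep-ds-data.sh` for the actual names it uses:

```bash
# Hypothetical edit to the first few lines of prep-ds-data.sh,
# restricting preparation to a single dataset and prior.
datasets="ds3"
priors="uniform"
```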
The files for the flu100 dataset are copied from vbpi-torch and are already present in `data/flu100`.
You can run the nni-search on these datasets and MrBayes posteriors with the script

```bash
./run-nni-search.sh
```

This script requires that you have prepared the data files with `prep-ds-data.sh`. Again, you may want to edit the first few lines of the script to restrict to a subset.
Scripts are available to run a single search on a single dataset, such as `tpds1.sh` and `gpds1.sh`.
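For example, the suggested first try (a top-pruning run on ds3) would use the corresponding per-dataset script. The name `tpds3.sh` below is an assumption based on the `tpds1.sh`/`gpds1.sh` naming pattern; check the repository for the exact script names:

```bash
# Assumed naming pattern: tp<dataset>.sh for top-pruning, gp<dataset>.sh for generalized pruning.
./tpds3.sh
```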
If MrBayes is not installed, first install it. Comparison stats for short MCMC runs are generated by running the script

```bash
./prep-mb-data.sh
```

This script requires that you have already prepared the data files with `prep-ds-data.sh`.
Additional stats (used for the credible subsplits plots) are generated by running the Python script:

```bash
python posterior_sdag_stats.py
```
To use the first few highest posterior density trees on the ds-datasets, run `multiple_starts_all_ds.sh`. For the flu100 dataset, run `multiple_starts_flu100.sh`.
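That is, to run both (assuming the scripts are executable, as with the others above):

```bash
./multiple_starts_all_ds.sh
./multiple_starts_flu100.sh
```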
Install RAxML, if not already installed. Get the trees from RAxML by executing each of the `run.sh` scripts for the ds-datasets and flu100 in the directory `multiple_trees_data/raxml_tests/`, process the trees (for all data sets at once) by calling `multiple_trees_data/count_distinct.py`, and perform a search with one of the `multiple_raxml_starts_ds#.sh` scripts.
That is, run
```bash
cd multiple_trees_data/raxml_tests/ds1
./run.sh
cd ../ds3/
./run.sh
cd ../ds4/
./run.sh
cd ../ds5/
./run.sh
cd ../ds6/
./run.sh
cd ../ds7/
./run.sh
cd ../ds8/
./run.sh
cd ../flu100/
./run.sh
cd ..
python count_distinct.py
cd ../..
./multiple_raxml_starts_ds1.sh
./multiple_raxml_starts_ds3.sh
./multiple_raxml_starts_ds4.sh
./multiple_raxml_starts_ds5.sh
./multiple_raxml_starts_ds6.sh
./multiple_raxml_starts_ds7.sh
./multiple_raxml_starts_ds8.sh
./multiple_raxml_starts_flu100.sh
```
To run a search with stats collection off, computing only run-time, use a script named something like `time_tpds1.sh` or `time_gpds8.sh`.
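For example, assuming the timing scripts follow the naming pattern above:

```bash
./time_tpds1.sh
```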