In in silico evolution experiments, we have the luxury of being able to perfectly track the phylogenies of our populations, rather than having to just infer them after the fact. Phylotrackpy is a Python package designed to help you do so as efficiently as possible.
At face value, measuring a phylogeny in in silico evolution may seem very straightforward: you just need to keep track of what gives birth to what. However, multiple aspects turn out to be non-trivial. The goal of Phylotrackpy is to implement these things the right way once so that we all can stop needing to re-implement them over and over. Phylotrackpy is a python library designed to flexibly handle all aspects of recording phylogenies in in silico evolution.
Phylotrackpy is essentially a wrapper around Phylotracklib, which is implemented in C++. If you need a C++ phylogeny tracker, you can use that one directly (it is part of the larger Empirical library, which is header-only so you can just include the parts you want).
A phylogeny tracker written in Julia is also available.
- Pruning: Ability to prune out taxa that are extinct and have no extant descendants (to keep memory use under control)
- Flexible taxon definitions: Flexible control of how taxa are defined (e.g. by genotype, by phenotype, by trait, or by something more complex)
- Efficiency: Highly efficient (implemented in C++ under the hood)
- Phylostatistics: Includes various phylogenetic topology metrics
- Flexible output: Easily add columns to output files.
Running a parallel/distributed simulation? Check out hstrat, which provides an alternate parallel/distributed-friendly methodology for decentralized phylogenetic tracking.
There are three main steps to tracking a phylogeny using phylotrackpy:
- Construct a
Systematics
object - Notify the systematics object whenever anything is born
- Notify the systematics object whenever anything dies
You may also want to:
For more detailed instructions, see the documentation
Phylotrackpy is available through pip:
pip install phylotrackpy
To install the latest development version:
pip install git+https://github.com/emilydolson/phylotrackpy
To install from a local sorce copy:
pip install . --upgrade
Note that development and local installs will require local compilation of C++ bindings. Pre-built wheels are available with the PyPi distribution. See our documentation for more complete information on local builds.
There are certain quirks associated with real-time phylogenies (especially digital ones) that you might not be used to thinking about if you're used to dealing with reconstructed phylogenies. Many of these discrepancies are the result of the very different temporal resolutions on which these types of phylogenies are measured, and the fact that the taxonomic units we work with are often at a finer resolution than species. We document some here so that they don't catch you off guard:
- Multifurcations are real: In phylogenetic reconstructions, there is usually an assumption that any multifurcation/polytomy (i.e. a node that has more than two child nodes) is an artifact of having insufficient data. In real-time phylogenies, however, we often observe multifurcations that we know for sure actually happened.
- Not all extant taxa are leaf nodes: In phylogenetic reconstructions, there is usually an assumption that all extant (i.e. still living) taxa are leaf nodes in the phylogeny (i.e. none of them are parents/offspring of each other; similar taxa are descended from a shared common ancestor). In real-time phylogenies it is entirely possible that one taxon gives birth to something that we have defined as a different taxon and then continues to coexist with that child taxon.
- Not all nodes are branch points: In phylogenetic reconstructions, we only attempt to infer where branch points (i.e. common ancestors of multiple taxa) occurred. We do not try to infer how many taxa existed on a line of descent between a branch point and an extant taxa. In real-time phylogenies we observe exactly how many taxa exist on this line of descent and we keep a record of them. In practice there are often a lot of them, depending on you define your taxa. It is unclear whether we should include these non-branching nodes when calculating phylogenetic statistics (which is why Phylotrackpy lets you choose whether you want to).
The above image represents an actual phylogeny measured from digital evolution. Each rectangle represents a different taxon. It's position along the x axis represents the span of time it existed for. Note that there are often sections along a single branch where multiple taxa coexisted for a period of time. Circles represent extant taxa at the end of this run.
- pytest
- myst_parser (for writing documentation in markdown)
- sphinx_rtd_theme (theme for readthedocs)
Contributions are welcome! See CONTRIBUTING.md.
If Phylotrack contributes to a scientific publication, please cite it as
Dolson, E., Rodriguez-Papa, S., & Moreno, M. A. (2024). Phylotrack: C++ and Python libraries for in silico phylogenetic tracking. arXiv preprint arXiv:2405.09389. https://doi.org/10.48550/arXiv.2405.09389
@misc{dolson2024phylotrack,
doi={10.48550/arXiv.2405.09389},
url={https://arxiv.org/abs/2405.09389},
title={Phylotrack: C++ and Python libraries for in silico phylogenetic tracking},
author={Emily Dolson and Santiago Rodriguez-Papa and Matthew Andres Moreno},
year={2024},
eprint={2405.09389},
archivePrefix={arXiv},
primaryClass={q-bio.PE}
}
Consider also citing pybind11 if you are using PhylotrackPy. And don't forget to leave a star on GitHub!
- Emily Dolson (lead developer)
- Matthew Andres Moreno