Skip to content

Latest commit

 

History

History
164 lines (114 loc) · 4.38 KB

README.md

File metadata and controls

164 lines (114 loc) · 4.38 KB

phylogemetric

A python library for calculating delta score (Holland et al. 2002) and Q-Residual (Gray et al. 2010) for phylogenetic data.

Build Status Coverage Status DOI License JOSS

Installation:

Installation is only a pip install away:

pip install phylogemetric

Usage: Command line

Basic usage:

> phylogemetric

usage: phylogemetric [-h] method filename

Calculate delta score for filename example.nex:

> phylogemetric delta example.nex

taxon1              0.2453
taxon2              0.2404
taxon3              0.2954
...

Calculate qresidual score for filename example.nex:

> phylogemetric qresidual example.nex

taxon1              0.0030
taxon2              0.0037
taxon3              0.0063
...

Note: to save the results to a file use shell piping e.g.:

> phylogemetric qresidual example.nex > qresidual.txt

Speeding things up by using multiple processes.

You can tell phylogemetric to use multiple cores with the -w/--workers argument:

> phylogemetric -w 4 qresidual example.nex

Usage: Library

Calculate scores:

from nexus import NexusReader
from phylogemetric import DeltaScoreMetric
from phylogemetric import QResidualMetric

# load data from a nexus file:
nex = NexusReader("filename.nex")
qres = QResidualMetric(nex.data.matrix)

# Or construct a data matrix directly: 

matrix = {
    'A': [
        '1', '1', '1', '1', '0', '0', '1', '1', '1', '0', '1', '1',
        '1', '1', '0', '0', '1', '1', '1', '0'
    ],
    'B': [
        '1', '1', '1', '1', '0', '0', '0', '1', '1', '1', '1', '1',
        '1', '1', '1', '0', '0', '1', '1', '1'
    ],
    'C': [
        '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '0',
        '0', '0', '0', '1', '0', '1', '1', '1'
    ],
    'D': [
        '1', '0', '0', '0', '0', '1', '0', '1', '1', '1', '1', '0',
        '0', '0', '0', '1', '0', '1', '1', '1'
    ],
    'E': [
        '1', '0', '0', '0', '0', '1', '0', '1', '0', '1', '1', '0',
        '0', '0', '0', '1', '1', '1', '1', '1'
    ],
}

delta = DeltaScoreMetric(matrix)

Class Methods:

m = DeltaScoreMetric(matrix)

# calculates the number of quartets in the data:

m.nquartets()

# returns the distance between two sequences:
m.dist(['1', '1', '0'], ['0', '1', '0'])

# gets a dictionary of metric scores:
m.score()
m.score(workers=4) # with multiple processes.


# pretty prints the metric scores:
m.pprint()

Requirements:

  • python-nexus >= 1.1

Performance Notes:

Currently phylogemetric is implemented in python, and the Delta/Q-Residual algorithms are O(n). This means that performance is not optimal, and it may take a while to calculate these metrics for datasets with more than 100 taxa or so. To help speed this up, use the multiple processes argument -w/--workers at the command line or by passing workers=n to the score function.

I hope to improve performance in the near future, but in the meantime, if this is an issue for you then try using the implementations available in SplitsTree.

Citation:

If you use phylogemetric, please cite:

Greenhill, SJ. 2016. Phylogemetric: A Python library for calculating phylogenetic network metrics. Journal of Open Source Software.
http://dx.doi.org/10.21105/joss.00028

Changelog:

  • 1.2.2: performance improvements
  • 1.2.1: bug fix
  • 1.2.0: performance improvements
  • 1.1.0:
  • Added support for multiple processes.
  • Removed python 2 support.

Acknowledgements:

  • Thanks to David Bryant for clarifying the Q-Residual code.
  • Thanks to Kristian Rother for code quality suggestions.