v1.0.0

@wsdewitt released this 01 Feb 20:08
· 8 commits to master since this release

Summary

This major version release brings several new features and API changes to improve usability and accommodate demography-only analyses.

  • Inference for demography and MuSH is now done with two independent methods: kSFS.infer_eta and kSFS.infer_mush. See API docs for details.
  • Inference of ancestral state misidentification rate, for both sample frequency and mutation type, obviates frequency masking, which has been removed.
  • More interpretable model selection based on trend penalties. A user can supply as many trend penalties as they wish. A trend penalty of order k encourages order-k polynomial pieces in the solution (e.g. k=0 for piecewise constant). Trend penalties of different orders can be combined to get mixed trends. See the API docs on the inference methods above for details.
  • Simplified simulation notebook and updated Quickstart (see docs).
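
To make the trend penalty idea concrete, here is a minimal NumPy sketch of the standard trend filtering penalty, in which an order-k penalty is the L1 norm of the (k+1)-th discrete differences. The function names difference_operator and trend_penalty are hypothetical and are not part of the mushi API; the sketch only illustrates why order 0 favors piecewise constant solutions and order 1 favors piecewise linear ones.

```python
import numpy as np


def difference_operator(n, order):
    """Return the (n - order) x n matrix that takes order-th discrete differences."""
    D = np.eye(n)
    for _ in range(order):
        # Each pass differences adjacent rows, removing one row.
        D = np.diff(D, axis=0)
    return D


def trend_penalty(y, k):
    """Order-k trend filtering penalty: L1 norm of the (k+1)-th differences of y.

    Hypothetical helper for illustration only, not a mushi function.
    """
    y = np.asarray(y, dtype=float)
    return np.abs(difference_operator(len(y), k + 1) @ y).sum()


piecewise_constant = np.array([1.0, 1.0, 1.0, 4.0, 4.0, 4.0])
linear = np.arange(6, dtype=float)

# One jump of size 3 is the only nonzero first difference.
print(trend_penalty(piecewise_constant, 0))
# A straight line has zero second differences, so the order-1 penalty vanishes.
print(trend_penalty(linear, 1))
```

A mixed trend corresponds to summing such penalties with separate weights, for example a weighted order-0 term plus a weighted order-1 term, so that solutions with both flat and linear pieces are cheap.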

Under the hood

  • Rewrite of mushi.optimization module with abstraction and inheritance to avoid a lot of duplicated code. Added a trend filtering optimizer class based on the recursive ADMM of Ramdas and Tibshirani (this serves as the prox operator in the outer optimization routine when fitting demography or MuSH). I was able to get this running quite fast by caching Cholesky decompositions and using the fast prox-tv module for dual variable updates. I find that about 20 iterations of ADMM are plenty (although the default is 100).
  • Consolidated SFS folding code into a function in the utils module.
  • Introduced a frequency misidentification operator and a mutation type misidentification operator to our model of the expected kSFS. The misidentification rate r is a learned parameter for eta inference, and a vector of rates is learned for MuSH inference.
  • It is no longer necessary to specify the time grid for inference. If no grid is specified, a reasonable default grid is constructed based on the TMRCA under a constant history. To specify the grid, use the parameters pts and ta when running kSFS.infer_eta().
  • Dedicated loss function module mushi.loss_functions. Inference can be done using any loss function from this module.
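
As a rough illustration of the folding and frequency misidentification ideas above, the sketch below uses the common textbook model in which a fraction r of sites have their ancestral and derived alleles swapped, so a variant at derived count i is recorded at count n - i. The functions misidentify and fold are hypothetical names written for this example; they are assumptions about the general technique and not the mushi implementation.

```python
import numpy as np


def misidentify(sfs, r):
    """Apply ancestral state misidentification at rate r to an unfolded SFS.

    sfs holds counts for derived allele counts 1..n-1. A swapped site at
    count i appears at count n - i, which is the reversed array.
    Hypothetical helper for illustration only, not a mushi function.
    """
    sfs = np.asarray(sfs, dtype=float)
    return (1 - r) * sfs + r * sfs[::-1]


def fold(sfs):
    """Fold an unfolded SFS into minor allele counts 1..floor(n/2)."""
    sfs = np.asarray(sfs, dtype=float)
    n = len(sfs) + 1  # number of sampled haplotypes
    half = n // 2
    # Pair derived count i with its mirror n - i.
    folded = sfs[:half] + sfs[::-1][:half]
    if n % 2 == 0:
        # The middle class i = n/2 is its own mirror, so undo the double count.
        folded[-1] = sfs[half - 1]
    return folded


sfs = np.array([10, 6, 4, 3, 2])  # toy unfolded SFS for n = 6
print(fold(sfs))
print(misidentify(sfs, 0.1))
# Folding is unchanged by misidentification, since swaps only move
# mass between a count and its mirror.
print(np.allclose(fold(misidentify(sfs, 0.1)), fold(sfs)))
```

Under this model the folded SFS carries no information about r, which is one way to see why learning r from the unfolded data can stand in for masking unreliable frequency classes.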