v1.0.0
Summary
This major version release brings several new features and API changes to improve usability and to accommodate demography-only analyses.
- Inference for demography and mush is now done with two independent methods: `kSFS.infer_eta` and `kSFS.infer_mush`. See the API docs for details.
- Inference of the ancestral state misidentification rate, for both sample frequency and mutation type, obviates frequency masking, which has been removed.
- More interpretable model selection based on trend penalties. A user can supply as many trend penalties as they wish. A trend penalty of order k encourages order-k polynomial pieces in the solution (e.g. k=0 for piecewise constant). Trend penalties of different orders can be combined to get mixed trends. See the API docs for the inference methods above for details.
- Simplified simulation notebook and updated Quickstart (see docs).
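To make the trend penalties concrete: a penalty of order k can be read as an ℓ1 norm on the (k+1)-th discrete differences of the solution, which is zero on any stretch where the solution is a degree-k polynomial. The sketch below computes such a mixed penalty with NumPy. It is not mushi's implementation; the function names and the `(k, lam)` pair format are hypothetical, and it assumes an evenly spaced grid (grid spacing is ignored).

```python
import numpy as np


def difference_matrix(n, order):
    """(n - order) x n matrix whose rows take order-th discrete differences."""
    return np.diff(np.eye(n), n=order, axis=0)


def trend_penalty(x, penalties):
    """Hypothetical mixed trend penalty: sum of lam * ||D^(k+1) x||_1.

    `penalties` is a list of (k, lam) pairs. An order-k term vanishes wherever
    x is a degree-k polynomial (on an evenly spaced grid), so it favors
    piecewise degree-k solutions; k=0 favors piecewise constant.
    """
    x = np.asarray(x, dtype=float)
    return sum(
        lam * np.abs(difference_matrix(x.size, k + 1) @ x).sum()
        for k, lam in penalties
    )


step = [1, 1, 1, 5, 5, 5]
order0 = trend_penalty(step, [(0, 1.0)])            # only the single jump is penalized
mixed = trend_penalty(step, [(0, 1.0), (1, 0.5)])   # piecewise constant + piecewise linear
```

Combining orders in one list is how a mixed trend would be expressed in this sketch: each (k, lam) pair adds its own weighted difference penalty.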
Under the hood
- Rewrite of the `mushi.optimization` module using abstraction and inheritance to eliminate a lot of duplicated code. Added a trend filtering optimizer class based on the recursive ADMM of Ramdas and Tibshirani (this serves as the prox operator in the outer optimization routine when fitting demography or mush). I was able to get this running quite fast by caching Cholesky decompositions and using the fast prox-tv module for dual variable updates. I find that about 20 iterations of ADMM are plenty (although the default is 100).
- Consolidated SFS folding code into a function in the `utils` module.
- Introduced a frequency misidentification operator and a mutation type misidentification operator into our model of the expected kSFS, along with a misidentification rate `r`, which is a learned parameter for eta inference and a vector of rates for MuSH inference.
- It is no longer necessary to specify the time grid for inference. If no grid is specified, a reasonable default grid is constructed based on the TMRCA under a constant history. To specify the grid, use the parameters `pts` and `ta` when running `kSFS.infer_eta()`.
- Dedicated loss function module `mushi.loss_functions`. Inference can be done using any loss function from this module.
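The refactoring idea of a base class that owns the iteration loop while subclasses supply a single update step can be sketched as follows. The class names `Optimizer` and `ProxGrad` and their signatures are hypothetical and not taken from `mushi.optimization`; the point is only that an inner solver, such as a trend filter, can be plugged in as the `prox` callable of an outer proximal gradient routine.

```python
from abc import ABC, abstractmethod

import numpy as np


class Optimizer(ABC):
    """Hypothetical base class: owns the loop and stopping rule."""

    def __init__(self, max_iter=100, tol=1e-8):
        self.max_iter = max_iter
        self.tol = tol

    @abstractmethod
    def _step(self, x):
        """Return the next iterate; implemented by subclasses."""

    def run(self, x0):
        x = np.asarray(x0, dtype=float)
        for _ in range(self.max_iter):
            x_new = self._step(x)
            # stop when the relative change between iterates is small
            if np.linalg.norm(x_new - x) <= self.tol * max(1.0, np.linalg.norm(x)):
                return x_new
            x = x_new
        return x


class ProxGrad(Optimizer):
    """Proximal gradient: a gradient step on the smooth part, then a prox step."""

    def __init__(self, grad, prox, step_size, **kwargs):
        super().__init__(**kwargs)
        self.grad = grad            # gradient of the smooth loss
        self.prox = prox            # prox(v, t) of the nonsmooth penalty
        self.step_size = step_size

    def _step(self, x):
        return self.prox(x - self.step_size * self.grad(x), self.step_size)


# Usage: minimize 0.5 * ||x - y||^2 + lam * ||x||_1 (prox is soft-thresholding).
y = np.array([3.0, 0.5, -2.0])
lam = 1.0
solver = ProxGrad(
    grad=lambda x: x - y,
    prox=lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0),
    step_size=1.0,
)
x_hat = solver.run(np.zeros(3))  # soft-thresholded y: [2.0, 0.0, -1.0]
```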
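The benefit of caching a Cholesky factorization inside ADMM can be illustrated on the trend filtering problem, minimize 0.5 * ||y - beta||^2 + lam * ||D beta||_1, where D takes (k+1)-th differences. This sketch uses the plain ADMM splitting alpha = D beta with soft-thresholding; it is a swapped-in standard technique, not the specialized Ramdas and Tibshirani scheme (which instead solves a 1D fused lasso subproblem in each iteration). What it shares is the caching trick: the matrix I + rho * D^T D never changes, so its Cholesky factor is computed once with SciPy and reused in every iteration. The function name and defaults are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def difference_matrix(n, order):
    """(n - order) x n matrix whose rows take order-th discrete differences."""
    return np.diff(np.eye(n), n=order, axis=0)


def trend_filter_admm(y, lam, k=0, rho=1.0, n_iter=100):
    """Order-k trend filtering by standard (not specialized) ADMM.

    Solves min_beta 0.5 * ||y - beta||^2 + lam * ||D beta||_1,
    with D the (k+1)-th difference matrix, via the splitting alpha = D beta.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = difference_matrix(n, k + 1)
    # The beta-update system matrix is fixed, so factor it once and reuse it.
    chol = cho_factor(np.eye(n) + rho * D.T @ D)
    alpha = D @ y
    u = np.zeros_like(alpha)  # scaled dual variable
    beta = y.copy()
    for _ in range(n_iter):
        # beta-update: solve (I + rho D^T D) beta = y + rho D^T (alpha - u)
        beta = cho_solve(chol, y + rho * D.T @ (alpha - u))
        Dbeta = D @ beta
        # alpha-update: soft-thresholding, the prox of (lam / rho) * ||.||_1
        v = Dbeta + u
        alpha = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # dual update
        u += Dbeta - alpha
    return beta
```

Because the factor depends only on n, k, and rho, a caller fitting many signals of the same length could hoist it out of the function entirely.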
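Folding combines the counts at derived allele frequencies i and n - i, since without a reliable ancestral state those two classes cannot be told apart. A minimal sketch of such a helper follows; the name `fold` and its exact behavior are hypothetical rather than mushi's `utils` API. It folds along the first axis, so it works on a 1D SFS of length n - 1 as well as on a kSFS matrix with one column per mutation type.

```python
import numpy as np


def fold(sfs):
    """Fold an unfolded SFS (or kSFS) along its first axis.

    Row j holds the count at derived allele frequency i = j + 1, for
    i = 1, ..., n - 1. The result has floor(n / 2) rows: row i - 1 is
    count(i) + count(n - i), except that the middle class i = n / 2
    (n even) is not doubled.
    """
    sfs = np.asarray(sfs)
    n = sfs.shape[0] + 1                     # number of sampled haplotypes
    folded = sfs[: n // 2].copy()            # classes i = 1, ..., floor(n / 2)
    n_pairs = (n - 1) // 2                   # classes with i < n - i
    folded[:n_pairs] += sfs[::-1][:n_pairs]  # add the mirrored class n - i
    return folded


print(fold([1, 2, 3]))     # n = 4: [1 + 3, 2]
print(fold([1, 2, 3, 4]))  # n = 5: [1 + 4, 2 + 3]
```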
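Under a simple model of ancestral state misidentification, a fraction r of sites at derived frequency i are observed at frequency n - i, so the observed SFS is (1 - r) * xi_i + r * xi_{n-i}. The sketch below applies that operator with NumPy broadcasting: a scalar `r` for a single SFS, or one rate per column of a kSFS. The function name is hypothetical, and the model is an assumed simplification: it only reflects frequencies and does not relabel the mutation type itself, which a full mutation type misidentification operator would need to handle.

```python
import numpy as np


def misidentify(xi, r):
    """Apply a frequency misidentification operator to an expected SFS/kSFS.

    xi has one row per derived allele frequency i = 1, ..., n - 1 (and
    optionally one column per mutation type). A fraction r of the mass at
    frequency i is moved to frequency n - i, i.e. reversed along axis 0.
    r may be a scalar or an array with one rate per column.
    """
    xi = np.asarray(xi, dtype=float)
    r = np.asarray(r, dtype=float)
    return (1.0 - r) * xi + r * xi[::-1]


single = misidentify([1.0, 0.0, 0.0], 0.1)     # [0.9, 0.0, 0.1]
per_type = misidentify(
    np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]]),
    np.array([0.1, 0.5]),                      # one rate per mutation type
)
```

The operator only moves mass between frequency classes, so the column totals of the expected kSFS are unchanged.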
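A default grid of this kind can be sketched from the expected TMRCA of n sampled haplotypes under the Kingman coalescent with constant diploid size N, which is 4N(1 - 1/n) generations. The helper below is hypothetical: it assumes `pts` is the number of grid points and `ta` the oldest grid time, and it log-spaces the grid starting at one generation. These are illustrative choices, not a description of how `kSFS.infer_eta()` actually builds its grid.

```python
import numpy as np


def expected_tmrca(n, N):
    """E[TMRCA] in generations for n sampled haplotypes from a diploid
    population of constant size N (Kingman coalescent): 4N(1 - 1/n)."""
    return 4.0 * N * (1.0 - 1.0 / n)


def default_time_grid(n, N, pts=100, ta=None):
    """Hypothetical default grid: `pts` log-spaced times from 1 generation
    to `ta`, with `ta` defaulting to E[TMRCA] under a constant history."""
    if ta is None:
        ta = expected_tmrca(n, N)
    return np.geomspace(1.0, ta, pts)


grid = default_time_grid(n=2, N=1000, pts=50)  # oldest point: 2N = 2000 generations
```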
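Making the loss pluggable means the inference routine only needs a callable of the form `loss(expected, observed)`. The sketch below shows two such callables, a Poisson random field negative log-likelihood (dropping terms that do not depend on the expected counts) and a least-squares loss, plus a toy fitting routine that accepts either. All names here are hypothetical and do not reflect the actual contents of `mushi.loss_functions`.

```python
import numpy as np


def poisson_loss(expected, observed):
    """Poisson negative log-likelihood, up to terms free of `expected`."""
    expected = np.asarray(expected, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sum(expected - observed * np.log(expected)))


def lsq_loss(expected, observed):
    """Half the squared error between expected and observed counts."""
    diff = np.asarray(expected, dtype=float) - np.asarray(observed, dtype=float)
    return 0.5 * float(np.sum(diff ** 2))


def fit_scale(template, observed, loss, grid=None):
    """Toy inference: pick the scale c whose prediction c * template best
    matches the observed counts under whichever loss is supplied."""
    if grid is None:
        grid = np.linspace(0.1, 10.0, 991)  # step of 0.01
    template = np.asarray(template, dtype=float)
    losses = [loss(c * template, observed) for c in grid]
    return float(grid[int(np.argmin(losses))])


# Same data, two different losses.
template = np.array([1.0, 2.0, 3.0])
observed = np.array([2.0, 4.0, 6.0])
c_prf = fit_scale(template, observed, poisson_loss)  # approximately 2.0
c_lsq = fit_scale(template, observed, lsq_loss)      # approximately 2.0
```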