The Distributions package provides a collection of probability distributions and related functions
DISTRIBUTIONS is a library for (1) generating random draws from various commonly used distributions, and (2) calculating statistical functions for these distributions, such as the density, the cumulative distribution function and quantiles.
In the implementation and the interface, our primary considerations are:
- Correctness. Above everything, all calculations should be correct. Correctness shall not be sacrificed for speed or implementational simplicity. Consequently, everything should be unit-tested all the time.
- Simple and unified interface. Random variables are instances which can be used for calculations and random draws. The naming convention for building blocks is
  (draw|cdf|pdf|quantile|...)-(standard-)?distribution-name(possible-suffix)?
  for example pdf-standard-normal or draw-standard-gamma1.
- Speed and exposed building blocks on demand. You can obtain the generator function for random draws as a closure using the accessor "generator" from an rv. In addition, the package exports independent building blocks such as draw-standard-normal, which can be inlined into your code if necessary.
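To make the naming convention and the closure idea concrete, here is a minimal, self-contained sketch. These definitions are hypothetical reimplementations written for illustration, not the library's own code, which may use different algorithms and signatures: pdf-standard-normal evaluates the N(0,1) density, draw-standard-normal uses the Box-Muller transform (an assumption about method), and the hypothetical make-normal-generator returns a closure in the spirit of the "generator" accessor.

```lisp
;;; Hypothetical sketch of building blocks named after the convention
;;; (draw|cdf|pdf|quantile|...)-(standard-)?distribution-name.
;;; Not the library's implementation.

(defun pdf-standard-normal (x)
  "Density of the standard normal distribution at X."
  (let ((x (float x 1d0)))
    (/ (exp (* -0.5d0 x x))
       (sqrt (* 2d0 pi)))))

(defun draw-standard-normal (&optional (state *random-state*))
  "One draw from N(0,1) via the Box-Muller transform.
The method is an assumption; the library may use a different one."
  (let ((u1 (- 1d0 (random 1d0 state)))   ; in (0,1], so (log u1) is finite
        (u2 (random 1d0 state)))          ; in [0,1)
    (* (sqrt (* -2d0 (log u1)))
       (cos (* 2d0 pi u2)))))

(defun make-normal-generator (mean sd &optional (state *random-state*))
  "Hypothetical helper: return a closure that draws from N(MEAN, SD^2)
each time it is called with no arguments."
  (let ((mean (float mean 1d0))
        (sd (float sd 1d0)))
    (lambda ()
      (+ mean (* sd (draw-standard-normal state))))))
```

Each call of (funcall (make-normal-generator 0 1)) returns a fresh double-float; the parameters are captured once, so the closure can be passed around and called repeatedly without consulting an rv object.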
Implementation note: Subclasses are allowed to calculate intermediate values (e.g. to speed up computation) at any time, e.g. right after the initialization of the instance, or on demand. The consequences of changing the slots of RV classes are UNDEFINED, but probably quite nasty. Don't do it. Note: lazy slots are currently not used; they will be reintroduced in the future after profiling/benchmarking.
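The implementation note can be illustrated with a hypothetical class (not one of the library's RV classes) that computes an intermediate value once, right after initialization, in an :after method on initialize-instance. The slots only have readers, which matches the rule that changing them afterwards is not supported; the class name normal-sketch and the generic function log-pdf-sketch are made up for this example.

```lisp
;;; Hypothetical sketch: caching an intermediate value at initialization.

(defclass normal-sketch ()
  ((mean :initarg :mean :initform 0d0 :reader mean)
   (sd :initarg :sd :initform 1d0 :reader sd)
   ;; Computed once in INITIALIZE-INSTANCE :AFTER:
   ;; log normalizing constant, -log(sd) - log(2*pi)/2.
   (log-normalizer :reader log-normalizer))
  (:documentation "Hypothetical normal random variable with a cached constant."))

(defmethod initialize-instance :after ((rv normal-sketch) &key)
  (with-slots (mean sd log-normalizer) rv
    (setf mean (float mean 1d0)
          sd (float sd 1d0))
    (assert (plusp sd) () "SD must be positive, got ~A." sd)
    (setf log-normalizer
          (- (+ (log sd) (* 0.5d0 (log (* 2d0 pi))))))))

(defgeneric log-pdf-sketch (rv x)
  (:documentation "Hypothetical log density of RV at X."))

(defmethod log-pdf-sketch ((rv normal-sketch) x)
  ;; Only the x-dependent term is computed per call;
  ;; the constant part was cached at initialization.
  (let ((z (/ (- x (mean rv)) (sd rv))))
    (- (log-normalizer rv) (* 0.5d0 z z))))
```

Computing the constant eagerly trades a little work at construction time for cheaper density evaluations; computing it on demand would be the lazy alternative the note mentions.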
Built with:
- anaphora
- alexandria
- array-operations
- select
- let-plus
- numerical-utilities
- cephes
- special-functions
- float-features
To get a local copy up and running, follow these steps.
Prerequisites: an ANSI Common Lisp implementation. The library is developed and tested with SBCL.
Lisp-Stat is composed of several systems that are designed to be independently useful, so you can, for example, use distributions in any project that needs to manipulate statistical distributions.
To make the system accessible to ASDF (a build facility, similar to make in the C world), clone the repository into a directory ASDF knows about. By default, ASDF searches the common-lisp directory in your home directory (~/common-lisp/). Create it if it doesn't already exist and then:
- Clone the repository
  cd ~/common-lisp && \
  git clone https://github.com/Lisp-Stat/distributions.git
- Reset the ASDF source-registry to find the new system (from the REPL)
  (asdf:clear-source-registry)
- Load the system
  (ql:quickload :distributions)
This will download all of the dependencies for you.
To get the third-party systems that Lisp-Stat depends on, you can use a dependency manager such as Quicklisp or CLPM. Once it is installed, get the dependencies with either of:
(clpm-client:sync :sources "clpi") ;sources may vary
(ql:quickload :distributions)
You need to do this only once. After obtaining the dependencies, you can load the system with ASDF:
(asdf:load-system :distributions)
If you have installed the SLIME ASDF extensions, you can invoke this with a comma (',') from the SLIME REPL in Emacs.
Create a standard normal distribution
(defparameter *rv-normal* (distributions:r-normal))
and take a few draws from it:
LS-USER> (distributions:draw *rv-normal*)
1.037208743704438d0
LS-USER> (distributions:draw *rv-normal*)
-0.2847287516046668d0
LS-USER> (distributions:draw *rv-normal*)
-0.6793466378900889d0
LS-USER> (distributions:draw *rv-normal*)
1.5040711441992598d0
LS-USER>
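A quick sanity check on draws like these is to average many of them: for a standard normal the sample mean should be close to 0. The helper sample-mean below is hypothetical and takes any zero-argument draw function, which keeps it self-contained; with the library loaded you could pass (lambda () (distributions:draw *rv-normal*)). For a standalone run, a Box-Muller closure stands in for the library's generator (an assumption, not the library's method).

```lisp
;;; Hypothetical sanity check on a stream of draws.

(defun sample-mean (draw-fn n)
  "Average of N values obtained by calling the zero-argument function DRAW-FN."
  (let ((sum 0d0))
    (dotimes (i n (/ sum n))
      (incf sum (funcall draw-fn)))))

(defun box-muller-draw ()
  "Stand-in N(0,1) draw via the Box-Muller transform."
  (* (sqrt (* -2d0 (log (- 1d0 (random 1d0)))))  ; (- 1d0 u) is in (0,1]
     (cos (* 2d0 pi (random 1d0)))))

;; The standard error of the mean is 1/sqrt(n) = 0.01 for n = 10000,
;; so the result should land within a few hundredths of 0.
(sample-mean #'box-muller-draw 10000)
```

This is only a smoke test; it would not catch an error in the tails or the shape of the distribution.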
For more examples, please refer to the Documentation.
Roadmap:
- Sketch the interface.
- Extend basic functionality (see the coverage table below).
- Keep extending the library based on user demand.
- Optimize on demand, after finding out where the bottlenecks are.
- More serious testing. I like the approach in Cook (2006): we should transform empirical quantiles to z-statistics and calculate p-values using chi-square tests.
- (mm rv x) and similar methods for the multivariate normal (and maybe Student t).
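One reading of the Cook (2006) testing idea, written out as formulas (a sketch of the statistic, not a committed test design): if q_1, ..., q_N are empirical quantiles that should be independent and Uniform(0,1) when the code is correct, then each transformed value is standard normal and the sum of their squares is chi-square with N degrees of freedom. Here \Phi is the standard normal CDF and F_{\chi^2_N} the chi-square CDF.

```latex
z_i = \Phi^{-1}(q_i), \qquad i = 1, \dots, N

X^2 = \sum_{i=1}^{N} z_i^2 \;\sim\; \chi^2_N
  \quad \text{(if the } q_i \text{ are i.i.d. } \mathrm{Uniform}(0,1)\text{)}

p = 1 - F_{\chi^2_N}\!\left(X^2\right)
```

A small p-value flags quantiles that are not uniform, which points to a bug in the draw or CDF code under test.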
See the open issues for a list of proposed features (and known issues).
Coverage:

Distribution | PDF | CDF | Quantile | Draw | Fit |
---|---|---|---|---|---|
Bernoulli | N/A | N/A | N/A | Yes | No |
Beta | Yes | Yes | Yes | Yes | Yes |
Binomial | No | No | No | Yes | No |
Chi-Square | No | No | No | No | No |
Discrete | Yes | Yes | No | Yes | No |
Exponential | Yes | Yes | Yes | Yes | No |
Gamma | Yes | Yes | Yes | Yes | No |
Geometric | No | No | No | Yes | No |
Inverse-Gamma | Yes | No | No | Yes | No |
Log-Normal | Yes | Yes | Yes | Yes | No |
Normal | Yes | Yes | Yes | Yes | No |
Poisson | No | No | No | Yes | No |
Rayleigh | No | Yes | No | Yes | No |
Student t | No | No | No | Yes | No |
Uniform | Yes | Yes | Yes | Yes | No |
This system is part of the Lisp-Stat project; that should be your first stop for information. Also see the resources and community page for more information.
Always try to implement state-of-the-art generation and calculation methods. If you need something, read up on the literature: the field has developed a lot in recent decades, and most older books present obsolete methods. Good starting points are Gentle (2005) and Press et al. (2007), though you should use the latter with care and not copy algorithms from it without reading a few recent articles; they are not always the best ones (the authors admit this, but claim that some algorithms are included for pedagogical purposes).
Always document the references in the docstring, and include the full citation in doc/references.bib (BibTeX format).
Do at least basic optimization with declarations (e.g. until SBCL no longer gives notes; notes about return values are OK). Benchmarks are always welcome and should be documented.
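As an illustration of basic optimization with declarations, here is a hypothetical building block (not taken from the library) with ftype, inline, type and optimize declarations. Compiling it under SBCL reports any efficiency notes that remain; the name pdf-standard-exponential-sketch is made up for this example.

```lisp
;;; Hypothetical building block with declarations for SBCL.

;; Declare the signature and request inlining before the definition.
(declaim (ftype (function (double-float) (values double-float &optional))
                pdf-standard-exponential-sketch)
         (inline pdf-standard-exponential-sketch))

(defun pdf-standard-exponential-sketch (x)
  "Hypothetical building block: density of the standard exponential at X."
  (declare (type double-float x)
           (optimize (speed 3) (safety 1)))
  (if (minusp x)
      0d0
      (exp (- x))))
```

Keeping (safety 1) rather than 0 means a caller passing a non-double-float still gets a type error instead of undefined behavior, at a small cost in speed.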
Document doubts and suggestions for improvements with !! and ?? marks; more marks mean higher priority.
Please see CONTRIBUTING.md for details on the code of conduct, and the process for submitting pull requests.
Distributed under the MS-PL License. See LICENSE for more information.
Project Link: https://github.com/lisp-stat/distributions