---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
set.seed(0)
```
# distributional
<!-- badges: start -->
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![R-CMD-check](https://github.com/mitchelloharawild/distributional/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mitchelloharawild/distributional/actions/workflows/R-CMD-check.yaml)
<!-- [![Coverage Status](https://codecov.io/gh/mitchelloharawild/distributional/branch/master/graph/badge.svg)](https://codecov.io/github/mitchelloharawild/distributional?branch=master) -->
[![CRAN status](https://www.r-pkg.org/badges/version/distributional)](https://CRAN.R-project.org/package=distributional)
![Download count](https://cranlogs.r-pkg.org/badges/last-month/distributional)
<!-- badges: end -->
The distributional package allows distributions to be used in a vectorised context. It provides methods that are minimal wrappers around the standard `d`, `p`, `q`, and `r` distribution functions, applied to each distribution in the vector. Additional distributional statistics can be computed, including the `mean()`, `median()`, `variance()`, and intervals with `hilo()`.

The distributional nature of a model's predictions is often understated, with the default output of prediction methods usually producing only point predictions. Some R packages (such as [forecast](https://CRAN.R-project.org/package=forecast)) further emphasise uncertainty by producing point forecasts and intervals by default; however, the user's ability to interact with them is limited. This package vectorises distributions and provides methods for working with them, making distributions compatible with prediction outputs of modelling functions. These vectorised distributions can be illustrated with [ggplot2](https://CRAN.R-project.org/package=ggplot2) using the [ggdist](https://CRAN.R-project.org/package=ggdist) package, providing further opportunity to visualise the uncertainty of predictions and teach distributional theory.
## Installation
You can install the released version of distributional from [CRAN](https://CRAN.R-project.org/package=distributional) with:
```{r, eval = FALSE}
install.packages("distributional")
```
The development version can be installed from [GitHub](https://github.com/mitchelloharawild/distributional) with:
```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("mitchelloharawild/distributional")
```
## Examples
Distributions are created using `dist_*()` functions. A list of included distribution shapes can be found here: <https://pkg.mitchelloharawild.com/distributional/reference/>
```{r object}
library(distributional)
my_dist <- c(dist_normal(mu = 0, sigma = 1), dist_student_t(df = 10))
my_dist
```
The standard four distribution functions in R are usable via these generics:
```{r}
density(my_dist, 0) # c(dnorm(0, mean = 0, sd = 1), dt(0, df = 10))
cdf(my_dist, 5) # c(pnorm(5, mean = 0, sd = 1), pt(5, df = 10))
quantile(my_dist, 0.1) # c(qnorm(0.1, mean = 0, sd = 1), qt(0.1, df = 10))
generate(my_dist, 10) # list(rnorm(10, mean = 0, sd = 1), rt(10, df = 10))
```
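As a small check of the equivalences noted in the comments above, the sketch below compares three of the generics with the corresponding base R calls. It assumes that a single evaluation point returns a plain numeric vector with one element per distribution; the `checks` variable is only illustrative.

```{r}
library(distributional)

my_dist <- c(dist_normal(mu = 0, sigma = 1), dist_student_t(df = 10))

# Each generic is applied element-wise, so the results should match
# the base R functions called once per distribution.
checks <- c(
  density  = isTRUE(all.equal(density(my_dist, 0),
                              c(dnorm(0, mean = 0, sd = 1), dt(0, df = 10)))),
  cdf      = isTRUE(all.equal(cdf(my_dist, 5),
                              c(pnorm(5, mean = 0, sd = 1), pt(5, df = 10)))),
  quantile = isTRUE(all.equal(quantile(my_dist, 0.1),
                              c(qnorm(0.1, mean = 0, sd = 1), qt(0.1, df = 10))))
)
checks
```

`generate()` is left out of the comparison because its output is random.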
You can also compute intervals using `hilo()`:
```{r hilo}
hilo(my_dist, 95) # size is a percentage, so 95 requests a 95% interval
```
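To make the meaning of the interval concrete, a central 95% interval can be compared against the matching quantiles. This sketch assumes, per the `hilo()` documentation, that `size` is expressed as a percentage between 0 and 100, and that the bounds are stored in fields named `lower` and `upper`, which are extracted here with `vctrs::field()`. The variable names are illustrative.

```{r}
library(distributional)

d <- dist_normal(mu = 0, sigma = 1)

# A 95% central interval leaves 2.5% of the probability in each tail
interval <- hilo(d, 95)
interval

# The bounds should agree with the 2.5% and 97.5% quantiles
lower <- vctrs::field(interval, "lower")
upper <- vctrs::field(interval, "upper")
all.equal(lower, quantile(d, 0.025))
all.equal(upper, quantile(d, 0.975))
```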
Additionally, some distributions support other methods, such as mathematical operations and summary measures. If an operation isn't supported directly by a distribution, a transformed distribution is created instead.
```{r math}
my_dist
my_dist * 3 + 2
mean(my_dist)
variance(my_dist)
```
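For the normal distribution this arithmetic has a closed form: if X ~ N(mu, sigma), then aX + b ~ N(a * mu + b, |a| * sigma). A minimal sketch of that case, with illustrative variable names, is shown below; the mean and variance of the result should be 2 and 9 respectively.

```{r}
library(distributional)

x <- dist_normal(mu = 0, sigma = 1)

# Scale by 3, then shift by 2
y <- x * 3 + 2
y

# For a normal: mean = 3 * 0 + 2, variance = 3^2 * 1^2
mean(y)
variance(y)
```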
You can also visualise the distribution(s) using the [ggdist](https://mjskay.github.io/ggdist/) package.
```{r plot}
library(ggdist)
library(ggplot2)
df <- data.frame(
  name = c("Gamma(2,1)", "Normal(5,1)", "Mixture"),
  dist = c(dist_gamma(2, 1), dist_normal(5, 1),
           dist_mixture(dist_gamma(2, 1), dist_normal(5, 1), weights = c(0.4, 0.6)))
)

ggplot(df, aes(y = factor(name, levels = rev(name)))) +
  stat_dist_halfeye(aes(dist = dist)) +
  labs(title = "Density function for a mixture of distributions", y = NULL, x = NULL)
```
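The mixture plotted above has a density equal to the weighted sum of its component densities, and a mean equal to the weighted sum of the component means. The sketch below checks both at one arbitrarily chosen point (`x0 = 3`), assuming `dist_gamma(2, 1)` is parameterised as shape 2 and rate 1 and that `density()` and `mean()` are supported for mixtures.

```{r}
library(distributional)

mix <- dist_mixture(dist_gamma(2, 1), dist_normal(5, 1), weights = c(0.4, 0.6))

# Weighted sum of the component densities at an arbitrary point
x0 <- 3
density_by_hand <- 0.4 * dgamma(x0, shape = 2, rate = 1) +
  0.6 * dnorm(x0, mean = 5, sd = 1)
all.equal(density(mix, x0), density_by_hand)

# Weighted sum of the component means (a Gamma mean is shape / rate)
mean_by_hand <- 0.4 * (2 / 1) + 0.6 * 5
all.equal(mean(mix), mean_by_hand)
```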
## Related work
There are several packages that unify interfaces for distributions in R:
* stats provides functions to work with possibly multiple distributions (comparisons are shown in the code comments in the Examples section above).
* [distributions3](https://cran.r-project.org/package=distributions3) represents singular distributions using S3, with particularly nice documentation. distributional makes use of some code and documentation from distributions3.
* [distr](https://cran.r-project.org/package=distr) represents singular distributions using S4.
* [distr6](https://cran.r-project.org/package=distr6) represents singular distributions using R6.
* Many more are listed in the [CRAN task view](https://cran.r-project.org/view=Distributions).

This package differs from the above libraries by storing the distributions in a vectorised format. It does this using [vctrs](https://vctrs.r-lib.org/), so it should play nicely with the tidyverse (try putting distributions into a tibble!).
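
As a brief sketch of that tibble workflow, the example below stores two distributions in a column and derives summaries from it. The column names and parameter values are made up for illustration, and the tibble package is assumed to be installed.

```{r}
library(distributional)
library(tibble)

forecasts <- tibble(
  model = c("A", "B"),
  dist = c(dist_normal(mu = 10, sigma = 2), dist_normal(mu = 12, sigma = 3))
)

# A distribution column can be summarised like any other vector
forecasts$mean <- mean(forecasts$dist)
forecasts$interval <- hilo(forecasts$dist, 95)
forecasts
```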