# Bayesian Inference

Besides the frequentist framework, Bayesian inference is another important strand of inferential statistics. Although the frequentist approach is arguably used more often in most research fields, Bayesian statistics is actually older, and although it has been picked up by more and more researchers only in recent years [@marsman2017bayesian], it has never really been "gone" [@mcgrayne2011theory]. The central idea of Bayesian statistics revolves around Bayes' Theorem, first formulated by Thomas Bayes (1701-1761). Bayesian statistics differs from frequentism in that it also incorporates "subjective" prior knowledge into statistical inference. In particular, Bayes' theorem allows us to estimate the probability of an event $A$, given that we already know that another event $B$ has occurred, or is the case. This is a conditional probability, written in mathematical notation as $P(A|B)$. Bayes' theorem then looks like this:

$$P(A|B)=\frac{P(B|A)\times P(A)}{P(B)}$$
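
To make this less abstract, consider a small numerical illustration in R. All numbers here are invented for this example: $A$ is the event that a person has some condition, and $B$ is a positive result on a (hypothetical) diagnostic test for it.

```r
# Invented numbers, chosen for illustration only
p_A      <- 0.02  # P(A): prior probability of the condition
p_B_A    <- 0.95  # P(B|A): probability of a positive test given the condition
p_B_notA <- 0.10  # P(B|not A): probability of a positive test without the condition

# P(B): total probability of a positive test (law of total probability)
p_B <- p_B_A * p_A + p_B_notA * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_A_B <- (p_B_A * p_A) / p_B
p_A_B
#> [1] 0.1623932
```

Even though the test is quite accurate, the resulting conditional probability is modest, because the prior probability of the condition is low; this is exactly the kind of prior information that Bayes' theorem lets us incorporate.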

The two probabilities in the numerator of Bayes' theorem have their own names. First, the $P(B|A)$ part is (essentially) known as the likelihood; in our case, this is the probability of event $B$ (which has already occurred), given that $A$ is in fact the case, or occurs [@etz2018introduction]. $P(A)$ is the "prior" probability of $A$ occurring. $P(A|B)$, lastly, is the "posterior" probability. Given that $P(B)$ is a fixed constant, the formula above is often shortened to this form:

$$P(A|B) \propto P(B|A)\times P(A)$$

Here, the $\propto$ sign means that, now that we have discarded the denominator of the fraction, the term on the left is no longer equal, but still "proportional", to the term on the right as the values change [@shim2019network]. We may understand Bayes' theorem better if we think of the formula as a process, beginning on the right side of the equation. We simply combine the prior information we have on the probability of $A$ occurring with the likelihood of $B$ given that $A$ actually occurs, to produce our posterior, or updated, probability of $A$. The crucial point here is that we can produce a "better", i.e. posterior, estimate of $A$'s probability when we take our previous knowledge of the context into account. This context knowledge is our prior knowledge on the probability of $A$.
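
The proportional form can be demonstrated with the same invented numbers as above: if we compute the unnormalized products $P(B|A)\times P(A)$ for $A$ and for "not $A$", and then simply rescale them so that they sum to one, we recover exactly the posterior probability obtained with the full formula.

```r
# Same invented numbers as in the example above
p_A <- 0.02; p_B_A <- 0.95; p_B_notA <- 0.10

# Unnormalized "posterior" weights: likelihood * prior
w_A    <- p_B_A    * p_A        # for A
w_notA <- p_B_notA * (1 - p_A)  # for not-A

# Rescaling the weights so they sum to one is equivalent to dividing by P(B)
w_A / (w_A + w_notA)
#> [1] 0.1623932
```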

Bayes' Theorem is often explained in the way we did above, with $A$ and $B$ standing for specific events. The example becomes less abstract, however, if we take into account that $A$ and $B$ can also represent distributions of a variable. Let us now assume that $P(A)$ in fact follows a distribution. This distribution is characterized by a set of parameters which we denote by $\theta$. For example, if our distribution is a normal (Gaussian) distribution, it can be described by its mean $\mu$ and variance $\sigma^2$. These $\theta$ parameters are what we actually want to estimate. Now, let us also assume that, instead of $B$, we have collected actual data with which we want to estimate $\theta$. We store our observed data in a vector $\bf{Y}$; of course, our observed data also follow a distribution, $P(\bf{Y})$. The formula now looks like this:

$$P(\theta | {\bf{Y}} ) \propto P( {\bf{Y}} | \theta )\times P( \theta )$$

In this equation, we now have $P(\theta)$, the prior distribution of $\theta$, which we simply define a priori, either based on our previous knowledge, or even on our intuition about what the $\theta$ values may look like. Together with the likelihood distribution $P({\bf{Y}}|\theta)$ of our data given the true parameters $\theta$, we can estimate the posterior distribution $P(\theta|{\bf{Y}})$, which represents what we think the $\theta$ parameters look like once we take both the observed data and our prior knowledge into account. As can be seen in the formula, the posterior is still a distribution, not a single estimated "true" value, meaning that even the results of Bayesian inference remain probabilistic, and subjective in the sense that they represent our belief in the actual parameter values. Bayesian inference thus also means that we do not calculate confidence intervals around our estimates, but credible intervals (CrI).
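
As a minimal sketch of this updating step, the following R code assumes a normal likelihood with known variance and a normal prior on the mean $\theta$, so that the posterior is available in closed form (the normal-normal conjugate model); all values are invented for illustration.

```r
set.seed(123)

# Invented data: 20 observations, assumed to come from a normal distribution
y      <- rnorm(20, mean = 0.6, sd = 1)
n      <- length(y)
sigma2 <- 1                  # sampling variance, assumed known here
ybar   <- mean(y)

# Normal prior on theta: theta ~ N(mu0, tau02)
mu0   <- 0                   # prior mean (e.g. "no effect")
tau02 <- 0.5^2               # prior variance, expressing how sure we are

# Conjugate normal-normal update: the posterior of theta is again normal
post_var  <- 1 / (1 / tau02 + n / sigma2)
post_mean <- post_var * (mu0 / tau02 + n * ybar / sigma2)

# 95% credible interval (CrI) for theta
crI <- qnorm(c(0.025, 0.975), mean = post_mean, sd = sqrt(post_var))
round(c(posterior_mean = post_mean, lower = crI[1], upper = crI[2]), 3)
```

The posterior mean in this sketch is a precision-weighted compromise between the prior mean and the sample mean: the more informative the prior (smaller `tau02`), or the larger the sample, the more one of the two sides dominates.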

Here is a visualization of the three distributions we described above, and what they might look like in a specific example:
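
Since the original figure is not reproduced here, the following R sketch shows how such a plot could be generated. The prior and likelihood below are hypothetical normal densities, chosen only for illustration; the posterior is then computed from them using the same conjugate update as above.

```r
# Hypothetical prior and likelihood, for illustration only
theta      <- seq(-2, 3, length.out = 500)
prior      <- dnorm(theta, mean = 0,   sd = 0.8)  # belief before seeing the data
likelihood <- dnorm(theta, mean = 1.2, sd = 0.5)  # what the data alone suggest

# Posterior implied by the conjugate normal-normal update
post_var  <- 1 / (1 / 0.8^2 + 1 / 0.5^2)
post_mean <- post_var * (0 / 0.8^2 + 1.2 / 0.5^2)
posterior <- dnorm(theta, mean = post_mean, sd = sqrt(post_var))

plot(theta, posterior, type = "l", lwd = 2,
     xlab = expression(theta), ylab = "Density")
lines(theta, prior,      lty = 2)
lines(theta, likelihood, lty = 3)
legend("topleft", legend = c("Posterior", "Prior", "Likelihood"),
       lty = c(1, 2, 3), lwd = c(2, 1, 1), bty = "n")
```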

A big asset of Bayesian approaches is that the distributions do not have to follow a bell-shaped curve like the ones in our visualization; any other kind of (more complex) distribution can be modeled. A disadvantage of Bayesian inference, however, is that generating the (joint) posterior distribution of the parameters from our collected data can be very computationally intensive. Special Markov Chain Monte Carlo (MCMC) simulation procedures, such as the Gibbs sampling algorithm, have thus been developed to generate posterior distributions. Markov Chain Monte Carlo is also used in the gemtc package to build our Bayesian network meta-analysis model [@van2012automating].
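
To give a flavor of how Gibbs sampling works in general (this is not the specific sampler used internally by gemtc), here is a minimal R sketch that draws from a bivariate standard normal distribution with correlation $\rho$ by alternately sampling each variable from its conditional distribution given the current value of the other:

```r
set.seed(42)

# Minimal Gibbs sampler for a bivariate standard normal with correlation rho.
# Both full conditionals are normal: x | y ~ N(rho * y, 1 - rho^2), and vice versa.
rho    <- 0.8
n_iter <- 10000
draws  <- matrix(NA_real_, nrow = n_iter, ncol = 2,
                 dimnames = list(NULL, c("x", "y")))

x <- 0; y <- 0                                          # arbitrary starting values
for (i in seq_len(n_iter)) {
  x <- rnorm(1, mean = rho * y, sd = sqrt(1 - rho^2))   # draw x given current y
  y <- rnorm(1, mean = rho * x, sd = sqrt(1 - rho^2))   # draw y given updated x
  draws[i, ] <- c(x, y)
}

# After discarding a burn-in period, the draws approximate the target distribution
burnin <- 1000
cor(draws[-seq_len(burnin), ])[1, 2]   # should be close to rho = 0.8
```

Real models, such as the hierarchical network meta-analysis models fitted by gemtc, involve many more parameters, but the underlying logic of repeatedly sampling from conditional distributions to approximate the joint posterior is the same.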