Exercise 2: From Models to Forecasts
========================================================
Note: This activity supplements material in Ecological Forecasting Chapter 2, "From Models to Forecasts". As a class activity, please turn in answers in Rmd format.
Simulating the discrete-time logistic
-------------------------------------
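For reference, the iterative simulation below implements the discrete-time logistic update

$$N_{t+1} = N_t + r\,N_t\left(1 - \frac{N_t}{K}\right)$$

where $r$ is the intrinsic growth rate and $K$ is the carrying capacity.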
Define parameters
```{r}
r = 1 ## intrinsic growth rate
K = 10 ## carrying capacity
n0 = .1 ## initial population size
NT = 30 ## number of time steps to simulate
time = 1:NT
```
Iterative simulation
```{r}
n = rep(n0,NT) ## vector to store results
for(t in 2:NT){
  n[t] = n[t-1] + r*n[t-1]*(1-n[t-1]/K)
}
```
Plot results
```{r}
plot(time,n,ylim=c(0,12),lwd=3,type='l',bty='l',cex.lab=1.5,
     xlab="Time",ylab="Population Size")
```
### Problems
1. Generate plots of the logistic growth model at r = 1.95, 2.05, 2.5, and 2.8. Describe the trajectory observed in each case.
Probability distributions in R
------------------------------
Because R is a statistical language, a large number of probability distributions are available by default, and an even larger number can be loaded from packages. The table below lists the most common distributions in R, the name of the function within R, and the parameters of the distribution.
Distribution | R name | Parameters
------------ | ------ | ----------
beta | beta | shape1, shape2, ncp
Binomial | binom | size, prob
Cauchy | cauchy | location, scale
chi-squared | chisq | df, ncp
exponential | exp | rate
F | f | df1, df2, ncp
gamma | gamma | shape, scale
geometric | geom | prob
hypergeometric | hyper | m, n, k
log-normal | lnorm | meanlog, sdlog
logistic | logis | location, scale
Negative binomial | nbinom | size, prob
Normal | norm | mean, sd
Poisson | pois | lambda
Student's t | t | df, ncp
uniform | unif | min, max
Weibull | weibull | shape, scale
Wilcoxon | wilcox | m, n
There is a good chart at http://www.johndcook.com/distribution_chart.html that describes the relationships among the common distributions, and the Wikipedia articles for most of them are good for quick reference.
R actually provides four related functions for each probability distribution. These functions are called by adding a letter at the beginning of the function name. The variants of each probability distribution are:
* “d” = density: probability density function (PDF)
* “p” = cumulative distribution function (CDF)
* “q” = quantile: calculates the value associated with a specified tail probability, inverse of “p”
* “r” = random: simulates random numbers
The first argument to these functions is the same regardless of the distribution: x for “d”, q for “p”, p for “q”, and n for “r”.
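Before the longer normal example below, here is a one-line look at each variant for the exponential distribution (rate = 1 is just an arbitrary choice), to show how the first argument changes across the four functions.
```{r}
dexp(1, rate=1)     ## "d": density at x = 1
pexp(1, rate=1)     ## "p": cumulative probability P(X <= 1) at q = 1
qexp(0.5, rate=1)   ## "q": the quantile (here the median) for tail probability p = 0.5
rexp(5, rate=1)     ## "r": n = 5 random draws
```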
All of this will make more sense once we consider a concrete example. Let's take a look at the normal probability density function first, since it's the one you're most familiar with. If you use ?dnorm you'll see that for many of the function arguments there are default values, specifically mean=0 and sd=1. Therefore if these values are not specified explicitly in the function call R assumes you want a standard Normal distribution.
```{r}
x = seq(-5,5,by=0.1)
plot(x,dnorm(x),type='l') ## that’s a lowercase “L” for “line”
abline(v=0) ## add a line to indicate the mean (“v” is for “vertical”)
lines(x,dnorm(x,2),col=2) ## try changing the mean (“col” sets the color)
abline(v=2,col=2)
lines(x,dnorm(x,-1,2),col=3) ## try changing the mean and standard dev
abline(v=-1,col=3)
```
This plot of the normal distribution and the effects of varying the parameters in the normal are both probably familiar to you already – changing the mean changes where the distribution is centered while changing the standard deviation changes the spread of the distribution. Next try looking at the CDF of the normal:
```{r}
plot(x,pnorm(x,0,1),type='l')
abline(v=0)
lines(x,pnorm(x,2,1),col=2)
abline(v=2,col=2)
lines(x,pnorm(x,-1,2),col=3)
abline(v=-1,col=3)
```
Next let's look at the function qnorm. Since the input to this function is a cumulative (tail) probability, the x-values for the plot are restricted to the range [0,1].
```{r}
p = seq(0,1,by=0.01)
plot(p,qnorm(p,0,1),type='l',ylim=range(x)) # ylim sets the y-axis range
# range returns the min/max as a 2-element vector
abline(h=0) # “h” for “horizontal”
lines(p,qnorm(p,2,1),col=2)
abline(h=2,col=2)
lines(p,qnorm(p,-1,2),col=3)
abline(h=-1,col=3)
```
As you can see, the quantile function is the inverse of the CDF. This function can be used to find the median of the distribution (p = 0.5) or to estimate confidence intervals at any level desired.
```{r}
qnorm(c(0.025,0.975),0,1) # what width CI is specified by these values?
plot(p,qnorm(p,0,1),type='l',ylim=range(x))
abline(v=c(0.025,0.975),lty=2) # add vertical lines at the CI
abline(h=qnorm(c(0.025,0.975)),lty=2) #add horizontal lines at the threshold vals
plot(x,dnorm(x,0,1),type='l') # plot the corresponding pdf
abline(v=qnorm(c(0.025,0.975)),lty=2)
```
Finally, let's investigate the rnorm function for generating random numbers that have a normal distribution. Here we generate histograms from progressively larger samples and compare them to the actual density of the standard normal.
```{r}
n = c(10,100,1000,10000) # sequence of sample sizes
for(i in 1:4){ # loop over these sample sizes
  hist(rnorm(n[i]),main=n[i],probability=TRUE,breaks=40)
  # breaks suggests the approximate number of bins in the histogram
  lines(x,dnorm(x),col=2)
}
```
One other technical note: like any function in R that generates random output, this example will give different results every time you run it.
This example demonstrates that as the number of random draws from a probability distribution increases, the histogram of those draws provides a better and better approximation of the density itself. We will make use of this fact extensively this semester because – as odd as this may sound now – there are many distributions that are easier to randomly sample from than solve for analytically.
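A minimal sketch of that convergence (not part of the original activity): comparing sample quantiles against the exact `qnorm` values for increasing sample sizes shows the Monte Carlo estimates tightening around the truth.
```{r}
true.q = qnorm(c(0.025,0.5,0.975))   ## exact quantiles of the standard normal
for(n.samp in c(10,100,1000,10000)){
  est.q = quantile(rnorm(n.samp),c(0.025,0.5,0.975))  ## Monte Carlo estimate
  print(round(rbind(true = true.q, estimate = est.q),3))
}
```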
### Problems
2. Choose another probability distribution and generate graphs of the probability density function, the cumulative distribution function, the quantile function, and a histogram of samples from that distribution.
Monte Carlo Simulation
----------------------
Most of the figures from the logistic growth example in Chapter 2 included plots of the median trajectory and 95% interval estimates. These summary statistics reflect the fact that the underlying model prediction was a time series of probability distributions, rather than a single trajectory. So how do we project a probability distribution through time?
In Chapter 11 we will explore a variety of analytical and numerical approaches to propagating uncertainty, and the trade-offs among them, in much greater detail. Today I wanted to introduce a particularly common and general numerical approach, **Monte Carlo** simulation. A Monte Carlo method is any algorithm that relies on randomization to approximate a computation. These approaches and others will be discussed in more detail later (e.g. Chapters 5, 11, 13, & 14).
The previous example of approximating the Normal distribution with a histogram of samples from the Normal distribution is a simple illustration of this approach. What makes this approach powerful is that we can transform the samples from a distribution through whatever function we would like, and the resulting set of samples has the correct distribution for that transformation. This is important because, by contrast, we **cannot** transform the summary statistics, such as the mean of the probability distribution, through an arbitrary function because of **Jensen's Inequality**.
To illustrate this point, consider the function x^2 and a standard Normal distribution (mean = 0, sd = 1). Now if we ignore Jensen's Inequality and transform the mean and 95% CI we'd end up with a probability distribution that has a mean of 0^2 = 0, a lower confidence limit of (-1.96)^2 = 3.84, and an upper confidence limit of 1.96^2 = 3.84. Clearly this can't be correct, since the upper and lower CI are identical and the lower CI is higher than the mean. By contrast, if we do this transformation numerically:
```{r}
x = rnorm(10000,0,1)
y = x^2
hist(x,main="Original distribution",breaks=40)
abline(v=quantile(x,c(0.025,0.5,0.975)),lty=c(2,1,2),lwd=3,col="orange")
abline(v=mean(x),col="red",lwd=3,lty=3)
hist(y,main="Transformed distribution",breaks=40)
abline(v=quantile(y,c(0.025,0.5,0.975)),lty=c(2,1,2),lwd=3,col="orange")
abline(v=mean(y),col="red",lwd=3,lty=3)
```
The Monte Carlo estimate is that the mean is `r mean(y)`, the median is `r median(y)`, and the 95% CI is `r quantile(y,c(0.025,0.975))`.
It turns out that this specific transformation (x^2 of a standard Normal) has a well-known analytical solution -- a Chi-squared distribution with one degree of freedom, so in this case we can compare the numerical approximation with the exact solution. This Chi-squared has a mean of 1, a median of `r qchisq(0.5,1)`, and a 95% CI of `r qchisq(c(0.025,0.975),1)`.
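As a quick visual check (a sketch that reuses `x` and `y` from the chunk above), we can overlay the exact chi-squared density on the Monte Carlo histogram and put the two sets of summary statistics side by side.
```{r}
hist(y,main="Monte Carlo vs analytical",probability=TRUE,breaks=40)
yseq = seq(0.05,max(y),by=0.05)      ## grid over the support of y
lines(yseq,dchisq(yseq,df=1),col=2)  ## exact chi-squared(1) density
## (the density is unbounded as y -> 0, so the curve is clipped near the axis)
## summary statistics: Monte Carlo estimate vs exact quantiles
rbind(MonteCarlo = quantile(y,c(0.025,0.5,0.975)),
      ChiSquared = qchisq(c(0.025,0.5,0.975),df=1))
```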
### Problems
3. Numerically transform a lognormal(meanlog=0,sdlog=0.5) through sin(x) using Monte Carlo simulation. Include histograms of the original and transformed distributions. Report the mean, median, and 95% CI for both distributions and indicate these values on the histograms.
Parameter error
---------------
We next want to use the Monte Carlo approach to account for parameter uncertainty in the logistic growth model.
To begin, we need to specify the uncertainty in the model parameters and the size of the simulation:
```{r}
r.sd = 0.2 ## standard deviation on r
K.sd = 1.0 ## standard deviation on K
NE = 1000 ## Ensemble size
```
Next, we need to run the Monte Carlo simulation for the logistic. In this case we'll be running the logistic growth model 1000 times, each time with slightly different parameters. We'll then store all 1000 trajectories in a matrix. In effect, we'll be estimating our probability distributions and all of our summary statistics from a sample of lines, rather than a sample of points as we did in the previous example.
```{r}
n = matrix(n0,NE,NT)  # storage for all simulations
rE = rnorm(NE,r,r.sd) # sample of r
KE = rnorm(NE,K,K.sd) # sample of K
for(i in 1:NE){       # loop over samples
  for(t in 2:NT){     # for each sample, simulate through time
    n[i,t] = n[i,t-1] + rE[i]*n[i,t-1]*(1-n[i,t-1]/KE[i])
  }
}
```
Next we'll use *apply* to calculate the median and CI for each time point.
```{r}
n.stats = apply(n,2,quantile,c(0.025,0.5,0.975))
```
Unfortunately, R doesn't have a built-in function for plotting shaded CI envelopes on time-series plots, so we'll define one. Unlike plot, which just takes x and y as arguments, this function needs to take time series for both the upper (yhi) and lower (ylo) intervals.
```{r}
ciEnvelope <- function(x,ylo,yhi,col="lightgrey",...){
  ## draw a shaded polygon between the lower (ylo) and upper (yhi) bounds along x
  polygon(cbind(c(x, rev(x), x[1]),
                c(ylo, rev(yhi), ylo[1])),
          border = NA,col=col,...)
}
```
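Note that `ciEnvelope` only adds a shaded polygon to an existing plot, so you need to set up the plot first and then draw the envelope (and any lines on top of it). Here is a minimal usage sketch with made-up data, just to show the calling convention; the real plots are left for the problems below.
```{r}
x.demo   = 1:10
ylo.demo = sin(x.demo) - 1   ## hypothetical lower bound
yhi.demo = sin(x.demo) + 1   ## hypothetical upper bound
plot(x.demo,sin(x.demo),type='n',ylim=c(-2.5,2.5),
     xlab="Time",ylab="y")   ## type='n' sets up the axes without drawing anything
ciEnvelope(x.demo,ylo.demo,yhi.demo)  ## shaded envelope first...
lines(x.demo,sin(x.demo),lwd=2)       ## ...then the line on top
```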
### Problems
4. Plot histograms of the samples of r and K used for the simulation.
5. Plot a sample of 10 different trajectories from your ensemble (on one graph).
6. Plot a histogram of your population forecast at time = 15.
7. Plot the median trajectory and 95% CI.
8. Add a 50% CI (i.e. 25% to 75%) to the plot. Note that you'll have to both compute the summary statistics for this interval and plot the envelope in a different color.
Extra Credit: Initial conditions
--------------------------------
The approach for simulating uncertainty in the initial conditions is very similar to the approach used for the parameter uncertainty. As in Chapter 2, we'll assume that the initial condition is distributed as a lognormal to ensure that we never draw negative values. For this example we'll assume a standard deviation of 0.6 and an intrinsic growth rate of 0.3.
```{r}
r = 0.3
n0.sd = 0.6
n0s = rlnorm(NE,log(n0),n0.sd)
n = matrix(n0s,NE,NT)
for(i in 1:NE){
  for(t in 2:NT){
    n[i,t] = n[i,t-1] + r*n[i,t-1]*(1-n[i,t-1]/K)
  }
}
```
### Problems
9. Plot the median & 95% interval.
10. Repeat with r equal to 1.95, 2.05, and 2.8.