forked from EcoForecast/EF_Activities
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Stanimirova_Exercise_02_Logistic.Rmd
452 lines (320 loc) · 17.2 KB
/
Stanimirova_Exercise_02_Logistic.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
Exercise 2: From Models to Forecasts
========================================================
Note: This activity supplements material in Ecological Forecasting Chapter 2 "From Models to Forecasts" As a class activity, please turn in answers in Rmd format.
Simulating the discrete-time logistic
-------------------------------------
Define parameters
```{r}
r = 1 ## intrinsic growth rate
K = 10 ## carrying capacity
n0 = .1 ## initial population size
NT = 30 ## number of time steps to simulate
time = 1:NT
```
Iterative simulation
```{r}
n = rep(n0,NT) ## vector to store results
for(t in 2:NT){
n[t] = n[t-1] + r*n[t-1]*(1-n[t-1]/K)
}
```
Plot results
```{r}
plot(time,n,ylim=c(0,12),lwd=3,type='l',
,bty='l',cex.lab=1.5,
xlab="Time",ylab="Population Size")
```
### Problems
**1. Generate plots of the logistic growth model at r = 1.95, 2.05, 2.5, and
2.8 Describe the trajectory observed in each case.**
The higher the intrisic growth rate, the more chaotic the system gets and the more oscillations in the populations size we have. The graphs demonstrate that at an
intrinsic growth rate of 2.8 we have the highest oscillations compared to the lower values and we are reaching a state of deterministic chaos. This makes it harder to predict population dynamics in the future. At low r values we can be more confident saying that population size we will be approximately 10 even 30 units of time later.
```{r}
r <- matrix(c(1.95, 2.05, 2.5, 2.8)) ## intrinsic growth rate
K = 10 ## carrying capacity
n0 = .1 ## initial population size
NT = 30 ## number of time steps to simulate
time = 1:NT
n=matrix(rep(n0), nrow=length(r), ncol=NT)
for(k in 1:length(r)){
for(t in 2:NT){
n[k,t] = n[k,t-1] + r[k]*n[k,t-1]*(1-n[k,t-1]/K)
}
}
for(i in 1:dim(n)[1]){
plot(time,n[i,],ylim=c(0,12),lwd=3,type='l',
,bty='l',cex.lab=1.5,
xlab="Time",ylab="Population Size")
}
```
Probability distributions in R
------------------------------
Because it is a statistical language, there are a large number of probability distributions in R by default and an even larger number that can be loaded from packages. The table below gives a listing of the most common distributions in R, the name of the function within R, and the parameters of the distribution.
Distribution | R name | Parameters
------------ | ------ | ----------
beta | beta | shape1, shape2, ncp
Binomial | binom | size, prob
Cauchy | cauchy | location, scale
chi-squared | chisq | df, ncp
exponential | exp | rate
F | f | df1, df2, ncp
gamma | gamma | shape, scale
geometric | geom | prob
hypergeometric | hyper | m, n, k
log-normal | lnorm | meanlog, sdlog
logistic | logis | location, scale
Negative binomial | nbinom | size, prob
Normal | norm | mean, sd
Poisson | pois | lambda
Student's t | t | df, ncp
uniform | unif | min, max
Weibull | weibull | shape, scale
Wilcoxon | wilcox | m, n
There is a good chart at http://www.johndcook.com/distribution_chart.html that describes the relationships among the common distributions, and the Wikipedia articles for most of them are good for quick reference.
R actually provides four related functions for each probability distribution. These functions are called by adding a letter at the beginning of the function name. The variants of each probability distribution are:
* “d” = density: probability density function (PDF)
* “p” = cumulative distribution function (CDF)
* “q” = quantile: calculates the value associated with a specified tail probability, inverse of “p”
* “r” = random: simulates random numbers
The first argument to these functions is the same regardless of the distribution and is x for “d”, q for “p”, p for “q”and n for “r”
All of this will make more sense once we consider a concrete example. Let's take a look at the normal probability density function first, since it's the one you're most familiar with. If you use ?dnorm you'll see that for many of the function arguments there are default values, specifically mean=0 and sd=1. Therefore if these values are not specified explicitly in the function call R assumes you want a standard Normal distribution.
```{r}
x = seq(-5,5,by=0.1)
plot(x,dnorm(x),type='l') ## that’s a lowercase “L” for “line”
abline(v=0) ## add a line to indicate the mean (“v” is for “vertical”)
lines(x,dnorm(x,2),col=2) ## try changing the mean (“col” sets the color)
abline(v=2,col=2)
lines(x,dnorm(x,-1,2),col=3) ## try changing the mean and standard dev
abline(v=-1,col=3)
```
This plot of the normal distribution and the effects of varying the parameters in the normal are both probably familiar to you already – changing the mean changes where the distribution is centered while changing the standard deviation changes the spread of the distribution. Next try looking at the CDF of the normal:
```{r}
plot(x,pnorm(x,0,1),type='l')
abline(v=0)
lines(x,pnorm(x,2,1),col=2)
abline(v=2,col=2)
lines(x,pnorm(x,-1,2),col=3)
abline(v=-1,col=3)
```
Next let's look at the function qnorm. Since the input to this function is a quantile, the x-values for the plot are restricted to the range [0,1].
```{r}
p = seq(0,1,by=0.01)
plot(p,qnorm(p,0,1),type='l',ylim=range(x)) # ylim sets the y-axis range
# range returns the min/max as a 2-element vector
abline(h=0) # “h” for “horizontal”
lines(p,qnorm(p,2,1),col=2)
abline(h=2,col=2)
lines(p,qnorm(p,-1,2),col=3)
abline(h=-1,col=3)
```
As you can see, the quantile function is the inverse of the CDF. This function can be used to find the median of the distribution (p = 0.5) or to estimate confidence intervals at any level desired.
```{r}
qnorm(c(0.025,0.975),0,1) # what width CI is specified by these values?
plot(p,qnorm(p,0,1),type='l',ylim=range(x))
abline(v=c(0.025,0.975),lty=2) # add vertical lines at the CI
abline(h=qnorm(c(0.025,0.975)),lty=2) #add horizontal lines at the threshold vals
plot(x,dnorm(x,0,1),type='l') # plot the corresponding pdf
abline(v=qnorm(c(0.025,0.975)),lty=2)
```
Finally, let's investigate the rnorm function for generating random numbers that have a normal distribution. Here we generate histograms that have a progressively larger sample size and compare that to the actual density of the standard normal.
```{r}
n = c(10,100,1000,10000) # sequence of sample sizes
for(i in 1:4){ # loop over these sample sizes
hist(rnorm(n[i]),main=n[i],probability=TRUE,breaks=40)
#here breaks defines number of bins in the histogram
lines(x,dnorm(x),col=2)
}
```
One other technical note: like any function in R that generates random output, this example will give different results every time you run it.
This example demonstrates that as the number of random draws from a probability distribution increases, the histogram of those draws provides a better and better approximation of the density itself. We will make use of this fact extensively this semester because – as odd as this may sound now – there are many distributions that are easier to randomly sample from than solve for analytically.
### Problems
**2. Choose another probability distribution and generate graphs of the probability density function, the cumulative distribution function, the quantile function, and a histogram of samples from that distribution.**
```{r}
x = seq(0,100,by=1)
#d -- probability density function (PDF)
plot(x,dpois(x,1),type='l')
lines(x,dpois(x,4),col=2)
lines(x,dpois(x,10),col=3)
#p -- cumulative distribution function (CDF)
plot(x,ppois(x,1),type='l')
lines(x,ppois(x,4),col=2)
lines(x,ppois(x,10),col=3)
#q -- calculates the value associated with a specified tail probability, inverse of “p”
p = seq(0,1,by=0.01)
plot(p,qpois(p,1),type='l', ylim=range(x)) # ylim sets the y-axis range
lines(p,qpois(p,4),type='l', col=2, ylim=range(x)) # ylim sets the y-axis range
lines(p,qpois(p,10),type='l', col=3, ylim=range(x)) # ylim sets the y-axis range
qpois(c(0.025,0.975),10) # what width CI is specified by these values?
plot(x,dpois(x,10),type='l')
abline(v=qpois(c(0.025,0.975),10),lty=2)
#r -- simulates random numbers
n = c(10,100,1000,10000) # sequence of sample sizes
for(i in 1:4){ # loop over these sample sizes
hist(rpois(n[i],10),main=n[i],probability=TRUE, breaks=30)
#here breaks defines number of bins in the histogram
lines(x,dpois(x,10),col=2, type="l")
}
```
Monte Carlo Simulation
----------------------
Most of the figures from the logistic growth example in Chapter 2 included plots of the median trajectory and 95% interval estimates. These summary statistics are a reflection of the fact that the underlying model prediction was a timeseries of probability distributions, rather than a single trajectory. So how do we project a probability distribution through time?
In Chapter 11 we will explore a variety of analytical and numerical approaches to propagating uncertainty, and the trade-offs among them, in much greater detail. Today I wanted to introduce a particularly common and general numerical approach, **Monte Carlo** simulation. A Monte Carlo method is any algorithm that relies on randomization to approximate a computation. These approaches and other will be discussed in more detail later (e.g. Chapters 5, 11, 13, & 14).
The previous example of approximating the Normal distribution with a histogram of samples from the Normal distribution is a simple illustration of this approach. What makes this approach powerful is that we can transform the samples from a distribution through whatever function we would like and the resulting set of samples is the correct histogram for that transformation. This is important because, by contrast, we **cannot** transform the summary statistics, such as the mean of the probability distribution, through an arbitrary function because of **Jensen's Inequality**.
To illustrate this point, consider the function x^2 and a standard Normal distribution (mean = 0, sd = 1). Now if we ignore Jensen's Inequality and tranform the mean and 95% CI we'd end up with a probability distribution that has a mean of 0^2 = 0, a lower confidence interval of -1.96^2 = 3.84 and an upper confidence interval of 1.96^2 = 3.84. Clearly this can't be correct, since the upper and lower CI are identical and the lower CI is higher than the mean. By contrast, if we do this transformation numerically
```{r}
x = rnorm(10000,0,1)
y = x^2
hist(x,main="Original distribution",breaks=40)
abline(v=quantile(x,c(0.025,0.5,0.975)),lty=c(2,1,2),lwd=3,col="orange")
abline(v=mean(x),col="red",lwd=3,lty=3)
hist(y,main="Transformed distribution",breaks=40)
abline(v=quantile(y,c(0.025,0.5,0.975)),lty=c(2,1,2),lwd=3,col="orange")
abline(v=mean(y),col="red",lwd=3,lty=3)
```
The Monte Carlo estimate is that the mean is `r mean(y)`, the median is `r median(y)`, and the 95% CI is `r quantile(y,c(0.025,0.975))`.
It turns out that this specific transformation (x^2 of a standard Normal), has a well known analytical solution -- a Chi-squared distribution with one degee of freedom, so in this case we can compare the numerical approximation with the exact solution. This Chi-squared has a mean of 1, a median of `r qchisq(0.5,1)` and a 95% CI of `r qchisq(c(0.025,0.975),1)`.
```{r}
z = rchisq(x,1,0)
hist(z,main="Ch-squared distribution",breaks=40, xlim=c(0,15))
abline(v=qchisq(c(0.025,0.5,0.975),1),lty=c(2,1,2),lwd=3,col="orange")
abline(v=mean(z),col="red",lwd=3,lty=3)
```
### Problems
**3. Numerically transform a lognormal(meanlog=0,sdlog=0.5) through sin(x) using Monte Carlo simulation. Include histograms of the original and transformed distributions. Report the mean, median, and 95% CI for both distributions and indicate these values on the histograms.**
```{r}
x = rlnorm(10000,0,0.5)
y = sin(x)
hist(x,main="Original distribution",breaks=40)
abline(v=qlnorm(c(0.025,0.5,0.975),0,0.5),lty=c(2,1,2),lwd=3,col="orange")
abline(v=mean(x),col="red",lwd=3,lty=3)
hist(y,main="Transformed distribution",breaks=40)
abline(v=quantile(y,c(0.025,0.5,0.975)),lty=c(2,1,2),lwd=3,col="orange")
abline(v=mean(y),col="red",lwd=3,lty=3)
mean(y)
median(y)
quantile(y,c(0.025,0.975))
```
Parameter error
---------------
We next want to use the Monte Carlo approach to account for parameter uncertainty in the logistic growth model
To begin, we need to specify the uncertainty in the model parameters and the size of the simulation
```{r}
r <- 1
r.sd = 0.2 ## standard deviation on r
K.sd = 1.0 ## standard deviation on K
NE = 1000 ## Ensemble size
```
Next, we need to run the Monte Carlo simulation for the logistic. In this case we'll be running the logistic growth model 1000 times, each time with slightly different parameters. We'll then store all 1000 trajectories in a matrix. In effect, we'll be estimating our probability distributions and all of our summary statistics from a sample of lines, rather than a sample of points as we did in the previous example.
```{r}
n = matrix(n0,NE,NT) # storage for all simulations
rE = rnorm(NE,r,r.sd) # sample of r
KE = rnorm(NE,K,K.sd) # sample of K
for(i in 1:NE){ # loop over samples
for(t in 2:NT){ # for each sample, simulate throught time
n[i,t] = n[i,t-1] + rE[i]*n[i,t-1]*(1-n[i,t-1]/KE[i])
}
}
```
Next we'll use *apply* to calculate the median and CI for each time point.
```{r}
n.stats = apply (n,2,quantile,c(0.025,0.5,0.975))
```
Unfortunately, R doesn't have a built in function for plotting shaded CI on time-series plots, so we'll define one. Unlike plot, which just takes x and y as arguements, this function need to take time-series for both the upper (yhi) and lower (ylo) intervals.
```{r}
ciEnvelope <- function(x,ylo,yhi,col="lightgrey",...){
polygon(cbind(c(x, rev(x), x[1]), c(ylo, rev(yhi),
ylo[1])), border = NA,col=col,...)
}
```
### Problems
**4. Plot histograms of the samples of r and K used for the simulation.**
```{r}
hist(rE,probability=TRUE,breaks=40, main="Histogram of Growth Rate")
hist(KE,probability=TRUE,breaks=40, main="Histogram of Carrying Capacity")
```
**5. Plot a sample of 10 different trajectories from your ensemble (on one graph).**
```{r}
n_sample <- n[sample(nrow(n), size=10, replace=FALSE),]
col.rainbow <- rainbow(10)
palette(col.rainbow)
par(mar=c(5,5,2,5))
matplot(t(n_sample), ylim=c(0,15),lwd=2, type='l',
,bty='l',cex.lab=1.5, col=col.rainbow,
xlab="Time",ylab="Population Size")
par(fig=c(0,1,0,1), oma=c(0, 0, 0, 0), mar=c(0,0,0,0), pty="s", new=TRUE)
plot(0,0, type="n", bty="n", xaxt="n", yaxt="n")
legend('right', xpd=TRUE, inset=c(0,0), legend=1:10, lwd=2, col=col.rainbow)
dev.off()
```
**6. Plot a histogram of your population forecast at time = 15.**
```{r}
n_15 <- n[,15]
hist(n_15,main="Population at time 15", breaks=30, xlim=range(n[,15]))
```
**7. Plot the median trajectory and 95% CI.**
**8. Add a 50% CI (i.e. 25% to 75%) to the plot. Note that you'll have to both compute the summary statistics for this interval and plot the envelope in a different color.**
```{r}
plot(time,n.stats[2,], ylim=c(0,15),lwd=3,type='l',
,bty='l',cex.lab=1.5,
xlab="Time",ylab="Population Size")
ciEnvelope(time,n.stats[1,], n.stats[3,])
n.stats.50 = apply(n,2,quantile,c(0.25,0.5,0.75))
ciEnvelope(time,n.stats.50[1,], n.stats.50[3,], "lightpink")
```
Extra Credit: Initial conditions
--------------------------------
The approach for simulating uncertainty in the initial conditions is very similar to the approach used for the parameter uncertainty. As in Chapter 2, we'll assume that the initial condition is distributed as a lognormal to ensure that we never draw negative values. For this example we'll assume a standard deviation of 0.6 and an intrinsic growth rate of 0.3
```{r}
r = 0.3
n0.sd = 0.6
n0s = rlnorm(NE,log(n0),n0.sd)
n = matrix(n0s,NE,NT)
for(i in 1:NE){
for(t in 2:NT){
n[i,t] = n[i,t-1] + r*n[i,t-1]*(1-n[i,t-1]/K)
}
}
```
### Problems
**9. Plot the median & 95% interval.**
```{r}
n.stats.new = apply (n,2,quantile,c(0.025,0.5,0.975))
plot(time,n.stats.new[2,], ylim=c(0,15),lwd=3,type='l',
,bty='l',cex.lab=1.5,
xlab="Time",ylab="Population Size")
ciEnvelope(time,n.stats.new[1,], n.stats.new[3,])
```
**10. Repeat with r equal to 1.95, 2.05, and 2.8**
```{r}
n = matrix(n0,NE,NT) # storage for all simulations
r = as.matrix(c(1.95, 2.05, 2.8))
n0.sd = 0.6
n0s = rlnorm(NE,log(n0),n0.sd)
n = matrix(n0s,NE,NT)
my_func <- function(x) {
for(i in 1:NE){
for(t in 2:NT){
n[i,t] = n[i,t-1] + x*n[i,t-1]*(1-n[i,t-1]/K)
}
}
return(n)
}
n1 <- my_func(r[1])
n.stats.1 = apply (n1,2,quantile,c(0.025,0.5,0.975))
plot(time,n.stats.1[2,], ylim=c(0,15),lwd=3,type='l',
,bty='l',cex.lab=1.5,
xlab="Time",ylab="Population Size")
ciEnvelope(time,n.stats.1[1,], n.stats.1[3,])
n2 <- my_func(r[2])
n.stats.2 = apply (n2,2,quantile,c(0.025,0.5,0.975))
plot(time,n.stats.2[2,], ylim=c(0,15),lwd=3,type='l',
,bty='l',cex.lab=1.5,
xlab="Time",ylab="Population Size")
ciEnvelope(time,n.stats.2[1,], n.stats.2[3,])
n3 <- my_func(r[3])
n.stats.3 = apply (n3,2,quantile,c(0.025,0.5,0.975))
plot(time,n.stats.3[2,], ylim=c(0,15),lwd=3,type='l',
,bty='l',cex.lab=1.5,
xlab="Time",ylab="Population Size")
ciEnvelope(time,n.stats.3[1,], n.stats.3[3,])
```