forked from robjhyndman/ETC3550Slides
---
title: "ETC3550: Applied forecasting for business and economics"
author: "Ch1. Getting started"
date: "OTexts.org/fpp2/"
fontsize: 14pt
output:
  beamer_presentation:
    fig_width: 7
    fig_height: 3.5
    highlight: tango
    theme: metropolis
    includes:
      in_header: header.tex
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache=TRUE)
library(fpp2)
```
# What can we forecast?
## Forecasting is difficult
\fullheight{hopecasts2}
## Forecasting is difficult
\fullwidth{bad_forecasts}
## What can we forecast?
\fullwidth{nasdaq-stock-market}
## What can we forecast?
\fullwidth{Forex2}
## What can we forecast?
\fullwidth{pills}
## What can we forecast?
\fullwidth{elecwires2}
## What can we forecast?
\fullheight{AusBOM}
## What can we forecast?
\fullwidth{ts22015}
## What can we forecast?
\fullheight{comet}
## Which is easiest to forecast?
1. daily electricity demand in 3 days' time
2. timing of the next appearance of Halley's comet
3. time of sunrise this day next year
4. Google stock price tomorrow
5. Google stock price in 6 months' time
6. maximum temperature tomorrow
7. exchange rate of \$US/AUS next week
8. total sales of drugs in Australian pharmacies next month
\pause
- how do we measure "easiest"?
- what makes something easy/difficult to forecast?
## Factors affecting forecastability
Something is easier to forecast if:
- we have a good understanding of the factors that contribute to it;
- there is lots of data available;
- the forecasts cannot affect the thing we are trying to forecast;
- there is relatively low natural/unexplainable random variation;
- the future is somewhat similar to the past.
## Improving forecasts
\fullheight{ncep-skill}
# Time series data
## Time series data
- Daily IBM stock prices
- Monthly rainfall
- Annual Google profits
- Quarterly Australian beer production
\pause
```{r, echo=FALSE, fig.height=2}
ausbeer %>% autoplot
```
\pause
**Forecasting is estimating how the sequence of observations will continue into the future.**
## Australian beer production
```{r, echo=FALSE}
ausbeer %>% forecast %>% autoplot
```
## Australian beer production
```{r, echo=FALSE}
ausbeer %>% forecast %>% autoplot(include=60)
```
## Assignment 1: forecast the following series
\small
1. Google closing stock price on 12 March 2018.
2. Google closing stock price on 9 April 2018.
3. The difference in points scored (Collingwood minus Essendon) in the Anzac Day AFL match between Collingwood and Essendon on 25 April 2018.
4. Maximum temperature at Melbourne airport on 7 May 2018.
5. The trend estimate of total employment for April 2018. ABS CAT 6202, to be released around mid May 2018.
\begin{block}{}
For each of these, give a point forecast and an 80\% prediction interval.
\end{block}\pause
\begin{alertblock}{}
Prize: \$50 Amazon gift voucher
\end{alertblock}
## Assignment 1: scoring
\small
$Y=$ actual, $F=$ point forecast, $[L,U]=$ prediction interval
### Point forecasts:
$$\text{Absolute Error} = |Y-F|
$$
* Rank results for all students in class
* Add ranks across all five items
### Prediction intervals:
$$
\text{Interval Score} = (U - L) + 10(L - Y)_+ + 10 (Y-U)_+
$$
* Rank results for all students
* Add ranks across all five items
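## Assignment 1: scoring sketch

As a rough illustration of the two scores above, the following base R sketch computes them for a single item. The function names `absolute_error()` and `interval_score()`, the helper `pos()`, and all numbers in the usage lines are hypothetical and chosen only for illustration; this is not the code used to mark the assignment.

```r
# Positive part: (x)_+ = max(x, 0), vectorised
pos <- function(x) pmax(x, 0)

# Absolute error of a point forecast f for actual value y
absolute_error <- function(y, f) abs(y - f)

# Interval score for a prediction interval [l, u]:
# the width, plus a penalty of 10 per unit that y falls outside the interval
interval_score <- function(y, l, u) {
  stopifnot(all(l <= u))
  (u - l) + 10 * pos(l - y) + 10 * pos(y - u)
}

# Hypothetical item: actual 105, point forecast 100, interval [90, 104]
absolute_error(105, 100)
interval_score(105, 90, 104)

# Hypothetical scores for three students; smaller scores get better ranks
scores <- c(a = 24, b = 18, c = 30)
rank(scores)
```

A narrow interval is rewarded only if it contains the actual value: each unit by which $Y$ falls outside $[L,U]$ costs as much as ten units of extra width.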
# Some case studies
## CASE STUDY 1: Paperware company
\fontsize{12}{14}\sf
\begin{textblock}{7.6}(0.2,1.4)
\textbf{Problem:} Want forecasts of each of hundreds of
items. Series can be stationary, trended or seasonal. They currently
have a large forecasting program written in-house but it doesn't seem
to produce sensible forecasts. They want me to tell them what is
wrong and fix it.
\vspace*{0.1cm}
\textbf{Additional information}\vspace*{-0.2cm}\fontsize{12}{13.5}\sf
\begin{itemize}\itemsep=0cm\parskip=0cm
\item Program written in COBOL, which limits numerical calculations. It is not possible to do any optimisation.
\item Their programmer has little experience in numerical computing.
\item They employ no statisticians and want the program to produce forecasts \rlap{automatically.}
\end{itemize}
\end{textblock}
\placefig{8}{1.4}{width=4.8cm}{tableware2}
## CASE STUDY 1: Paperware company
### Methods currently used
A
: 12 month average
C
: 6 month average
E
: straight line regression over last 12 months
G
: straight line regression over last 6 months
H
: average slope between last year's and this year's values.
(Equivalent to differencing at lag 12 and taking mean.)
I
: Same as H except over 6 months.
K
: I couldn't understand the explanation.
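## CASE STUDY 1: Paperware company

The simpler methods listed above can be sketched in a few lines of base R. This is only an illustrative reading of the descriptions, not the company's program: the monthly series `y` is simulated, the object names are hypothetical, and "12 month average" is assumed to mean the average of the most recent 12 months.

```r
# Hypothetical monthly series: 36 months with trend and seasonality
set.seed(1)
y <- ts(100 + 0.5 * (1:36) + 10 * sin(2 * pi * (1:36) / 12) + rnorm(36),
        frequency = 12)

# Method A: average of the last 12 months
forecast_A <- mean(tail(y, 12))

# Method E: straight-line regression over the last 12 months,
# extrapolated one month ahead
last12 <- data.frame(t = 1:12, y = as.numeric(tail(y, 12)))
fit_E <- lm(y ~ t, data = last12)
forecast_E <- unname(predict(fit_E, newdata = data.frame(t = 13)))

# Method H: difference at lag 12 and take the mean, giving the
# average change between last year's and this year's values
slope_H <- mean(tail(diff(y, lag = 12), 12))
```

Methods C, G and I would follow the same pattern with a 6-month window.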
## CASE STUDY 2: PBS
\fullwidth{pills}
## CASE STUDY 2: PBS
### The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.
* Many drugs bought from pharmacies are subsidised to allow more equitable access to modern drugs.
* The cost to government is determined by the number and types of drugs purchased. Currently nearly 1\% of GDP.
* The total cost is budgeted based on forecasts of drug usage.
## CASE STUDY 2: PBS
\fullheight{pbs2}
## CASE STUDY 2: PBS
* In 2001: \$4.5 billion budget, under-forecasted by \$800 million.
* Thousands of products. Seasonal demand.
* Subject to covert marketing, volatile products, uncontrollable expenditure.
* Although monthly data are available for 10 years, the data are aggregated to annual values, and only the first three years are used in estimating the forecasts.
* All forecasts being done with the \texttt{FORECAST} function in MS-Excel!
## CASE STUDY 3: Car fleet company
**Client:** One of Australia's largest car fleet companies
**Problem:** how to forecast resale value of vehicles? How
should this affect leasing and sales policies?
\pause
### Additional information
- They can provide a large amount of data on previous vehicles and their eventual resale values.
- The resale values are currently estimated by a group of specialists. They see me as a threat and do not cooperate.
## CASE STUDY 4: Airline
\fullwidth{ansettlogo}
## CASE STUDY 4: Airline
```{r, echo=FALSE, fig.height=5}
autoplot(melsyd[,"Economy.Class"],
  main="Economy class passengers: Melbourne-Sydney",
  xlab="Year", ylab="Thousands")
```
## CASE STUDY 4: Airline
```{r, echo=FALSE, fig.height=5}
autoplot(melsyd[,"Economy.Class"],
  main="Economy class passengers: Melbourne-Sydney",
  xlab="Year", ylab="Thousands")
```
\begin{textblock}{4.2}(7,6.3)
\begin{alertblock}{}
Not the real data! Or is it?
\end{alertblock}
\end{textblock}
## CASE STUDY 4: Airline
**Problem:** how to forecast passenger traffic on major routes?
### Additional information
* They can provide a large amount of data on previous routes.
* Traffic is affected by school holidays, special events such as
the Grand Prix, advertising campaigns, competition behaviour, etc.
* They have a highly capable team of people who are able to do
most of the computing.
# The statistical forecasting perspective
## Sample futures
```{r austa1, echo=FALSE, message=FALSE, warning=FALSE, cache=TRUE, fig.width=9, fig.height=6}
# Fit an ETS model and simulate ten possible 10-year futures from it
fit <- ets(austa)
df <- cbind(austa, simulate(fit, 10))
for (i in seq(9))
  df <- cbind(df, simulate(fit, 10))
colnames(df) <- c("Data", paste("Future", 1:10))
# Plot the data in black and each simulated future in its own colour
autoplot(df) +
  ylim(0.85, 10.0) +
  ylab("Millions of visitors") + xlab("Year") +
  ggtitle("Total international visitors to Australia") +
  scale_colour_manual(values = c('#000000', rainbow(10)),
                      breaks = c("Data", paste("Future", 1:10)),
                      name = " ")
```
## Forecast intervals
```{r austa2, echo=FALSE, message=FALSE, warning=FALSE, cache=TRUE, fig.width=8.6, fig.height=6}
autoplot(forecast(fit)) +
  ylab("Millions of visitors") + xlab("Year") +
  ggtitle("Forecasts of total international visitors to Australia") +
  ylim(0.85, 10.0)
```
## Statistical forecasting
\fontsize{14}{16}\sf
- Thing to be forecast: a random variable, $y_t$.
- Forecast distribution: If ${\cal I}$ is all observations, then $y_{t} |{\cal I}$ means ``the random variable $y_{t}$ given what we know in \rlap{${\cal I}$''.}
- The ``point forecast'' is the mean (or median) of $y_{t} |{\cal I}$
- The ``forecast variance'' is $\text{var}[y_{t} |{\cal I}]$
- A prediction interval or ``interval forecast'' is a range of values of $y_t$ with high \rlap{probability.}
- With time series, \rlap{${y}_{t|t-1} = y_t | \{y_1,y_2,\dots,y_{t-1}\}$. }
- $\hat{y}_{T+h|T} =\text{E}[y_{T+h} | y_1,\dots,y_T]$ (an $h$-step forecast taking account of all observations up to time $T$).
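## Statistical forecasting: a simulation sketch

The definitions above can be illustrated by simulating many sample futures and summarising them. The sketch below uses base R only and assumes a random walk with drift, a deliberately simple stand-in for the ETS model used on the earlier slides. The series `y` is simulated, all object names are hypothetical, and uncertainty in the estimated parameters is ignored.

```r
set.seed(123)
# Hypothetical annual series y_1, ..., y_T from a random walk with drift
y <- cumsum(c(1, rnorm(35, mean = 0.15, sd = 0.2)))

# Estimate the drift and innovation standard deviation from first differences
d <- diff(y)
drift <- mean(d)
sigma <- sd(d)

# Simulate many sample futures y_{T+1}, ..., y_{T+h} given y_1, ..., y_T
h <- 10
nsim <- 5000
futures <- replicate(nsim, tail(y, 1) + cumsum(rnorm(h, drift, sigma)))
# futures is an h x nsim matrix: one column per sample future

# Point forecast: mean of the forecast distribution at each horizon
point_forecast <- rowMeans(futures)

# 80% prediction interval: 10% and 90% quantiles at each horizon
interval <- apply(futures, 1, quantile, probs = c(0.1, 0.9))
```

Each row of `futures` approximates the distribution of $y_{T+h} | y_1,\dots,y_T$ for one horizon $h$, so its mean approximates $\hat{y}_{T+h|T}$ and its spread widens as $h$ grows.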