1-getting-started.Rmd

---
title: "ETC3550: Applied forecasting for business and economics"
author: "Ch1. Getting started"
date: "OTexts.org/fpp2/"
fontsize: 14pt
output:
  beamer_presentation:
    fig_width: 7
    fig_height: 3.5
    highlight: tango
    theme: metropolis
    includes:
      in_header: header.tex
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache=TRUE)
library(fpp2)
```

# What can we forecast?

## Forecasting is difficult

\fullheight{hopecasts2}

## Forecasting is difficult

\fullwidth{bad_forecasts}

## What can we forecast?

\fullwidth{nasdaq-stock-market}

## What can we forecast?

\fullwidth{Forex2}

## What can we forecast?

\fullwidth{pills}

## What can we forecast?

\fullwidth{elecwires2}

## What can we forecast?

\fullheight{AusBOM}


## What can we forecast?

\fullwidth{ts22015}

## What can we forecast?

\fullheight{comet}


## Which is easiest to forecast?

 1. daily electricity demand in 3 days time
 2. timing of next Halley's comet appearance
 3. time of sunrise this day next year
 4. Google stock price tomorrow
 5. Google stock price in 6 months time
 6. maximum temperature tomorrow
 7. exchange rate of \$US/AUS next week
 8. total sales of drugs in Australian pharmacies next month

\pause

 - how do we measure "easiest"?
 - what makes something easy/difficult to forecast?

## Factors affecting forecastability

Something is easier to forecast if:

 - we have a good understanding of the factors that contribute to it
 - there is lots of data available;
 - the forecasts cannot affect the thing we are trying to forecast.
 - there is relatively low natural/unexplainable random variation.
 - the future is somewhat similar to the past

## Improving forecasts

\fullheight{ncep-skill}

# Time series data


## Time series data

  - Daily IBM stock prices
  - Monthly rainfall
  - Annual Google profits
  - Quarterly Australian beer production

\pause


```{r, echo=FALSE, fig.height=2}
ausbeer %>% autoplot
```

\pause

**Forecasting is estimating how the sequence of observations will continue into the future.**

## Australian beer production

```{r, echo=FALSE}
ausbeer %>% forecast %>% autoplot
```


## Australian beer production

```{r, echo=FALSE}
ausbeer %>% forecast %>% autoplot(include=60)
```

## Assignment 1: forecast the following series
\small

  1. Google closing stock price on 12 March 2018.
  2. Google closing stock price on 9 April 2018.
  3. The difference in points (Collingwood-Essendon) scored in the AFL match between Collingwood and Essendon for the Anzac Day clash. 25 April 2018.
  4. Maximum temperature at Melbourne airport on 7 May 2018.
  5. The trend estimate of total employment for April 2018. ABS CAT 6202, to be released around mid May 2018.

\begin{block}{}
For each of these, give a point forecast and an 80\% prediction interval.
\end{block}\pause
\begin{alertblock}{}
Prize: \$50 Amazon gift voucher
\end{alertblock}

## Assignment 1: scoring
\small

$Y=$ actual, $F=$ point forecast, $[L,U]=$ prediction interval

### Point forecasts:

$$\text{Absolute Error} = |Y-F|
$$

 * Rank results for all students in class
 * Add ranks across all five items

### Prediction intervals:

$$
\text{Interval Score} = (U - L) + 10(L - Y)_+ + 10 (Y-U)_+
$$

 * Rank results for all students
 * Add ranks across all five items


# Some case studies

## CASE STUDY 1: Paperware company

\fontsize{12}{14}\sf

\begin{textblock}{7.6}(0.2,1.4)
\textbf{Problem:} Want forecasts of each of hundreds of
items. Series can be stationary, trended or seasonal. They currently
have a large forecasting program written in-house but it doesn't seem
to produce sensible forecasts. They want me to tell them what is
wrong and fix it.

\vspace*{0.1cm}

\textbf{Additional information}\vspace*{-0.2cm}\fontsize{12}{13.5}\sf
\begin{itemize}\itemsep=0cm\parskip=0cm
\item  Program  written in COBOL making numerical calculations limited. It is not possible to do any optimisation.
\item Their programmer has little experience in numerical computing.
\item They employ no statisticians and want the program to produce forecasts \rlap{automatically.}
\end{itemize}
\end{textblock}

\placefig{8}{1.4}{width=4.8cm}{tableware2}


## CASE STUDY 1: Paperware company

### Methods currently used

A
: 12 month average

C
: 6 month average

E
: straight line regression over last 12 months

G
: straight line regression over last 6 months

H
: average slope between last year's and this year's values.
  (Equivalent to differencing at lag 12 and taking mean.)

I
: Same as H except over 6 months.

K
: I couldn't understand the explanation.

## CASE STUDY 2: PBS

\fullwidth{pills}


## CASE STUDY 2: PBS

### The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.

  * Many drugs bought from pharmacies are subsidised to allow more equitable access to modern drugs.
  * The cost to government is determined by the number and types of drugs purchased. Currently nearly 1\% of GDP.
  * The total cost is budgeted based on forecasts of drug usage.

## CASE STUDY 2: PBS

\fullheight{pbs2}

## CASE STUDY 2: PBS

  * In 2001: \$4.5 billion budget, under-forecasted by \$800 million.
  * Thousands of products. Seasonal demand.
  * Subject to covert marketing, volatile products, uncontrollable expenditure.
  * Although monthly data available for 10 years, data are aggregated to annual values, and only the first three years are used in estimating the forecasts.
  * All forecasts being done with the \texttt{FORECAST} function in MS-Excel!


## CASE STUDY 3: Car fleet company

**Client:** One of Australia's largest car fleet companies

**Problem:** how to forecast resale value of vehicles? How
should this affect leasing and sales policies?

\pause

### Additional information
 - They can provide a large amount of data on previous vehicles and their eventual resale values.
 - The resale values are currently estimated by a group of specialists. They see me as a threat and do not cooperate.


## CASE STUDY 4: Airline

\fullwidth{ansettlogo}


## CASE STUDY 4: Airline

```{r, echo=FALSE, fig.height=5}
autoplot(melsyd[,"Economy.Class"],
  main="Economy class passengers: Melbourne-Sydney",
  xlab="Year",ylab="Thousands")
```


## CASE STUDY 4: Airline

```{r, echo=FALSE, fig.height=5}
autoplot(melsyd[,"Economy.Class"],
  main="Economy class passengers: Melbourne-Sydney",
  xlab="Year",ylab="Thousands")
```

\begin{textblock}{4.2}(7,6.3)
\begin{alertblock}{}
Not the real data! Or is it?
\end{alertblock}
\end{textblock}


## CASE STUDY 4: Airline

**Problem:** how to forecast passenger traffic on major routes?

### Additional information

  * They can provide a large amount of data on previous routes.
  * Traffic is affected by school holidays, special events such as
the Grand Prix, advertising campaigns, competition behaviour, etc.
  * They have a highly capable team of people who are able to do
most of the computing.


# The statistical forecasting perspective

## Sample futures

```{r austa1, echo=FALSE, message=FALSE, warning=FALSE, cache=TRUE, fig.width=9, fig.height=6}
fit <- ets(austa)
df <- cbind(austa, simulate(fit,10))
for(i in seq(9))
  df <- cbind(df, simulate(fit,10))
colnames(df) <- c("Data", paste("Future",1:10))
autoplot(df) +
  ylim(min(austa),10) +
  ylab("Millions of visitors") + xlab("Year") +
  ggtitle("Total international visitors to Australia") +
 scale_colour_manual(values=c('#000000',rainbow(10)),
                     breaks=c("Data",paste("Future",1:10)),
                     name=" ") +
  ylim(.85,10.0)
```

## Forecast intervals

```{r austa2, echo=FALSE, message=FALSE, warning=FALSE, cache=TRUE, fig.width=8.6, fig.height=6}
autoplot(forecast(fit)) +
  ylab("Millions of visitors") + xlab("Year") +
  ggtitle("Forecasts of total international visitors to Australia") +
  ylim(0.85,10.0)
```


## Statistical forecasting

\fontsize{14}{16}\sf

- Thing to be forecast: a random variable, $y_t$.
- Forecast distribution: If ${\cal I}$ is all observations, then $y_{t} |{\cal I}$ means ``the random variable $y_{t}$ given what we know in \rlap{${\cal I}$''.}
- The ``point forecast'' is the mean (or median) of $y_{t} |{\cal I}$
- The ``forecast variance'' is $\text{var}[y_{t} |{\cal I}]$
- A prediction interval or ``interval forecast'' is a range of values of $y_t$ with high \rlap{probability.}
- With time series, \rlap{${y}_{t|t-1} = y_t | \{y_1,y_2,\dots,y_{t-1}\}$. }
- $\hat{y}_{T+h|T} =\text{E}[y_{T+h} | y_1,\dots,y_T]$ (an $h$-step forecast taking account of all observations up to time $T$).