-
Notifications
You must be signed in to change notification settings - Fork 109
/
Visualization_in_Time_Series_Analysis.Rmd
274 lines (223 loc) · 9.91 KB
/
Visualization_in_Time_Series_Analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
# Visualization in Time Series Analysis
Yihao Li (yl4326)
```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Note: All the pseudo-documentation introduction will only include parameter I used often, it's not the full parameter set.
Basic Settings
```{r, results='hide', message=FALSE, warning=FALSE}
#In case there's some new package...
#install.packages(c("tidyverse", "stats", "forecast", "dynlm", "lubridate",
# "strucchange","sarima"))
library(tidyverse)
library(stats) # a lot of basic operations here
library(forecast) # fantastic package with a ton of shortcut
library(dynlm) # if you need some linear regression on lag of data
library(lubridate) # I didn’t include any function here but necessary for time series
library(strucchange) # parameter stability test
#library(zoo) #used by forecast, will be automatically included
require("PolynomF")
library(sarima) # not for the analysis, just for simulation in showing
```
## Initiate a Time series object:
**stats::ts(data = NA, start = 1, frequency = 1)**
where:
data is the data input
start refers to the time of the first observation.
Frequency the number of observations per unit of time.
## Plot the data:
**plot(time, value)**
Traditional built-in plot,
or,
**ggplot2::autoplot(object)**
powerful plotting function not only for time series analysis but also for model decomposition object, model fit object, even forecast models.
```{r}
library(expsmooth) # use a dataset cangas here
ts_data = cangas
autoplot(ts_data) + ylab("Canadaian Gas Production")
```
## Transformation of nonstationary:
### Stationarity:
Required by most of the time series analysis tools:
1. First Order Weakly Stationary: all R.V.s have the same means.
2. Second Order Weakly Stationary (Covariance Stationary)
a. Necessary for most of the time series analysis.
b. All R.V.s have same variables, same variances.
c. The correlation of time variable only depend on the time difference, i.e. $\rho(Y_t, Y_{t-k}) = \rho(|k|)$.
### Operations
From the plot we have a intuition of its stationarity and we can do following:
1. First Order Weakly Stationary:
Take the first difference $(Y_t – Y_{t-1})$ of the data.
2. second order weakly stationary:
Take the log and then take the first difference(i.e. $(\log(y_T) – \log(y_{T-1}))$of this transformed series.
3. Growth rate.
```{r}
ts_data_diff = diff(ts_data) #first-order difference
ts_data_log_diff = diff(log(ts_data)) #first-order log difference
ts_data_growth_rate = diff(ts_data)/ stats::lag(ts_data, k=-1) #growth rate
```
## ACF and PACF for time series
1. Autocovariance Function(ACF): $corr(y_t ,y_{t-k})$
2. Partial Autocorrleation Function(PACF): the autocorrelataion between $Y_t$ and $Y_{t+k}$ after removing(conditioning on) all the observation in between.
3. To save your code line, there’s
**forecast::tsdisplay(ts_data)**
which plots the **ggplot2::autoplot()**, **stats::acf()**, and **stats::pacf()** together.
It also have a very friendly
**forecast::ggtsdisplay(ts_data)**
which returns a ggplot object for you, however, **ggtitle()** behaves weird in this graph, not to use ggtitle with it.
4. White Noise:
Time series process with zero mean, constant variance, and no serial correlation.
Most of time use Gaussian y_t ~N(0, $\sigma$^2).
For pure white noise, both ACF and PACF should be 0, only k = 0 will have ACF = PACF = 1.
```{r}
stats::acf(ts_data_diff)
stats::pacf(ts_data_diff)
tsdisplay(ts_data_diff)
ggtsdisplay(ts_data_diff)
```
## Full model: Yt = T(Trend) + S(Seasonality) +C(Cycle)
Time is the money my friend, you don’t need to waste time to guess the seasonality pattern.
1. **stats::decompose(ts_data, type = c("additive", "multiplicative"))**
Decompose a time series into seasonal, trend and irregular components using moving average.
Where type stands for additive/multiplicative seasonal cpomponent.
2. **stats::stl(ts_data, s.window)**
Decompose a time series into seasonal, trend and irregular components using loess.
3. **forecast::seasadj(object)**
Takes in a decompose or stl object. Returns seasonally adjusted data. constructed by removing the seasonal component.
4. You can also find the decomposition data in model.
for **decomposed.ts** object, it’s in **x()**(original) **seasonal**, **trend**, **random**(remainder).
for stl object it’s in **time.series**(seasonal, trend, and remainder).
```{r}
fit <- decompose(ts_data, type='additive') # I use original data here!
autoplot(fit)
fit %>% seasadj() %>% autoplot() + ggtitle("Seasonally adjusted data")
fit$trend %>% autoplot() + ggtitle("Trend")
```
### Trend(T): Linear, Quadratic, etc. For normal linear model
**stats::lm(formula, data, na.action)** For normal linear regression;
**dynlm::dynlm(formula, data, na.action)** For dynamic linear regression;
na.action is optional but I think it saves life from cleaning data.
### Seasonality(S):
1. Direct visualization: decompose and stl.
2. one-line beautiful visualization: **forecast::seasonplot(ts_data)**
and it’s ggplot version **forecast::ggseasonplot(ts_data)**.
```{r}
ggseasonplot(ts_data)
```
### Cycle(C):
ARIMA family: often used, powerful model in cycle analysis. Notice the model selection is quite subjective except the ARIMA one.
#### 1. MA(q) Moving average:
$Y_t – μ = \sum_1^q \theta_i \epsilon_{t-i}$
ACF: significant spikes in first q position
PACF: decaying in absolute value
```{r}
ma2.sim<-arima.sim(model=list(ma=c(0.7,0.3)),n=100)
tsdisplay(ma2.sim)
```
#### 2. AR(p) Autoregressive:
$Y_t = c +\sum_1^p \phi_i Y_{t-i} +\epsilon_t$
ACF: decaying in absolute value
PACF: significant spikes in first q position.
```{r}
ar2.sim <- arima.sim(model=list(ar=c(0.5,0.1)), n=500)
tsdisplay(ar2.sim)
```
#### 3. Seasonal Model with parameter s:
s stands for the period, e.g. 4 for quarterly data, 12 for monthly data, etc.
##### a. Seasonal-MA(q) Model with parameter s:
$Y_t – μ = \sum_1^q\theta_{is} \epsilon_{t-is} + \epsilon_t$
ACF: spikes at s, 2s,… , p*s
PACF: spikes at n*s decaying
```{r}
s4ma1.sim <- sim_sarima(n=144, model = list(ma=c(rep(0,3),0.8)))
# SMA(1), 4 quarters
tsdisplay(s4ma1.sim)
```
##### b. Seasonal-AR(p) Model with parameter s:
$Y_t = c +\sum_1^p \phi_{is} Y_{t-is} +\epsilon_t$
ACF: spikes at n*s decaying
PACF: spikes at s, 2s,… , p*s
```{r}
s4ar1.sim <- sim_sarima(n=144, model = list(ar=c(rep(0,3),0.8)))
# SAR(1), 4 quarters
tsdisplay(s4ar1.sim)
```
#### 4. ARMA(p,q) Autoregressive Moving Average:
a. ARMA(p,q) = AR(p) + MA(q), inherited the characters from both AR and MA
b. Saves parameters: ARMA(1,1) is performing better than AR(3).
c. very subjective in the value selection of p and q…
```{r}
ar2ma2.sim<-arima.sim(model=list(ar=c(0.9,-0.2),ma=c(-0.7,0.1)),n=100)
#ARMA(2,2)
tsdisplay(ar2ma2.sim)
```
#### 5. ARIMA(p,d,q) Autoregressive integrated moving average:
stationary and invertible ARMA(p,q) after differencing d times.
a. Deal with nonstationary series
b. Include d order difference, to remove d or lower order trends.
c. Also deal with stochastic trends
d. Quantitative one-line data choose
**forecast::auto.arima(y, ic = c("aicc", "aic", "bic"))**
where y is a univariate time series, ic is the criterion.
Returns best ARIMA model with seasonsal ARIMA according to either AIC, AICc or BIC value.
```{r}
data = read.table("resources/Visualization_in_Time_Series_Analysis/w-gs1yr.txt", header = TRUE)
ts_data = ts(data$rate, start = 1962, deltat = 1/52, freq = 52)
autoplot(ts_data)+ ggtitle("US Weekly Interest Rates (%)")+ ylab("Interest Rate")
tsdisplay(diff(ts_data))
t = seq(1962, length = length(ts_data), by = 1/52)
fit = auto.arima(diff(ts_data))
summary(fit)
plot(t[-1], diff(ts_data), main = "US Weekly Interest Rates (%) (first difference)", ylab = "Rates",
xlab = "Time", type = "l")
lines(fit$fitted, col = "blue")
```
#### Aside:
1. AIC(Akaike Information Criterion)
a. biased towards overparameterized models
b. asymptotically efficient
**stats::AIC(fit, k=2)**
where:
fit is the a fitted model object,
k is the penalty per parameter to be used; the default k = 2 is the classical AIC.
2. BIC/SIC(Bayesian (Schwarz) Information Criterion)
a. Consistent
b. not asymptotically efficient
**stats::BIC(fit)**
where fit is the a fitted model object
Both AIC and BIC are the smaller the better.
### Summary
#### Steps
The full model includes a trend, seasonal dummies, and cyclical dynamics
For a given visualized data:
a. Extract the trend and seasonality (stl)
b. ACF & PACF, Remove any possible dynamics(tsdisplay, auto.arima)
c. Check if the residuals looks like a white noise
#### Residuals vs White Noise
```{r}
#this fit is following the last chunk
tsdisplay(resid(fit))
forecast::checkresiduals(fit)
```
Note:
1. Object in resid() can be either a time series model, a forecast object, or a time series (assumed to be residuals).
2. In the new graph, the scale of ACF and PACF is significantly smaller, thus we have removed some of the dynamics.
#### Test Parameter stability
CUSUM(Cumulative Sum of Standardized Recursive Residuals):
Check if the CUSUM leave the confidence interval to capture the dynamics.
Recursive Residuals: **strucchange::recresid(x, y, formula)**
```{r}
autoplot(ts(recresid(fit$res ~ t[-1])))+
ggtitle("The recursive residual of ARIMA(1,0,2)")+
ylab("recursive residual")
```
Empirical Fluctuation Processes: **efp(x, y, formula, type = "Rec-CUSUM")**
```{r}
plot(efp(fit$res ~ t[-1], type = "Rec-CUSUM"))
```
Note:
1. In the recursive residual's visialization, we find some instability in the middle.
2. Empirical Fluctuation Processes showed the parameter's stability is acceptable.
### Reference:
www.rdocumentation.org
My sincere appreciation to Dr. Randall Rojas (UCLA Econ 144 19’ Spring)’s work. The general model’s idea, both data set: cangas and the tidied “w-gs1yr.txt” is from the Econ 144 class.