# The Generalized Method of Wavelet Moments
- A new framework for inertial sensor calibration.
- It is based on the Generalized Method of Wavelet Moments (GMWM) of @guerrier2013wavelet, a statistical approach for estimating the parameters of (complex) time series models.
- The GMWM can efficiently estimate the time series models that are commonly used to describe the errors of inertial sensors.
- This calibration approach provides considerable improvements in navigation performance compared to existing methods.
- This methodology is robust (and hence potentially applicable for fault detection and isolation, FDI) and can automatically select a suitable model (or rank candidate models).
## The Wavelet Variance
The GMWM estimator is a GMM-type estimator based on a quantity called the Wavelet Variance (WV). This quantity is computed by taking the variance of the wavelet coefficients, which are defined as follows:
```{definition, defWC, name = "Wavelet Coefficient"}
In a similar way to the AV, we can define the Wavelet Variance (WV) at dyadic scales ($\tau_j$) for $j \in \left\{x \in \mathbb{N} \, : \; 1 \leq x < \log_2 (T) - 1 \right\}$. To do so, we first need to define the wavelet filters $h_{j,l}$ as ``weights'' having the following properties
\begin{equation*}
\sum_{l=0}^{L_j-1} h_{j,l}=0, \quad \sum_{l=0}^{L_1-1} h_{1,l}^2 =\frac{1}{2} \quad \text{and} \quad \sum_{l=-\infty}^{\infty} h_{1,l} h_{1,l+2m} = 0,
\end{equation*}
where $m \in \mathbb{N}^+$, $L_j = (2^j-1)(L_1-1)+1$ is the length of the filter at level $j$ and $L_1$ is the length of the first level filter $h_{1,l}$.\\[0.2cm]
Then, the wavelet coefficients $W_{j,t}$ are defined as
\begin{equation*}
W_{j,t} = \sum_{l=0}^{L_j-1}h_{j,l} X_{t-l}.
\end{equation*}
```
There exist many types of wavelet filters. In this book, we will focus on the Maximum Overlap Discrete Wavelet Transform (MODWT), which is the most useful for our purposes.
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='MODWT Wavelet Coefficients', echo=FALSE}
knitr::include_graphics("images/modwt.png")
```
The wavelet filters give non-zero weights to observations within a window of length $L_j$. Hence, there are as many WVs as there are scales, and the wavelet coefficients ${W}_{j,t}$ (obtained by applying ${h_j}$) can be computed either on consecutive windows or on overlapping windows. Overlapping windows lead to more efficient estimators (as in the MODWT).
Based on the previously computed wavelet coefficients $W_{j,t}$, we can compute the WV at each scale, which is defined as follows:
```{definition, defWV, name = "Wavelet Variance"}
The WV of the Wavelet Coefficient $W_{j,t}$, is the variance of the Wavelet Coefficients at level $j$
\begin{equation*}
\nu_{j}^2 \equiv \text{Var}[W_{j,t}].
\end{equation*}
```
```{lemma, lemmavtowv, name = "Relation between WV and AV"}
One of the properties of the WV is its exact relationship with the AV. Indeed, when using the Haar wavelet filter $h_{j,l}$, we have $\nu_{j}^2 = 2 \, \text{AVar}_j$. The proof of this property is given in @percival1994long.
```
Based on Lemma \@ref(lem:lemmavtowv), it is possible to compute the theoretical WV of a process from its AV counterpart.
```{example, label = wvtheoar1, name = "Theoretical WV of an AR(1) process"}
Consider an AR(1) process
\begin{equation*}
Y_t = \phi Y_{t-1} + \varepsilon_t, \;\; \varepsilon_t \overset{iid}{\sim} \mathcal{N}\left( 0 , \sigma^2 \right), \;\; t = 1, \ldots , T.
\end{equation*}
With $\tau_j = 2^j$, the theoretical (Haar) WV of such a process is given by
\begin{equation*}
\nu_{j}^2 =\frac{\left( \frac{\tau_j}{2} - 3 \phi - \frac{\tau_j \phi^2}{2} + 4\phi^{\frac{\tau_j}{2} + 1} - \phi^{\tau_j + 1} \right) \sigma^2}{\frac{\tau_j^2}{2} \left(1 - \phi \right)^2 \left(1 - \phi^2 \right)}.
\end{equation*}
```
Based on the analytical formula in Example \@ref(exm:wvtheoar1), we can plot the theoretical WV of an AR(1) process for various values of the parameter $\phi$, as illustrated in the figure below.
```{r, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', out.width='80%', fig.cap='Theoretical WV for AR(1) processes.', echo=FALSE}
knitr::include_graphics("images/AR1WVtheo.pdf")
```
```{definition, defWVest, name = "Wavelet Variance Estimator"}
An unbiased estimator of the WV obtained from a given wavelet transform is the \textbf{MODWT estimator} [@percival1995estimation]
\begin{equation*}
\hat{\nu}_{j}^2 = \frac{1}{M_j(T)} \sum_{t=L_j}^{T} {W}_{j,t}^2,
\end{equation*}
where $M_j(T) = T - L_j + 1$.
```
There exist other estimators of the WV, in particular robust estimators that perform well when the observed time series suffers from the presence of outliers or other forms of contamination. However, when the observed time series is uncontaminated, $\hat{\nu}_{j}^2$ is the most statistically efficient.
```{lemma, lemmWVconcist, name = "Consistency of WV estimator"}
Let $(X_t)$ be such that:
- $(X_t - X_{t-1})$ is a (strongly) stationary process,
- $(X_t - X_{t-1})^2$ has an absolutely summable covariance structure,
- $\mathbb{E}\left[(X_t - X_{t-1})^4 \right] < \infty$.
Defining $\hat{\boldsymbol{\nu}} \equiv [\hat{\nu}_{j}^2]_{j = 1, \dots, J}$, with $J$ being a bounded quantity, then we have $$\hat{\boldsymbol{\nu}} \overset{ \mathcal{P} }{\longrightarrow} \boldsymbol{\nu} \,.$$
```
```{proof, proofWVconsist, name = "Proof of WV consistency"}
The proof of the element-wise consistency follows directly from Theorem \ref{thm:wlln:dep}. Let $Z_t = X_t - X_{t-1}$; since $Z_t$ is stationary with mean zero, so is $(X_t - X_{t-h})$ for all $h \in \mathbb{Z}$. This directly implies that, for all $j = 1, \dots, J$, $(W_{j,t})$ is also stationary (since it is a linear combination of stationary processes) and so is $Y_t = W_{j,t}^2$ (since it is a time-invariant transformation of a stationary process). Moreover, there exist constants $c_h$ such that
\begin{equation*}
\sum_{h = -\infty}^{\infty} \; \gamma_Y (h) = \sum_{h = -\infty}^{\infty} \; c_{|h|} \gamma_{Z^2} (h).
\end{equation*}
Therefore, we obtain
\begin{equation*}
\sum_{h = -\infty}^{\infty} \; |\gamma_Y (h)| = \sum_{h = -\infty}^{\infty} \; |c_{|h|} \gamma_{Z^2} (h)| \leq \sup_{k = 1, ..., \infty} c_k \; \sum_{h = -\infty}^{\infty} \; |\gamma_{Z^2} (h)| < \infty,
\end{equation*}
since both terms are bounded. Using the same approach, we have that $\mathbb{E}\left[Y_t^2\right]$ is bounded since $\mathbb{E}\left[Z_t^4\right]$ is bounded. Thus, we can apply Theorem \ref{thm:wlln:dep} to the process $Y_t$, i.e.
\begin{equation*}
\hat{\nu}^2_j = \frac{1}{M_j(T)} \sum_{t=L_j}^{T} {W}_{j,t}^2\overset{ \mathcal{P} }{\longrightarrow} \nu^2_j.
\end{equation*}
Since
$$\hat{\nu}^2_j \overset{ \mathcal{P} }{\longrightarrow} \nu^2_j \, ,$$
and given that $J$ is a bounded quantity, we have that
\begin{eqnarray*}
\|\hat{\boldsymbol{\nu}} - \boldsymbol{\nu}\|_2^2 &=& \sum_{j=1}^J (\hat{\nu}^2_j - \nu^2_j)^2\\
&\leq& J \,\, \underset{j}{\text{max}}\, \left( (\hat{\nu}^2_j - \nu^2_j)^2 \right) \, .
\end{eqnarray*}
Based on the element-wise consistency, we have that
\begin{equation*}
J \,\, \underset{j}{\text{max}}\, ((\hat{\nu}^2_j - \nu^2_j)^2) \overset{ \mathcal{P} }{\longrightarrow} 0,
\end{equation*}
and hence
\begin{equation*}
\|\hat{\boldsymbol{\nu}} - \boldsymbol{\nu}\|_2 \overset{ \mathcal{P} }{\longrightarrow} 0 \, ,
\end{equation*}
which concludes the proof. \hfill $\blacksquare$
```
Compared to consistency, asymptotic normality requires the stronger conditions given in the following lemma, where $J$ is again assumed to be bounded.
```{lemma, lemmWVasynorm, name = "Asymptotic Normality of WV estimator"}
Let $(X_t)$ be such that:
- $(X_t - X_{t-1})$ is a (strongly) stationary process,
- $(X_t - X_{t-1})$ is a strong mixing process with mixing coefficient $\alpha(n)$ such that $\sum_{n=1}^{\infty} \alpha(n)^{\frac{\delta}{2+\delta}} < \infty$ for some $\delta > 0$,
- $\mathbb{E}\left[(X_t - X_{t-1})^{4+2\delta}\right] < \infty$ for some $\delta > 0$.
Then we have, $$\sqrt{T} \left(\hat{\boldsymbol{\nu}} - \boldsymbol{\nu} \right) \overset{ \mathcal{D} }{\longrightarrow} \mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}),$$ where $\boldsymbol{\Sigma}$ is the asymptotic covariance matrix of $\hat{\boldsymbol{\nu}}$ with elements $\sigma^2_{ij} \equiv \sum_{h = -\infty}^{\infty}\text{cov}\left(W_{i,0}W_{j,0}, W_{i,h}W_{j,h} \right)$.
```
```{proof, proofWVasynorm, name = "Proof Asymptotic Normality of WV estimator"}
TO DO
```
Based on the asymptotic normality result (Lemma \@ref(lem:lemmWVasynorm)), we can construct a $(1-\alpha)$ confidence interval (CI) for $\nu_j^2$ as
\begin{equation*}
\text{CI}\left(\nu_j^2, \alpha \right) = \left[ \hat{\nu}_j^2 \pm z_{1- \frac{\alpha}{2}} \frac{\sigma_{jj}}{\sqrt{T}} \right],
\end{equation*}
where $z_{1- \frac{\alpha}{2}} \equiv \boldsymbol{\Phi}^{-1}\left( 1- \frac{\alpha}{2} \right)$ is the $(1- \frac{\alpha}{2})$ quantile of a standard normal distribution.
However, the long-run variance $\sigma_{jj}$ is usually unknown. Many methods have been proposed to consistently estimate it under mild conditions [see e.g. @newey1986simple].
Moreover, for finite $T$, a CI based on asymptotic normality can be problematic since its lower limit may be negative, even though the WV is positive by definition. To avoid this problem, we can instead use the following asymptotic result
\begin{equation*}
\eta \frac{\hat{\nu}_j^2}{\nu_j^2} \overset{ \mathcal{D} }{\longrightarrow} \chi^2_{\eta},
\end{equation*}
where $\eta$ is a constant which can be estimated by $\max\left\{\frac{M_j}{2^j}, 1\right\}$. Moreover, this approach avoids the estimation of the long-run variance $\sigma_{jj}$.
The resulting confidence interval is
\begin{equation*}
\text{CI}\left(\nu_j^2, \alpha\right) = \left[ \frac{\eta\hat{\nu}_j^2}{F^{-1}_{\chi^2_{\eta}}(1-\alpha/2)}, \; \frac{\eta\hat{\nu}_j^2}{F^{-1}_{\chi^2_{\eta}}(\alpha/2)} \right].
\end{equation*}
```{r exampleWVwhitenoise, fig.height = 5, fig.width = 6.5, cache = TRUE, fig.align='center', fig.cap='Empirical WV (with confidence intervals) of a simulated Gaussian white noise process.', warning=FALSE}
# Load packages
library(wv) # Package for Wavelet Variance functions
library(simts) # Package for time series simulations
# Simulate white noise
Xt = gen_gts(WN(sigma2 = 1), n = 10^5)
# Compute the WV
wv_xt = wvar(Xt)
# WV log-log representation
plot(wv_xt)
```
## The GMWM
To recover the parameter vector $\boldsymbol{\theta}$ of the specified model $F_{\boldsymbol{\theta}}$, we want to exploit the (unique) relationship between the theoretical WV and the vector $\boldsymbol{\theta}$.
In a similar way to the method based on the AV, the idea would be to find the model parameter values implied by the observed WV, i.e.
\begin{equation*}
\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}(\hat{\boldsymbol{\nu}}) ,
\end{equation*}
where $\boldsymbol{\theta}(\boldsymbol{\nu})$ is the inverse of the function $\boldsymbol{\nu}(\boldsymbol{\theta})$.
```{definition, defgmwm, name = "Generalized Method of Wavelet Moments (GMWM)"}
The GMWM estimator is obtained as follows
\begin{equation*}
\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\text{argmin}} \, (\hat{\boldsymbol{\nu}} - \boldsymbol{\nu}(\boldsymbol{\theta}))^T\boldsymbol{\Omega}(\hat{\boldsymbol{\nu}} - \boldsymbol{\nu}(\boldsymbol{\theta})),
\end{equation*}
where $\boldsymbol{\nu}(\boldsymbol{\theta})$ is the theoretical WV implied by a model with parameter vector $\boldsymbol{\theta}$ and $\boldsymbol{\Omega}$ is a positive-definite weighting matrix.
```
### Consistency
```{lemma, concistgmwm, name = "Consistency of the GMWM"}
Suppose that:
C1. The function $\boldsymbol{\nu}(\boldsymbol{\theta})$ is such that $\boldsymbol{\nu}(\boldsymbol{\theta})=\boldsymbol{\nu}(\boldsymbol{\theta}_0)$ if and only if $\boldsymbol{\theta}=\boldsymbol{\theta}_0$.
C2. The set $\boldsymbol{\Theta}$ is compact.
C3. The function $\boldsymbol{\nu}(\boldsymbol{\theta})$ is continuous in $\boldsymbol{\theta}$.
C4. $\hat{\boldsymbol{\nu}} \overset{\mathcal{P}}{\longrightarrow} \boldsymbol{\nu}$.
C5. $\mathbf{\Omega}$ is positive-definite and ``consistent'' (if estimated).
Then, we have:
\begin{equation*}
\hat{\boldsymbol{\theta}} \overset{ \mathcal{P} }{\longrightarrow} \boldsymbol{\theta}_0.
\end{equation*}
```
```{remark, rqident}
Condition (C1) is equivalent to Condition \ref{theo:consist:cond:unique:max} and, as seen earlier, it can be verified by checking the assumptions of Theorem \ref{Thm:homeo:gmm}. In the latter, all assumptions are generally satisfied for the most commonly employed time series models; however, showing that $J(\boldsymbol{\theta})$ is nonnegative (or nonpositive) is not necessarily straightforward, especially when dealing with latent models. This property of $J(\boldsymbol{\theta})$ has been verified for a wide class of time series models (see the next section).
```
```{proof, proofident, name = "Sketch Identifiability"}
Identifiability has been shown for the following class of models:
- [Model 1] Sum of a white noise, quantization noise and $K$ first-order autoregressive processes
- [Model 2] Sum of a first-order moving-average process and $K$ first-order autoregressive processes
- [Model 3] $K$ first-order autoregressive processes
- [Model 4] Sum of a white noise, quantization noise, drift and random walk process
- [Model 5] Sum of a first-order moving-average process, drift and random walk process
```
```{proof, proofconsistgmwm, name = "Consistency of the GMWM (Sketch)"}
Condition (C2) is always assumed and, as discussed for Condition (C1), Condition (C3) is generally verified for all the time series models considered previously.
Condition (C4) is verified based on Lemma \@ref(lem:lemmWVconcist) and, finally, Condition (C5) is either assumed or can be verified using an appropriate estimator for $\boldsymbol{\Omega}$. Based on the latter two conditions, it is possible to verify that $\hat{Q}_n(\boldsymbol{\theta})$ converges uniformly in probability to $Q_0(\boldsymbol{\theta})$ (Condition \ref{theo:consist:cond:converge}).
Based on these verifiable (verified) conditions, we have that
$$\hat{\boldsymbol{\theta}} \overset{ \mathcal{P} }{\longrightarrow} \boldsymbol{\theta}_0,$$
thus concluding the proof.
\hfill $\blacksquare$
```
### Asymptotic Normality
```{theorem, asynormgmwm, name = "Asymptotic Normality of the GMWM Estimator"}
In addition to Conditions C1 to C5, we assume:
C6. $\boldsymbol{\theta}_0$ is an interior point of $\boldsymbol{\Theta}$.
C7. $\mathbf{H}(\boldsymbol{\theta}_0) \equiv \frac{\partial^2}{\partial \boldsymbol{\theta} \partial \boldsymbol{\theta}^T} \boldsymbol{\nu}(\boldsymbol{\theta})\big|_{\boldsymbol{\theta}=\boldsymbol{\theta}_0}$ exists and is non-singular.
C8. $\sqrt{T} (\hat{\boldsymbol{\nu}} - \boldsymbol{\nu}) \overset{ \mathcal{D} }{\longrightarrow} \mathcal{N}(\mathbf{0},\boldsymbol{\Sigma})$.
Then, we have
\begin{equation*}
\sqrt{T}(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) \overset{ \mathcal{D} }{\longrightarrow} \mathcal{N}(\mathbf{0},\mathbf{V}) \,,
\end{equation*}
where $\mathbf{V}$ is the asymptotic covariance matrix of $\hat{\boldsymbol{\theta}}$.
```
```{proof, proofasynormgmwm}
The proof of asymptotic normality of the GMWM closely follows that of asymptotic normality for extremum estimators (see Theorem \@ref(thm:CLTmulti) onwards).
Condition (C6) is another condition that inevitably needs to be assumed (as for Condition (C2)), while Condition (C7) is not strictly necessary but guarantees the feasibility of the Taylor expansion used in the proof (we omit the technical details).
Knowing that Condition (C8) is verified (see Lemma \@ref(lem:lemmWVasynorm)), we then have that
\begin{equation*}
\sqrt{T}(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) \overset{ \mathcal{D} }{\longrightarrow} \mathcal{N}(\mathbf{0},\mathbf{V}) \,.
\end{equation*}
\hfill $\blacksquare$
```
### Model Selection
In order to select a model among a set of candidate models for an observed time series, it is possible to estimate their out-of-sample prediction error and select the model that minimizes this error.
```{definition, defwvic, name = "Wavelet Variance Information Criterion (WVIC)"}
The WVIC is the expected out-of-sample prediction error between the WV implied by the estimated model and the WV computed on an independent sample, $\hat{\boldsymbol{\nu}}^0$ (with $\mathbb{E}_0$ denoting the expectation with respect to this out-of-sample WV):
\begin{equation*}
\text{WVIC} = \mathbb{E} \left[ \mathbb{E}_0 \left[ \left(\hat{\boldsymbol{\nu}}^0 - \boldsymbol{\nu}(\hat{\boldsymbol{\theta}})\right)^{T} {\boldsymbol{\Omega}} \left(\hat{\boldsymbol{\nu}}^0 - \boldsymbol{\nu}(\hat{\boldsymbol{\theta}})\right)
\right] \right].
\end{equation*}
```
```{definition, defwvicest1, name = "WVIC estimator"}
An estimator of the WVIC is given by
\begin{equation*}
\widehat{\text{WVIC}} = \left( \hat{\boldsymbol{\nu}} - \boldsymbol{\nu} ( \hat{\boldsymbol{\theta}})\right)^T \boldsymbol{\Omega} \left( \hat{\boldsymbol{\nu}} - \boldsymbol{\nu} ( \hat{\boldsymbol{\theta}})\right)
+ 2 \text{tr} \left\{ \widehat{\text{cov}} \left[ \hat{\boldsymbol{\nu}} , \boldsymbol{\nu}( \hat{\boldsymbol{\theta}}) \right] \boldsymbol{\Omega}^T\right\},
\end{equation*}
where $\widehat{\text{cov}} \left[ \hat{\boldsymbol{\nu}} , \boldsymbol{\nu}( \hat{\boldsymbol{\theta}})\right]$ is the estimated covariance matrix between the empirical WV and the implied WV evaluated at the estimated vector parameter $\hat{\boldsymbol{\theta}}$.
```
While the first term in $\widehat{\text{WVIC}}$ is simply the value of the GMWM objective function at the solution $\hat{\boldsymbol{\theta}}$, the second term (the optimism)
\begin{equation*}
\text{tr} \left\{ \widehat{\text{cov}} \left[ \hat{\boldsymbol{\nu}} , \boldsymbol{\nu}( \hat{\boldsymbol{\theta}}) \right] \boldsymbol{\Omega}^T\right\}
\end{equation*}
is not straightforward to compute.
There exist several ways of computing the optimism term:
- Compute the analytic form of the covariance term (see Guerrier et al., 2018).
- Compute the term via parametric bootstrap [@guerrier2015wvic], as sketched below.
- Compute it using independent replicates [@radi2017authomatic].