# Meta-Regression {#metareg}
![](_figs/regressionbild.jpg)
Conceptually, **Meta-Regression** does not differ much from a **subgroup analysis**. In fact, subgroup analyses with more than two groups are nothing more than a meta-regression with categorical predictors. However, meta-regression also allows us to use **continuous data** as predictors and check whether these variables are associated with effect size differences.
```{block,type='rmdinfo'}
**The idea behind meta-regression**
You may have already performed regressions with regular data in which participants or patients are the **unit of analysis**. In typical meta-analyses, we do not have the individual data for each participant available, but only the **aggregated effects**, which is why we have to perform meta-regressions with predictors on a **study level**. This also means that while we conduct analyses on participant samples much larger than usual for single studies, it is still very likely that **we do not have enough data for a meta-regression to be sensible**. In [Chapter 7](#subgroup), we told you that subgroup analyses make no sense when $k<10$. For **meta-regression**, Borenstein and colleagues [@borenstein2011] recommend that **there should be at least ten studies per covariate**, although this should not be seen as an iron-clad rule.
In a conventional regression, we want to estimate a parameter $y$ using a covariate $x_i$ with $n$ regression coefficients $\beta$. A standard regression equation therefore looks like this:
$$y=\beta_0 + \beta_1x_1 + ...+\beta_nx_n$$
In a meta-regression, we want to estimate the **effect size** $\theta$ for different values of the predictor(s), so our regression looks like this:
$$\hat \theta_k = \theta + \beta_1x_{1k} + ... + \beta_nx_{nk} + \epsilon_k + \zeta_k$$
You might have seen that when estimating the effect size $\theta_k$ of a study $k$ in our regression model, there are two **extra terms in the equation**, $\epsilon_k$ and $\zeta_k$. The same terms can also be found in the equation for the random-effects-model in [Chapter 4.2](#random). The two terms signify two types of **independent errors** which cause our regression prediction to be **imperfect**. The first one, $\epsilon_k$, is the sampling error through which the effect size of the study deviates from its "true" effect. The second one, $\zeta_k$, denotes that even the true effect size of the study is only sampled from **an overarching distribution of effect sizes** (see the chapter on the [Random-Effects Model](#random) for more details). In a **fixed-effect model**, we assume that all studies actually share the **same true effect size** and that the **between-study heterogeneity** $\tau^2 = 0$. In this case, we do not consider $\zeta_k$ in our equation, but only $\epsilon_k$.
As the equation above includes **fixed effects** (the $\beta$ coefficients) as well as **random effects** ($\zeta_k$), the model used in meta-regression is often called **a mixed-effects-model**. Mathematically, this model is identical to the **mixed-effects-model** we described in [Chapter 7](#subgroup) where we explained how **subgroup analyses** work.
Indeed, as mentioned above, **subgroup analyses** are nothing else than a **meta-regression** with a **categorical predictor**. For meta-regression, these subgroups are then **dummy-coded**, e.g.
$$ D_k = \left\{\begin{array}{ll} 0: ACT \\ 1: CBT \end{array}\right.$$
$$\hat \theta_k = \theta + \beta x_{k} + D_k \gamma + \epsilon_k + \zeta_k$$
In this case, we assume the same **regression line**, which is simply "shifted" **up or down for the different subgroups** $D_k$.
```
```{r, echo=FALSE, fig.width=6,fig.cap="Visualisation of a Meta-Regression with dummy-coded categorical predictors",fig.align='center'}
library(png)
library(grid)
img <- readPNG("_figs/dummy.PNG")
grid.raster(img)
```
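To see how such dummy coding looks in practice, here is a minimal sketch using a small, purely hypothetical factor; `model.matrix` shows the 0/1 indicator that a categorical predictor is turned into before it enters the regression.
```{r, eval=FALSE}
# Hypothetical example: a two-level subgroup variable becomes a 0/1 dummy.
treat <- factor(c("ACT", "CBT", "CBT", "ACT"))
model.matrix(~ treat)  # the 'treatCBT' column codes 0 = ACT (reference) and 1 = CBT
```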
<br><br>
```{block,type='rmdinfo'}
**Assessing the fit of a regression model**
To evaluate the **statistical significance of a predictor**, a **t-test** of its $\beta$-weight is performed:
$$ t=\frac{\beta}{SE_{\beta}}$$
This provides a $p$-value telling us if a variable significantly predicts effect size differences in our regression model. If we fit a regression model, our aim is to find a model **which explains as much as possible of the variability in effect sizes** we find in our data.
In conventional regression, $R^2$ is commonly used to quantify the **goodness of fit** of our model in percent (0-100%). As this measure is widely used, and many researchers know how to interpret it, we can also calculate an $R^2$ analog for meta-regression using this formula:
$$R^2=\frac{\hat\tau^2_{REM}-\hat\tau^2_{MEM}}{\hat\tau^2_{REM}}$$
Where $\hat\tau^2_{REM}$ is the estimated total heterogeneity based on the random-effects model and $\hat\tau^2_{MEM}$ is the unexplained (residual) heterogeneity of our mixed-effects regression model.
```
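To make these two formulas concrete, here is a small sketch with made-up numbers (the $\beta$-weight, its standard error, and the $\tau^2$ values below are purely hypothetical); in practice, the functions we use later report these statistics for us.
```{r, eval=FALSE}
# t-test of a regression weight (hypothetical values)
beta    <- 0.21    # estimated beta-weight
se.beta <- 0.09    # its standard error
t <- beta / se.beta
2 * pt(abs(t), df = 16, lower.tail = FALSE)  # two-sided p-value (here with 16 df)

# R^2 analog (hypothetical tau^2 estimates)
tau2.rem <- 0.08   # total heterogeneity (random-effects model)
tau2.mem <- 0.05   # residual heterogeneity (mixed-effects regression model)
(tau2.rem - tau2.mem) / tau2.rem             # proportion of heterogeneity explained
```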
<br><br>
---
## Calculating meta-regressions in R
Meta-regressions can be conducted in *R* using the `metareg` function in `meta`. To show the similarity between subgroup analysis and meta-regression with a categorical predictor, I will first conduct a meta-regression using my variable `Control` as the predictor again.
```{r, echo=FALSE, message=FALSE, error=FALSE}
library(meta)
```
```{r}
metareg(m.hksj,Control)
```
We see in the output that the `metareg` function uses the values of `Control` (i.e., the three different types of control groups) as a **moderator**. It takes **"information only"** as the dummy-coded *reference group*, and **"no intervention"** and **"WLC"** as dummy-coded **predictors**. Under `Test of Moderators`, we can see that control groups are not significantly associated with effect size differences ($F_{2,15}=0.947$, $p=0.41$). Our regression model does not explain any of the variability in our effect size data ($R^2=0\%$).
Below `Model Results`, we can also see the $\beta$-values (`estimate`) of both predictors, and their significance level (`pval`). As we can see, neither predictor is significant.
<br><br>
**Continuous variables**
Let us assume I want to check if the **publication year** of a study is associated with effect size differences. I have stored the variable `pub_year`, containing the publication year of every study in my dataset, and conducted the meta-analysis with it. I stored the meta-analysis output in the `m.pubyear` object.
**Now, I can use this predictor in a meta-regression.**
```{r,echo=FALSE}
load("_data/Meta_Analysis_Data.RData")
madata<-Meta_Analysis_Data
madata$pub_year<-c(2001,2002,2011,2013,2013,2014,1999,2018,2001,2002,2011,2013,2013,2014,1999,2018,2003,2005)
madata$pub_year<-as.numeric(madata$pub_year)
m.pubyear<-metagen(TE,seTE,studlab = paste(Author),comb.fixed = FALSE,data=madata)
```
```{r}
output.metareg<-metareg(m.pubyear,pub_year)
output.metareg
```
As you can see from the output, `pub_year` was now included as a **predictor**, but it is not significantly associated with the effect size ($p=0.9412$).
<br><br>
---
## Plotting Regressions
```{r, echo=FALSE, fig.width=4,fig.cap="A finished bubble plot",fig.align='center'}
library(png)
library(grid)
img <- readPNG("_figs/metareg.PNG")
grid.raster(img)
```
To plot our meta-regression output, we can use the `bubble` function in `meta`. Here are a few parameters we can specify for this function.
```{r,echo=FALSE}
i<-c("xlim","ylim","xlab","ylab","col","lwd","col.line","studlab")
ii<-c("The x-axis limit of the plot. Must be specified as, e.g., 'xlim=c(0,1)'","The y-axis limit of the plot. Must be specified as, e.g., 'ylim=c(0,1)'","The label for the x axis","The label for the y axis","The color of the individual studies","The line width of the regression line","The color of the regression line","If the labels for each study should be printed within the plot (TRUE/FALSE)")
ms<-data.frame(i,ii)
names<-c("Parameter", "Function")
colnames(ms)<-names
kable(ms)
```
```{r,eval=FALSE}
bubble(output.metareg,
xlab = "Publication Year",
col.line = "blue",
studlab = TRUE)
```
## Multiple Meta-Regression
![](_figs/mvreg.jpg)
In the beginning of [Chapter 8](#metareg), we already introduced the basic idea behind meta-regression. We showed that, in general, a meta-regression has the following form:
$$\hat \theta_k = \theta + \beta_1x_{1k} + ... + \beta_nx_{nk} + \epsilon_k + \zeta_k$$
Where $\hat \theta_k$ is the estimate of our effect size of a study $k$ and $\beta$ are the regression coefficients. There are two parameters in this formula which cause our effect size estimate to deviate from the "true" overall effect size: $\epsilon_k$, the sampling error, and $\zeta_k$, which denotes that the true effect size of the study is only sampled from an **overarching distribution of effect sizes**. As we explained before, if $\zeta_k>0$, the model we describe with this formula is a **random-effects meta-regression model**, also known as a *mixed-effects model*.
```{block,type='rmdinfo'}
**The idea behind multiple meta-regression**
Previously, we only considered the scenario in which we use **one predictor** $\beta_1x_1$ in our meta-regression (e.g., we want to check if the effect size a study reports depends on the year it was published). But imagine another scenario. Let us assume we have learned from our previous research that the effect sizes found in a certain research field depend on the prestige of the journal in which a study was published (e.g., higher effects are reported in journals with a high reputation). On the other hand, it also seems a priori reasonable that journals with a good reputation publish studies of high quality, which may also be associated with higher effect sizes. So to check if journal reputation is indeed associated with higher effect sizes, we have to make sure that this relationship is not **confounded** by the fact that prestigious journals publish studies of higher quality, which may be the true reason we find this relationship between journal reputation and effect size. This means we have to **control** for study quality (e.g., rated on a scale from 0 to 10) when we examine the relationship between journal prestige (e.g., operationalized as the [impact factor](https://en.wikipedia.org/wiki/Impact_factor) of a journal) and effect size.
This, and many other possible scenarios, can be dealt with using **multiple meta-regression**. In multiple meta-regression we use several predictors (variables) to predict (differences in) effect sizes. When we look back at the **general meta-regression** formula we defined before, we see that the formula already provides us with this feature through the $\beta_nx_{nk}$ part. Here, the parameter $n$ denotes that we can include $n$ more predictors/variables in our meta-regression, making it a multiple meta-regression.
While, statistically speaking, the $n$ parameter allows us to include as many predictors in our multiple meta-regression model as we wish, things are a little more tricky in reality. In the following, we will discuss a few important pitfalls of multiple meta-regression and how we can build multiple meta-regression models which are robust and trustworthy. But first, let us cover another important feature of multiple meta-regression: **interactions**.
```
```{block,type='rmdinfo'}
**Interactions**
So far, in our multiple meta-regression model, we only considered the case where we have multiple predictor variables $x_1,x_2, ... x_n$, and along with their predictor estimates $\beta_n$, **add them together** to calculate our estimate of the true effect size $\hat \theta_k$ for each study $k$. In multiple meta-regression models, however, we can not only model such **additive relationships**. We can also model so-called **interactions**. Interactions mean that the **relationship** between one **predictor variable** (e.g., $x_1$) and the **estimated effect size** changes for different values of another predictor variable (e.g. $x_2$).
Imagine a scenario where we want to model two predictors and their relationship to the effect size: the **publication year** ($x_1$) of a study and the **quality** ($x_2$) of a study, which we rate like this:
$$\begin{equation}
x_2 = \left \{\begin{array}{ll}
0: bad
\\1: moderate
\\2: good
\end{array}
\right.
\end{equation}$$
As we described before, we can now imagine a meta-regression model in which we combine these two predictors $x_1$ and $x_2$ and assume an additive relationship. We can do this by simply adding them:
$$\hat \theta_k = \theta + \beta_1x_{1k} + \beta_2x_{2k} + \epsilon_k + \zeta_k$$
Let us assume that, overall, higher publication year ($x_1$) is associated with higher effect sizes (i.e., reported effect sizes have risen over the years). We could now ask ourselves if this positive relationship **varies** depending on the quality of the studies ($x_2$). For example, maybe this rise in effect sizes was strongest for high-quality studies, while effect sizes stayed mostly the same over the years for studies of lower quality? We can visualize our assumed relationship between effect size ($\hat \theta_k$), publication year ($x_1$) and study quality ($x_2$) the following way:
![](_figs/vis2.png)
To examine such questions, we can add an **interaction term** to our meta-regression model. This interaction term lets predictions for $x_1$ vary for different values of $x_2$ (and vice versa). We can denote this additional interaction in our model by introducing a third regression coefficient, $\beta_3$, which quantifies the interaction term $x_{1k}x_{2k}$ we want to test in our model:
$$\hat \theta_k = \theta + \beta_1x_{1k} + \beta_2x_{2k} + \beta_3x_{1k}x_{2k}+ \epsilon_k + \zeta_k$$
```
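In *R*'s formula notation, which we will later pass to the `mods` argument of `metafor`'s `rma` function, such an interaction is written with `*`. A short sketch, assuming two hypothetical predictors named `pubyear` and `quality`:
```{r, eval=FALSE}
# '*' adds both main effects and their interaction,
# while ':' would add only the interaction term itself.
~ pubyear * quality   # equivalent to ~ pubyear + quality + pubyear:quality
```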
<br></br>
### Common pitfalls of multiple meta-regression models {#pitfalls}
As we have mentioned before, multiple meta-regression, while **very useful when applied properly**, comes with **certain caveats** we have to know and consider when fitting a model. Indeed, some argue that (multiple) meta-regression is **often improperly used and interpreted in practice**, leading to a low validity of many meta-regression models [@higgins2004controlling]. Thus, there are some points we have to keep in mind when fitting multiple meta-regression models, which we will describe in the following.
```{block,type='rmdachtung'}
**Overfitting: seeing a signal when there is none**
To better understand the risks of (multiple) meta-regression models, we have to understand the concept of **overfitting**. Overfitting occurs when we build a statistical model which fits the data **too closely**. In essence, this means that we build a statistical model which can predict the data at hand very well, but performs poorly at predicting future data it has never seen before. This happens if our model assumes that some variation in our data **stems from a true "signal", when in fact we only model random noise** [@iniesta2016machine]. As a result, our statistical model produces **false positive results**: it sees relationships where there are none.
![Illustration of an overfitted model vs. model with a good fit](_figs/overfitting.png)
Regression methods, which usually utilize minimization or maximization procedures such as Ordinary Least Squares or Maximum Likelihood estimation, can be prone to overfitting [@gigerenzer2004mindless; @gigerenzer2008rationality]. Unfortunately, the risk of building a **non-robust model, which produces false-positive results**, is **even higher** once we go from conventional regression to **meta-regression** [@higgins2004controlling]. There are several reasons for this:
1. In meta-regression, our **sample of data is usually small**, as we can only use the synthesized data of all analyzed studies $k$.
2. As our meta-analysis aims to be a **comprehensive overview of all evidence**, we have no **additional data** on which we can "test" how well our regression model can predict high or low effect sizes.
3. In meta-regressions, we have to deal with the potential presence of effect size heterogeneity ([see Chapter 6](#heterogeneity)). Imagine a case in which we have two studies with different effect sizes and non-overlapping confidence intervals. Every variable which has different values for these studies might be a potential explanation for the effect size difference we find, while it seems straightforward that **most of such explanations are spurious** [@higgins2004controlling].
4. Meta-regression as such, and multiple meta-regression in particular, make it very easy to **"play around" with predictors**. We can test numerous meta-regression models, include many more predictors or remove them in an attempt to explain the heterogeneity in our data. Such an approach is of course tempting, and often found in practice, because we, as meta-analysts, want to find a significant model which explains why effect sizes differ [@higgins2002statistical]. However, such behavior has been shown to massively **increase the risk of spurious findings** [@higgins2004controlling], because we can change parts of our model indefinitely until we find a significant model, which is then very likely to be overfitted (i.e., it mostly models statistical noise).
**Some guidelines have been proposed to avoid an excessive false positive rate when building meta-regression models:**
- Minimize the number of investigated predictors. In multiple meta-regression, this translates to the concept of **parsimony**, or simplicity: when evaluating the fit of a meta-regression model, we prefer models which achieve a good fit with fewer predictors. Estimators such as the *Akaike* and *Bayesian Information Criterion* can help with such decisions. We will get to this topic again below.
- Predictor selection should be based on **predefined scientific or theoretical questions** we want to answer in our meta-regression. It is therefore crucial to define in the protocol, before the start of the study, which predictors will be examined in the meta-regression analyses.
- When the number of studies is low (which is very likely to be the case), and we want to compute the significance of a predictor, we should use the [Knapp-Hartung adjustment](#random) to obtain more reliable estimates [@higgins2002statistical].
- We can use **permutation** to assess the robustness of our model in resampled data. We will describe the details of this method later.
```
```{block,type='rmdachtung'}
**Multicollinearity**
Multicollinearity means that one or more predictors in our regression model can be (linearly) **predicted from another predictor** in our model with relatively high accuracy [@mansfield1982detecting]. This basically means that we have two or more predictors in our model which are **highly correlated**. Most of the dangers of multicollinearity are associated with the problem of **overfitting** which we described above. High collinearity can cause our predictor coefficient estimates $b$ to behave erratically and change considerably with minor changes in our data. It also limits the size of the variance explained by the model, in our case the $R^2$ analog.
**Multicollinearity in meta-regression is common** [@berlin1992advantages]. Although multiple regression can handle lower degrees of collinearity, we should **check** and, if necessary, **control for very highly correlated predictors**. There is no established cut-off rule for the presence of multicollinearity. A crude, but often effective way is to check for very high correlations (i.e., $r\geq0.8$) before fitting the model. Multicollinearity can then be reduced by either (1) removing one of the close-to-redundant predictors, or (2) trying to combine the predictors into one single predictor.
```
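As a rough pre-check, highly correlated predictor pairs can also be flagged programmatically before fitting the model. The sketch below is only an illustration and assumes a data frame `preds` that contains nothing but the continuous predictors; we apply the same idea to our example data further below.
```{r, eval=FALSE}
# Flag predictor pairs with |r| >= 0.8 (crude rule of thumb)
cor.mat <- cor(preds)
cor.mat[upper.tri(cor.mat, diag = TRUE)] <- NA    # keep each pair only once
which(abs(cor.mat) >= 0.8, arr.ind = TRUE)        # rows/columns of problematic pairs
```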
<br></br>
### Model building methods
When building a multiple regression model, there are **different approaches through which we can select and insert predictors** into our model. Here, we discuss the most important ones along with their strengths and weaknesses:
- **Forced entry.** In forced entry methods, all relevant predictors are forced into the regression model simultaneously. In most functions in *R*, this is the default setting. Although this is a generally recommended procedure [@field2012discovering], keep in mind that all predictors to use in forced entry should still be based on a predefined, theory-led decision.
- **Hierarchical.** Hierarchical multiple regression means including predictors in our regression model stepwise, but based on a clearly defined scientific rationale. First, only predictors which have been associated with effect size differences in previous research are included, in the order of their importance. After this step, novel predictors can be added to explore if these variables explain heterogeneity which has not yet been captured by the known predictors.
- **Stepwise.** Stepwise methods mean that variables/predictors are added to the model one after another. At first glance, this sounds a lot like hierarchical regression, but there is a crucial difference: stepwise regression methods select predictors based on a *statistical* criterion. In a procedure called **forward selection**, the variable explaining the largest amount of variability in the data is used as the first predictor. This process is then repeated for the remaining variables, each time selecting the variable which explains most of the *remaining* (unexplained) variability in the data as the next predictor, which is then added to the model. Furthermore, there is a procedure called **backward selection**, in which all variables are used as predictors in the model first, and then removed successively based on a predefined statistical criterion. There is an extensive literature discouraging the use of stepwise methods [@whittingham2006we; @chatfield1995model], and if we recall the common [pitfalls](#pitfalls) of multiple regression models we presented above, it becomes clear that these methods have a high risk of producing spurious findings. Nevertheless, stepwise methods are still frequently used in practice, making it important to know that these procedures exist. If we use stepwise methods ourselves, however, it is advised to primarily do so in an exploratory fashion and to keep the limitations of this procedure in mind.
- **Multimodel inference.** Multimodel methods differ from stepwise methods in that they do not try to successively build the one "best" model explaining most of the variance. Instead, in this procedure, *all* possible combinations of a predefined selection of predictors are modeled and evaluated using a criterion such as Akaike's Information Criterion, which rewards simpler models. This enables a full examination of all possible models and how they perform. A common finding using this procedure is that many different predictor combinations lead to a good fit. In multimodel inference, the estimated coefficients of predictors can then be synthesized across all possible models to infer how important certain predictors are overall.
<br></br>
### Using `metafor` to compute Multiple Meta-Regressions
After all this input, it is now **time to fit our first multiple regression models**. To do this, we will not use the `meta` package we have used so far. Instead, we will use the `metafor` package [@viechtbauer2010conducting]. This package provides a vast array of advanced functionalities for meta-analysis, along with great [documentation](http://www.metafor-project.org/doku.php/). We will also use this package in a later chapter where we focus on **Multilevel Meta-Analysis**, so it is worth getting accustomed to the inner workings of this package. So, to begin, let us make sure we have `metafor` installed and loaded in our library.
```{r, eval=FALSE}
library(metafor)
```
For our multiple meta-regression examples, I will use the `mvreg.data` dataset. This is a "toy" dataset, which we simulated for illustrative purposes. **Let's have a look at the structure of the data:**
```{r, echo=FALSE}
load("_data/mvreg_data.rda")
levels(mvreg.data$continent)[levels(mvreg.data$continent)==0] = "Europe"
levels(mvreg.data$continent)[levels(mvreg.data$continent)==1] = "North America"
mvreg.data$continent = as.character(mvreg.data$continent)
```
```{r, echo=FALSE}
str(mvreg.data)
```
We see that there are 6 variables in our dataset. The `yi` and `sei` columns store the **effect size** and **standard error** of a particular study. Thus, these columns correspond to the `TE` and `seTE` columns we used before. We have named these variables this way because this is the standard notation that `metafor` uses: `yi` corresponds to the effect size $y_i$ we want to predict in our meta-regression, while `sei` is $SE_i$, the standard error. To designate the variance of an effect size, `metafor` uses `vi`, or $v_i$ in mathematical notation, which we do not need here because `yi` and `sei` contain all the information we need.
The other four variables we have in our dataset are potential predictors for our meta-regression. We want to check if `reputation`, the (mean-centered) impact score of the journal the study was published in, `quality`, the quality of the study rated from 0 to 10, `pubyear`, the (standardized) publication year, and `continent`, the continent in which the study was performed, are associated with different effect sizes.
For `continent`, note that we store information as a predictor with two labels: `Europe` and `North America`, meaning that this predictor is a **dummy variable**. Always remember that such dummy variables have to be converted from a `chr` to a factor vector before we can proceed.
```{r}
mvreg.data$continent = factor(mvreg.data$continent)
```
<br></br>
#### Checking for multicollinearity
As we mentioned before, we have to check for multicollinearity of our predictors to make sure that our meta-regression model coefficient estimates are robust. A quick way to check for high correlation is to calculate an **intercorrelation matrix** for all continuous variables with the following code:
```{r}
cor(mvreg.data[,3:5])
```
The `PerformanceAnalytics` package provides the `chart.Correlation` function which we can use to visualize the intercorrelations. Make sure to install the `PerformanceAnalytics` package first, and then use this code:
```{r, warning=FALSE, message=FALSE}
library(PerformanceAnalytics)
chart.Correlation(mvreg.data[,3:5])
```
We see that our variables are indeed correlated, but probably not to a degree that would warrant excluding one of them.
<br></br>
#### Fitting a meta-regression model without interaction terms
Now, let us fit our first meta-regression model, for now without assuming any interactions between predictors. Previously, we wanted to explore if a high **reputation** of the journal a study was published in predicts higher effect sizes. Let us also assume we already know very well from previous literature that the **quality** of a study is also associated with differences in effects. We can now perform a hierarchical regression where we first include our known predictor `quality`, and then check if `reputation` explains heterogeneity beyond that while taking `quality` into account.
To do this, we use the `rma` function in `metafor` to perform a **random-effects meta-regression**. This function has countless parameters, but we will only concentrate on the following (to see all parameters, type `?rma` into your Console):
```{r,echo=FALSE}
i<-c("yi","sei","data", "method","mods","test")
ii<-c("The column of our dataset in which the effect size for each study is stored.",
"The column of our dataset in which the standard error of the effect size for each study is stored.",
"The dataset with our meta-analysis data.",
"The estimator for tau-squared we want to use. For meta-regression models, it is advised to use the
maximum likelihood ('ML') or restricted maximum likelihood ('REML') estimator.",
"This parameter defines our meta-regression model. First, we specify our model with '~' (a tilde). Then, we add the predictors we want to include, separating them with '+' (e.g., 'variable1 + variable2'). Interactions between two variables are denoted by 'variable1*variable2'.",
"The test we want to apply for our regression coefficients. We can choose from 'z' (default) and 'knha' (Knapp-Hartung method).")
ms<-data.frame(i,ii)
names<-c("Parameter", "Function")
colnames(ms)<-names
kable(ms)
```
First, let us perform a meta-regression using only `quality` as the predictor along with Knapp-Hartung adjustments. **We save the results as `model1`, and then inspect the output**.
```{r, warning=FALSE, message=FALSE}
model1 <- rma(yi = yi,
sei = sei,
data = mvreg.data,
method = "ML",
mods = ~ quality,
test = "knha")
model1
```
From the output, we can see the results for our predictor `quality` under `Model Results` and the values for our regression intercept (`intrcpt`). We see that the $p$-value for our predictor is non-significant ($p=0.0688$), but that there is a trend for an association, because $p<0.1$. Under `Test of Moderators (coefficient(s) 2)`, we can see the overall test results for our regression model ($F_{1,34}=3.68, p=0.0688$). Because we only included one predictor, the $p$-value reported there is identical to the one we saw before. In total, our model explains $R^2=2.23\%$ of the heterogeneity in our data, which we can see next to the `R^2 (amount of heterogeneity accounted for)` line in our output.
**Now, let us check if we can improve our model by including `reputation` as predictor. We add `+ reputation` in our argument to `mods` and save the output to `model2`.**
```{r, warning=FALSE, message=FALSE}
model2 <- rma(yi = yi,
sei = sei,
data = mvreg.data,
method = "ML",
mods = ~ quality + reputation,
test="knha")
model2
```
We now see that a new line appears under `Model Results` displaying the results for our `reputation` predictor. We see that a coefficient of $b=0.0343$ was calculated for `reputation`. We also see that this predictor is highly significant ($p<0.0001$) and positively associated with the effect sizes. We also see that the meta-regression model itself is significant ($F_{2,33}=12.25, p<0.0001$), explaining $R^2=66.95\%$ of the heterogeneity. This means that journal reputation is **associated with higher effect sizes**, even when controlling for study quality.
**But is `model2` indeed better than our first model, `model1`? To assess this, we can use the `anova` function, feeding it with the two models we want to compare.**
```{r}
anova(model1, model2)
```
This function performs a model test and provides us with several statistics to assess if `model2` has a better fit than `model1`. Here we compare our `Full` model, `model2`, which includes both `quality` and `reputation`, with our `Reduced` model, which only uses `quality` as a predictor.
The `anova` function performs a **Likelihood Ratio Test**, the results of which we can see in the `LRT` column. The test is highly significant ($\chi^2_1=19.50, p<0.001$), which means that our full model indeed provides a better fit. Another important statistic is reported in the `AICc` column. This provides us with *Akaike's Information Criterion*, corrected for small samples. As we mentioned before, AICc penalizes complex models with more predictors to avoid overfitting. It is important to note that **lower values of AIC(c) mean that a model performs better**. Interestingly, in our output, we see that the full model ($AICc=20.77$) has a better AIC value than our reduced model ($AICc=37.73$), despite having more predictors. All of this suggestsests that our multiple regression model **does indeed provide a good fit** to our data.
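If we are only interested in the fit statistics themselves, they can also be extracted directly from the fitted `rma` objects; a minimal sketch, assuming `model1` and `model2` from above:
```{r, eval=FALSE}
# Log-likelihood, deviance, AIC, BIC and AICc for both models side by side
fitstats(model1, model2)

# AIC can also be retrieved for a single model
AIC(model2)
```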
<br></br>
#### Modeling interaction terms
Let us say we want to model an **interaction hypothesis** with our additional predictors `pubyear` (publication year) and `continent`, where we assume that the relationship between publication year and effect size differs for Europe and North America. To model this in our `rma` function, we have to **connect our predictors** with `*` in the `mods` parameter. Because we do not want to compare the models directly using the `anova` function, we specify the $\tau^2$ estimator to be `"REML"` (restricted maximum likelihood) this time:
```{r, warning=FALSE, message=FALSE}
interaction.model <- rma(yi=yi,
sei=sei,
data=mvreg.data,
method = "REML",
mods = ~ pubyear*continent,
test="knha")
interaction.model
```
The last line, `pubyear:continentNorth America`, is the coefficient for our interaction term. Note that `metafor` automatically **includes not only the interaction term**, but also both `pubyear` and `continent` as **"normal" lower-order predictors** (as one should do). Also note that, as `continent` is a factor, `rma` detected that this is a **dummy predictor**, and used our category `Europe` as the $D=0$ dummy against which the `North America` category is compared. We see that our interaction term `pubyear:continentNorth America` has a positive coefficient ($b=0.6323$), and that it is highly significant ($p<0.0001$), meaning that the assumed interaction effect might in fact exist: there is an increase in effect sizes in recent years, but it is stronger for studies conducted in North America. We also see that the model we fitted explains $R^2=100\%$ of our heterogeneity. This is because our data was simulated for illustrative purposes. In practice, you will hardly ever explain all of the heterogeneity in your data (in fact, one should rather be concerned when finding such results in real-life data, as this might mean the model is overfitted).
<br></br>
#### Testing the robustness of our model through a permutation test
```{block,type='rmdinfo'}
**The idea behind permutation**
Permutation is a **mathematical operation** in which we take a **set** containing numbers or objects, and iteratively draw elements from this set to put them in a sequential order. When we already have an ordered set of numbers, this equals a process in which we **rearrange**, or **shuffle**, the order of our data. As an example, imagine we have a set $S$ containing **three numbers**: $S=\{1,2,3 \}$. One possible permutation of this set is $(2,1,3)$; another is $(3,2,1)$. We see that the permuted results both contain **all three numbers from before**, but in a different order.
Permutation can also be used to perform **permutation tests**, which are a specific form of **resampling methods**. Broadly speaking, resampling methods are used to validate the **robustness** of a statistical model by providing it with (slightly) different data sampled from the same data source or generative process [@good2013permutation]. This is a way to better **assess if the coefficients in our model indeed capture a true pattern underlying our data**, or if we overfitted our model, thereby falsely assuming patterns in our data when they are in fact statistical noise.
Permutation tests do not require us to have a spare "test" dataset on which we can evaluate how our meta-regression actually performs in predicting effect sizes. For this reason, among others, permutation tests have been **recommended** to assess the robustness of our meta-regression models [@higgins2004controlling].
We will not go too much into the **details of how a permutation test is performed for meta-regression** models, but the most important part is that we **re-calculate** the **p-values** of our overall meta-regression model and its coefficients based on the test statistics obtained across all possible, or many randomly selected, permutations of our original dataset (or, to be more precise, our meta-regression model matrix $X_{ij}$). The crucial indicator here is **how often the test statistic we obtain from our permuted data is equal to or greater than our original test statistic**. For example, if our test statistic is greater than or equal to the original test statistic in 50 of 1000 permuted datasets, we get a p-value of $p=0.05$ [@viechtbauer2015comparison].
```
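To make the logic behind this p-value concrete, here is a tiny, purely illustrative sketch in which the "observed" and "permuted" test statistics are simply simulated numbers, not output from a real meta-regression.
```{r, eval=FALSE}
set.seed(123)
t.observed <- 2.1          # hypothetical test statistic from the original data
t.permuted <- rnorm(1000)  # hypothetical test statistics from 1000 permuted datasets
# Proportion of permutations at least as extreme as the observed statistic = permutation p-value
mean(abs(t.permuted) >= abs(t.observed))
```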
To **perform a permutation test** on our meta-regression model, we can use `metafor`'s built-in `permutest` function. As an example, I will now recalculate the results of the `model2` model we fitted above. We only have to supply the `permutest` function with the `rma` object, but **be aware that the permutation test can be computationally intensive, especially for large datasets, so the function might need some time to run**.
```{r, eval=FALSE}
permutest(model2)
```
```{r, echo=FALSE}
permutest(model2, progbar=FALSE)
```
We again see our **familiar output** including the **results for all predictors**. Looking at the `pval*` column, we see that the p-value for the `reputation` predictor has increased from $p<0.0001$ to $p^*=0.001$. As this $p$-value is still highly significant, this indicates that our model might indeed capture a real pattern underlying our data. It has been **recommended** to always run such a **permutation test before reporting a meta-regression model** as significant in our research [@higgins2004controlling].
```{block,type='rmdachtung'}
**Permutation tests in small datasets**
Please note that when the **number of studies** $k$ included in our model is **small**, conventionally used thresholds for statistical significance (i.e., p < 0.05) **cannot be reached**. For meta-regression models, a permutation test using `permutest` will only be able to reach statistical significance if $k \geq 5$. More details on the `permutest` function can be found [here](https://www.rdocumentation.org/packages/metafor/versions/1.9-9/topics/permutest).
```
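If computation time is a concern, or if an exact test is preferred for small datasets, the behavior of `permutest` can be controlled through its `iter` and `exact` arguments (see the documentation linked above); for example:
```{r, eval=FALSE}
permutest(model2, iter = 5000)   # approximate test with more random permutations
permutest(model2, exact = TRUE)  # exact test using all possible permutations (feasible for small k only)
```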
<br></br>
#### Multimodel Inference
Previously, we already described that one can also try to **model all possible predictor combinations** in a procedure called **multimodel inference** to examine which possible predictor combinations provide the best fit, and which predictors are the most important ones overall.
To perform multimodel inference, you can use the `multimodel.inference` function we prepared for you. This function is part of the [`dmetar`](#dmetar) package. If you have the package installed already, you have to load it into your library first.
```{r, eval=FALSE}
library(dmetar)
```
If you don't want to use the `dmetar` package, you can find the source code for this function [here](https://raw.githubusercontent.com/MathiasHarrer/dmetar/master/R/mreg.multimodel.inference.R). In this case, *R* doesn't know this function yet, so we have to let *R* learn it by **copying and pasting** the code **in its entirety** into the **console** on the bottom left pane of RStudio, and then hit **Enter ⏎**. The function requires the `MuMIn`, `ggplot2` and `metafor` packages to work.
For this function, the following parameters need to be specified:
```{r,echo=FALSE}
i<-c("TE", "seTE", "data", "predictors", "method", "test", "eval.criterion", "interaction")
ii<-c(
"The precalculated effect size for each study. Must be supplied as the name of the effect size column in the dataset (in quotation marks; e.g. TE = 'effectsize').",
"The precalculated standard error for each study. Must be supplied as the name of the standard error column in the dataset (in quotation marks; e.g. seTE = 'se').",
"A data.frame containing columns for the effect size, standard error and meta-regression predictors of each study/effect.",
"A concantenated array of characters specifying the predictors to be used for multimodel inference. Names of the predictors must be identical to the names of the column names of the data.frame supplied to data.",
"Meta-analysis model to use for pooling effect sizes. Use 'FE' for the fixed-effect model. Different random-effect models are available: 'DL', 'HE', 'SJ', 'ML', 'REML', 'EB', 'HS', 'GENQ'. If 'FE' is used, the test argument is automatically set to 'z', as the Knapp-Hartung method is not meant to be used with fixed-effect models. Default is 'REML', and it is strongly advised to remain with this option to use a standard (mixed-effects) meta-regression model.",
"Method to use for computing test statistics and confidence intervals. Default is 'knha' which uses the Knapp-Hartung (Knapp & Hartung, 2003) adjustment method. 'Conventional' Wald-type tests and CIs are calculated by setting this argument to 'z'. When method='FE', this argument is set to 'z' automatically as the Knapp-Hartung method was not meant to be used with fixed-effect models.",
"Evaluation criterion to sort the multiple models by. Can be either 'AICc' (default; corrected Akaike's Information Criterion), 'AIC' (Akaike's Information Criterion) or 'BIC' (Bayesian Information Criterion).",
"If set to FALSE (default), no interactions between predictors are considered. Setting this parameter to TRUE means that all interactions are modeled.")
ms<-data.frame(i,ii)
names<-c("Parameter", "Function")
colnames(ms)<-names
kable(ms)
```
**Now, let us perform multimodelling for all possible predictor combinations using all predictors in the `mvreg.data` dataset, excluding all interaction terms**. Be aware that this can take some time, especially if the number of predictors is large.
```{r, echo=FALSE, message=FALSE, warning=FALSE}
source("dmetar/mreg.multimodel.inference.R")
load("_data/mvreg_data.rda")
library(MuMIn)
library(metafor)
library(ggplot2)
set.seed(123)
```
```{r, warning=FALSE, message=FALSE, fig.height=2, fig.width=6, fig.align="center"}
multimodel.inference(TE = "yi",
seTE = "sei",
data = mvreg.data,
predictors = c("pubyear", "quality", "reputation", "continent"),
interaction = FALSE)
```
**Let's go through the output one after another:**
- `Multimodel Inference: Final Results`. This part of the output provides us with details about the fitted models. We see that all $2^4=16$ possible models have been fitted. We also see that the function used the corrected AIC (`aicc`) to compare the models.
- `Best 5 Models`. Displayed here are the **five models with the lowest AICc**, sorted from low to high. Predictors are displayed as columns of the table, and models as rows. A number (weight) or `+` sign (for categorical predictors) indicates that a predictor/interaction term was used for the model, while empty cells indicate that the predictor was omitted in this model. We see that `TE ~ 1 + continent + pubyear + reputation` shows the best fit ($AICc=6.0$). But we also see that the other models including other predictor combinations come very close to this value. Thus, it is hard to say which model really is the "best" model. Nevertheless, we see that all of the best five models contain the predictor `pubyear`, suggesting that this variable might be particularly important.
- `Multimodel Inference Coefficients`. Here, we can see the **coefficients of all predictors** we used for our models **aggregated** over all models in which they appear. We see that the coefficient `Estimate` is largest for `pubyear` ($b=0.378$), which corroborates our finding from before. Approximate confidence intervals can be obtained by subtracting and adding the value stored in `Std.Error`, multiplied by 1.96, from/to `Estimate`.
- **Model-averaged importance of terms plot**. In the plot, we see the averaged importance of each predictor across all models displayed as a bar chart. We again see that `pubyear` is the most important predictor, followed by `reputation`, `continent`, and `quality`.
```{block,type='rmdinfo'}
This example should make clear that multimodel inference can be a useful way to obtain a **comprehensive look at which predictors are more or less important** for predicting differences in effect sizes. Despite avoiding some of the problems of stepwise regression methods, it should be noted that this method should still be seen as rather **exploratory**, and may be used when we have no prior knowledge on how our predictors are related to effect sizes in the research field we meta-analyze. If you decide to build a meta-regression model using the information from multimodel inference, it is **essential to report that you have used this procedure before building the model**, because your model will then not be based on an a priori hypothesis, but on statistical properties.
```
```{block,type='rmdinfo'}
The output of the `multimodel.inference` function displayed after running the function is **only a small part of all results returned by the function**. If we want to access all results of the function, we can save the results to an object (e.g. `results <- multimodel.inference(...)`), and then access them using the `$` operator. The following results are returned, among other things:
* `all.models`: the AICc statistics for all fitted models, ordered from best to worst.
* `top5.models`: the top 5 models by AICc we already saw in the output of the function before.
* `multimodel.coef`: the averaged multimodel coefficient estimates we already saw before.
* `predictor.importance`: The predictor importance table we already saw before.
* `formula`: The full formula used to conduct the multimodel analysis.
```
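For example, a short sketch of how this could look, re-using the call from above:
```{r, eval=FALSE}
mm.results <- multimodel.inference(TE = "yi",
                                   seTE = "sei",
                                   data = mvreg.data,
                                   predictors = c("pubyear", "quality", "reputation", "continent"),
                                   interaction = FALSE)

mm.results$all.models            # AICc statistics for all fitted models
mm.results$multimodel.coef       # averaged coefficient estimates
mm.results$predictor.importance  # predictor importance table
```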
---