# Structural Equation Modeling Meta-Analysis {#sem}
![](_figs/sem_title.jpg)
$$ $$
In the last chapters on [Three-Level Meta-Analysis Models](#mlma) and [Bayesian Meta-Analysis](#bayesianma), we were able to generalize our conceptual knowledge of meta-analyses by showing that meta-analytic models have an **inherent multilevel structure**, which can be used, for example, to extend conventional meta-analysis models to three-level models. A peculiar thing about statistical methods is that they are often put into separate "boxes" and treated as unrelated in research and practice, when in fact they are not. For many social scientists, for example, it is often surprising to find out that Analysis of Variance (ANOVA) and dummy-coded regression do essentially the [same thing](https://www.theanalysisfactor.com/why-anova-and-linear-regression-are-the-same-analysis/) [@montgomery2001design]. This often happens because two methods are traditionally used in different contexts, and taught as separate entities.
It was thus only fairly recently that researchers recognized that **multilevel models** are simply a special form of a **Structural Equation Model**, or **SEM** [@bauer2003estimating; @mehta2005people]. As said before, every meta-analysis model is in itself a multilevel model, so this association also has exciting implications for meta-analysts: **we can treat our meta-analysis as a structural equation model**, in which the pooled effect size we want to estimate is the **latent** (or unobserved) variable [@cheung2015metasem; @cheung2015meta]. This does not *only* mean that we can express the types of meta-analyses presented before from a SEM perspective; it also allows us to use structural equation modeling to test more complex models meta-analytically. This is a great advantage; for example, using this approach, we can test **mediation** models, **factor analytic** models, or perform **multivariate meta-analyses** based on effect size data obtained from several independent studies. This is a great way to evaluate, based on all available evidence, whether certain models or theories in the literature actually hold, whether their assumptions are not backed by the evidence, or, even more interestingly, whether a theory only applies to a subgroup of individuals or entities.
$$ ~ $$
## The Idea behind Meta-Analytic SEM {#sem_about}
![](_figs/SEM_titel2.jpg)
```{block, type='rmdinfo'}
**What is Structural Equation Modeling?**
Structural Equation Modeling (SEM) is a statistical technique to model and test hypotheses about the relationship of **manifest** (observed) and (usually) **latent** (unobserved or unobservable) variables [@kline2015principles]. Latent variables, as said, are either not observed or not observable (personality, for example, is a latent construct which can only be measured indirectly, for instance through different items in a questionnaire). In SEM, the assumed relationship between the manifest and latent variables (the "structure") can be modeled using the manifest, measured variables while taking their measurement error into account.
SEM analysis is somewhat different from "conventional" statistical tests (e.g., $t$-tests). Usually, for example in a $t$-test, the researcher tests against a null hypothesis, such as $H_0:\mu_1 = \mu_2$ (where $\mu_1$ and $\mu_2$ are the means of two groups). The researcher "aims" to *reject* the null hypothesis to conclude that the two groups differ. In SEM, a specific model is proposed beforehand instead, and the researcher "aims" to *accept* this model if its **goodness of fit** is sufficient [@cheung2015meta].
```
### Model Specification
Usually, SEMs are specified and represented mathematically as a series of **matrices**. You can think of a matrix as a simple table containing rows and columns, much like a `data.frame` object in *R* (in fact, most `data.frame`s can easily be converted to a matrix using the `as.matrix()` function). Visually, SEMs can be represented as **path diagrams**. Such path diagrams are often very intuitive, so we will start specifying a SEM using this approach first, and then move on to the matrix notation.
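For example, this is how a small `data.frame` (with made-up values) can be converted to a matrix:
```{r}
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
as.matrix(df)
```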
#### Path Diagrams
Path diagrams represent the mathematical model of our SEM graphically. There is no full consensus on how path diagrams should be drawn, yet there are a few conventions. Here are the main components of path diagrams, and how they are represented.
```{r,echo=FALSE, warning=FALSE, message=FALSE}
library(knitr)
Code<-c("$\\square$", "$\\circ$", "$\\triangle$", "$\\rightarrow$", "$\\leftrightarrow$")
Name = c("Rectangle", "Circle", "Triangle", "Arrow", "Double Arrow")
Description<-c("Manifest/observed variables",
"Latent/unobserved variables",
"Intercept (fixed vector of 1s)",
"Prediction. The variable at the start of the arrow predicts the variable at the end of the arrow: Predictor $\\rightarrow$ Target.",
"(Co-)Variance. If a double arrow connects two variables (rectangles/circles), it signifies the covariance/correlation between the two variables. If a double arrow forms a loop on one single variable, it signifies the variance of the variable.")
m<-data.frame(Code,Name, Description)
names<-c("Symbol","Name","Description")
colnames(m)<-names
kable(m, escape = FALSE)
```
As an illustration, let us create a path diagram for a simple linear (non-meta-analytic) regression model, in which we want to predict $y$ with $x$. The model formula looks like this:
$$y_i = \beta_0 + \beta_1x_i + e_i$$
In this model, $x_i$ and $y_i$ are the observed variables. There are no unobserved variables. The parameter $\mu_x$ is the population mean of $x$, while the population mean of $y$ is the regression intercept, $\beta_0$. The variance of our observed data for $x$ is $\sigma^{2}_{x}$. Provided that $x$ is not a perfect predictor of $y$, there will be some amount of error variance $\sigma^{2}_{e_y}$ in our predictions associated with $y$. There are two regression coefficients: $\beta_0$, the intercept, and $\beta_1$, the slope coefficient of $x$.
**Using these components, we can build a graphical model for a simple univariate linear regression:**
![](_figs/regression_path.png)
We can also use this graphical model as a starting point to reassemble the regression model equation. From the model, we can infer that $y$ is influenced by two components: $x \times \beta_1$ and $1 \times \beta_0$. If we add these two parts together, we again arrive at the formula for $\hat{y}$ from before.
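To make these components concrete, here is a minimal (non-meta-analytic) simulation sketch with made-up parameter values: we generate data from the regression model and check that `lm()` approximately recovers $\beta_0$, $\beta_1$, and the error variance $\sigma^{2}_{e_y}$.
```{r, eval=FALSE}
set.seed(123)
x <- rnorm(100, mean = 5, sd = 2)   # predictor; mu_x = 5, sigma2_x = 4
e <- rnorm(100, mean = 0, sd = 1)   # error; sigma2_e_y = 1
y <- 0.4 + 0.7*x + e                # beta_0 = 0.4, beta_1 = 0.7
fit <- lm(y ~ x)
coef(fit)                           # estimates of beta_0 and beta_1
summary(fit)$sigma^2                # estimate of the error variance
```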
#### Matrix Representation
There are several common ways to represent SEM mathematically through matrices [@joreskog2006lisrel; @muthen2012mplus; @mcardle1984some]. Here, we will focus on the **Reticular Action Model** formulation, or **RAM** [@mcardle1984some], because this formulation is used by the `metaSEM` package we will be introducing later on. In the RAM, four matrices are specified: $\mathbf{F}$, $\mathbf{A}$, $\mathbf{S}$ and $\mathbf{M}$. Because the $\mathbf{M}$ matrix is not necessary to fit the meta-analytic SEM we will present later, we omit it here (see @cheung2015meta for a more extensive introduction). We will now specify the remaining three matrices for our linear regression from before. The three matrices all have the same number of rows and columns, corresponding with the (manifest and latent) variables we have in our model: $x$ and $y$. The generic structure in our regression example for all matrices therefore looks like this:
![](_figs/M1.png)
**The $\mathbf{A}$ Matrix: Single Arrows**
The $\mathbf{A}$ matrix represents the asymmetrical (single) arrows in our path model. The way to fill this matrix is to search for the matrix **column entry** of the variable in which the arrow starts ($x$), and then for the matrix **row entry** of the variable in which the arrow ends ($y$). We put the value of our arrow, $\beta_1$, where the selected column and row intersect in the matrix ($i_{y,x}$). Given that there are no more paths between the variables in our matrix, we fill all other fields with $0$. The $\mathbf{A}$ matrix for our example therefore looks like this:
![](_figs/M2.png)
**The $\mathbf{S}$ Matrix: Double Arrows/Variances**
The $\mathbf{S}$ matrix represents the (co)variances we want to estimate for the included variables. For $x$, our predictor, we want to estimate the variance $\sigma^{2}_{x}$. For our estimated regression target $\hat{y}$, we want to know the error variance $\sigma^{2}_{e_y}$. We therefore specify $\mathbf{S}$ like this:
![](_figs/M4.png)
**The $\mathbf{F}$ Matrix: Observed Variables**
The $\mathbf{F}$ matrix allows us to specify the observed (vs. latent) variables in our model. To specify that a variable has been observed, we simply insert $1$ in the respective diagonal field of the matrix. Given that both $x$ and $y$ are observed in our model, we put $1$ in all diagonal fields of the matrix:
![](_figs/M3.png)
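As a sketch, here is how the three RAM matrices of our regression example could be written down in *R*, using character strings of the form `"start value*label"` for parameters that have to be estimated (the same convention the `metaSEM` package uses later on; the starting values and labels here are our own):
```{r, eval=FALSE}
dims <- c("x", "y")
# A matrix: single arrows (x predicts y)
A.reg <- matrix(c(0,            0,
                  "0.1*beta_1", 0),
                nrow = 2, ncol = 2, byrow = TRUE,
                dimnames = list(dims, dims))
# S matrix: variance of x and error variance of y
S.reg <- matrix(c("0.2*var_x", 0,
                  0,           "0.2*var_e_y"),
                nrow = 2, ncol = 2, byrow = TRUE,
                dimnames = list(dims, dims))
# F matrix: both variables are observed
F.reg <- diag(2)
dimnames(F.reg) <- list(dims, dims)
```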
Once these matrices are set, it is possible to estimate the parameters in our SEM and to derive how well the specified model fits the data, using matrix algebra and **Maximum-Likelihood** estimation. We omit the details of these steps here. If you are interested in understanding what happens in this step, you can have a look at @cheung2015meta, @mcardle1984some, or this [blog post](https://towardsdatascience.com/probability-concepts-explained-maximum-likelihood-estimation-c7b4342fdbb1).
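As a tiny illustration of the Maximum-Likelihood idea itself (independent of SEM), we can estimate the mean of normally distributed data by minimizing the negative log-likelihood with `optim()`:
```{r, eval=FALSE}
set.seed(1)
dat <- rnorm(100, mean = 2, sd = 1)
negLL <- function(mu) -sum(dnorm(dat, mean = mu, sd = 1, log = TRUE))
optim(par = 0, fn = negLL, method = "BFGS")$par  # close to mean(dat)
```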
### Meta-Analysis from a SEM perspective
We will now combine our knowledge about [meta-analysis models](#random) and SEM to formulate the random/fixed-effect model as a structural equation model [@cheung2008model].
To begin, let us return to the formula of the **random-effects model** first. Previously, we already described that the random-effects model follows a [multilevel structure](#mlma), which looks like this:
**Level 1:**
$$
\hat\theta_k = \theta_k + \epsilon_k
$$
**Level 2:**
$$
\theta_k = \mu + \zeta_k
$$
On the first level, we assume that the effect size $\hat{\theta}_k$ we observe for some study $k$ in our meta-analysis is an estimator for the true effect size of $k$, $\theta_k$. The observed effect size $\hat{\theta}_k$ deviates from the true effect size $\theta_k$ because of the sampling error $\epsilon_k$, the variance of which is $Var(\epsilon_k)=v_k$.
In a random-effects model, we assume that even the true effect size for $k$ is only drawn from a "super-population" of true effect sizes at level 2. The mean of this "super-population" $\mu$ is what we want to estimate as the pooled effect in a random-effects model, along with the variance of the "super-population", $Var(\zeta_k) = \tau^2$: the [between-study heterogeneity](#heterogeneity). The fixed-effect model is only a special case of the random-effects model where we assume that $\tau^2$ is zero.
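To illustrate this model, here is a small simulation sketch (all values are made up): we draw true effect sizes $\theta_k$ from the level 2 distribution, add level 1 sampling error, and check that the `meta()` function from the `metaSEM` package (which we introduce below) approximately recovers $\mu$ and $\tau^2$.
```{r, eval=FALSE}
set.seed(42)
k <- 50                                        # number of simulated studies
v <- runif(k, 0.01, 0.1)                       # known sampling variances v_k
theta <- rnorm(k, mean = 0.5, sd = sqrt(0.04)) # level 2: mu = 0.5, tau2 = 0.04
y <- rnorm(k, mean = theta, sd = sqrt(v))      # level 1: observed effect sizes
summary(metaSEM::meta(y = y, v = v))           # estimates should be close to mu and tau2
```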
It is quite straightforward to represent this model as a SEM graph if we use the variables on level 1 as **latent variables** to "explain" how the effect sizes we observed came into being [@cheung2015meta]:
![](_figs/REM_SEM.png)
In this graphical model, it becomes clear that the observed effect size $\hat{\theta}_k$ of the $k$th study is "influenced" by two components: the sampling error $\epsilon_k$ with variance $v_k$, and the true effect size $\theta_k$ with variance $\tau^2$.
### The Two-Stage Meta-Analytic SEM approach
Above, we defined the (random-effects) meta-analysis model from a SEM perspective. Although this is interesting from a theoretical standpoint, the model above is no more capable than the meta-analysis techniques we covered before: it simply pools effect sizes assuming a random-effects model.
To really harness the versatility of meta-analytic SEM, an approach involving two steps is required [@tang2016testing; @cheung2015meta]. In **Two-Stage Structural Equation Modeling** (**TSSEM**), we first **pool the effect sizes** from each study (usually correlations between the variables we want to use for modeling). This allows us to evaluate the heterogeneity of the pooled effect sizes, and whether a random-effects model or subgroup analyses should be used. Using the maximum-likelihood-based approach of the `metaSEM` package, which we will introduce in the following, even studies with **missing data** can be included.
In the second step, **weighted least squares** (WLS) estimation is used to fit the structural equation model we specified. The fit function for the specified model $\mathbf{\rho}(\mathbf{\theta})$ is [@cheung2009two; @cheung2015meta]:
$$ F_{WLS}(\mathbf{\theta}) = (\mathbf{r} - \mathbf{\rho}(\mathbf{\theta}))^\top \mathbf{V}^{-1} (\mathbf{r} - \mathbf{\rho}(\mathbf{\theta}))$$
Where $\mathbf{r}$ is the vectorized pooled correlation matrix from the first stage. The important part in this formula is $\mathbf{V}^{-1}$, the inverse of the matrix containing the covariances of $\mathbf{r}$. This approach is quite similar to the [inverse-variance](#fixed) principle (see Chapter 4) traditionally used in meta-analysis: it gives effects with lower variance (i.e., greater precision/$N$) a larger weight in the estimation process. This is also a good way to account for the uncertainty in our estimates which may have been introduced by missing data. Importantly, the formula for this second step is the same whether we assume a random or fixed-effect model, because the between-study heterogeneity, if present, has already been taken care of in step 1.
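To see the fit function in action, here is a toy computation with made-up numbers, in which `r` contains three pooled correlations, `rho` the model-implied correlations, and `V` the covariance matrix of `r`:
```{r, eval=FALSE}
r   <- c(0.30, 0.20, 0.25)            # pooled correlations (made up)
rho <- c(0.28, 0.22, 0.24)            # model-implied correlations (made up)
V   <- diag(c(0.004, 0.006, 0.005))   # covariance matrix of r (made up)
t(r - rho) %*% solve(V) %*% (r - rho) # value of the WLS fit function
```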
## Multivariate Meta-Analysis
![](_figs/MVMA.jpg)
```{block, type='rmdinfo'}
Now it is time to delve into our first worked example of a meta-analytic SEM using *R*. We will begin by using the SEM-based approach for **multivariate meta-analysis**, which has not been covered before. In multivariate meta-analyses, each study contributes more than one effect size at the same time. This may be helpful in cases where we are studying a problem or condition for which there are several main outcomes, not just one. For example, for some type of treatment, there could be two types of outcomes which are deemed important in the literature, and are thus assessed in most studies [@schwarzer2015meta]. In multivariate meta-analyses, we can estimate the effect sizes for both outcomes simultaneously in one model. Taking the correlation between the two outcomes into account, we can also determine if **studies with a high effect size on one outcome also have higher effect sizes on the other outcome**, or if there is no, or even a negative, relationship.
```
It is of note here that multivariate meta-analysis can also be performed outside a SEM framework [@schwarzer2015meta]. Nevertheless, as an introduction, we will show you how multivariate meta-analysis can be performed from a SEM perspective. In this and the following examples, we will work with `metaSEM`, a magnificent package for meta-analytic SEM developed by Mike Cheung [@cheungpackage]. To begin, as always, install the `metaSEM` package and load it from your library.
```{r, message=FALSE, warning=FALSE}
library(metaSEM)
```
For my multivariate meta-analysis, I will use the `dat.da` dataset. This is a fictitious dataset (adapted from the `wvs94a` data in `metaSEM`) containing $k=37$ studies examining the effects of psychotherapy on symptoms of Depression and Anxiety, expressed as the Standardized Mean Difference (SMD). The dataset can be downloaded [here](https://github.com/MathiasHarrer/Doing-Meta-Analysis-in-R/blob/master/dat_da.rda).
**Let's have a look at the data first.**
```{r, echo=FALSE}
load("_data/dat_da.rda")
dat.da = as.data.frame(dat.da)
for (i in 2:5){
dat.da[,i] = as.numeric(dat.da[,i])
}
```
```{r, chunk dat.da}
dat.da
```
As we can see, the dataset contains the effect sizes on both Depression and Anxiety, along with the variances $v_k$ for both effect sizes. There is also a column called `Covariance` in which the covariance between Depression and Anxiety reported in each study is stored.
A common problem is that the **covariance or correlation** between two outcomes is **not reported in original studies**. If this is the case, we have to estimate the covariance using a reasonable assumption on the correlation between the outcomes. Let us assume that we would not know the covariance in each study yet. How can we estimate it? A good way is to look for previous literature reporting the correlation between the two outcomes, optimally in the same kind of context we are synthesizing. Let us say we found in the literature that Depression and Anxiety are highly correlated in post-tests of psychotherapy trials, with $r_{D,A} \approx 0.6$. We could then approximate the covariance for each study $k$ by using $cov_k = SD_{D_{k}} \times SD_{A_{k}} \times r_{D,A}$. We can do this in *R* like this:
```{r}
estimated.cov <- sqrt(dat.da$Depression_var) * sqrt(dat.da$Anxiety_var) * 0.6
```
Where we take the square root because $SD = \sqrt{Var}$.
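Since the covariances are in fact reported in our example data, we can check how close our approximation comes to the reported values (remember that $r_{D,A} \approx 0.6$ is an assumption we took from the literature):
```{r, eval=FALSE}
head(cbind(reported = dat.da$Covariance, estimated = estimated.cov))
```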
### Specifying the Model
To specify the model for the multivariate meta-analysis, we do not have to follow the TSSEM procedure programmatically, nor do we have to specify the RAM matrices. For such a relatively simple model, we can use the `meta` function in `metaSEM` to pool the studies, specify and fit the model all at once.
**Here are the most important parameters we have to specify for the `meta()` function.**
```{r,echo=FALSE, warning=FALSE, message=FALSE}
library(knitr)
Code<-c("y", "v", "data")
Description<-c("The columns of our dataset which contain the effect size data. In a multivariate meta-analysis, we have to combine the effect size columns we want to include using cbind().",
"The columns of our dataset which contain the variance for the effect size. In a multivariate meta-analysis, we have to combine the variance columns we want to include using cbind(). We also have to include the column including the covariance between the effect sizes. The structure of the cbind() function is cbind(variance_1, covariance, variance_2)",
"The dataset in which the effect sizes and their variances are stored.")
m<-data.frame(Code, Description)
names<-c("Parameter","Description")
colnames(m)<-names
kable(m, escape = FALSE)
set.seed(123)
```
I will save my meta-analysis model as `m.mv`. My code then looks like this:
```{r, chunk meta, eval=FALSE}
m.mv <- meta(y = cbind(Depression, Anxiety),
v = cbind(Depression_var, Covariance, Anxiety_var),
data = dat.da)
```
```{r, echo=FALSE}
load("_data/m.mv.rda")
```
To get the results, I have to plug the `m.mv` object into the `summary()` function.
```{r, eval=FALSE}
summary(m.mv)
```
```
Call:
meta(y = cbind(Depression, Anxiety), v = cbind(Depression_var,
Covariance, Anxiety_var), data = dat.da)
95% confidence intervals: z statistic approximation
Coefficients:
Estimate Std.Error lbound ubound z value Pr(>|z|)
Intercept1 0.50321391 0.01466266 0.47447563 0.53195220 34.3194 < 2.2e-16 ***
Intercept2 0.45553567 0.01748527 0.42126518 0.48980617 26.0525 < 2.2e-16 ***
Tau2_1_1 0.00459325 0.00186286 0.00094210 0.00824439 2.4657 0.013675 *
Tau2_2_1 0.00287809 0.00167184 -0.00039865 0.00615483 1.7215 0.085158 .
Tau2_2_2 0.00783290 0.00254639 0.00284206 0.01282374 3.0761 0.002097 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Q statistic on the homogeneity of effect sizes: 223.8427
Degrees of freedom of the Q statistic: 72
P value of the Q statistic: 0
Heterogeneity indices (based on the estimated Tau2):
Estimate
Intercept1: I2 (Q statistic) 0.6079
Intercept2: I2 (Q statistic) 0.7220
Number of studies (or clusters): 37
Number of observed statistics: 74
Number of estimated parameters: 5
Degrees of freedom: 69
-2 log likelihood: -140.7221
OpenMx status1: 0 ("0" or "1": The optimization is considered fine.
Other values may indicate problems.)
```
### Evaluating the Output
Given that the SEM model is fitted using the **Maximum-Likelihood** algorithm, the first thing we always do is check the `OpenMx status` right at the end of the output. Maximum-Likelihood is an **optimization procedure**, in which parameters are changed iteratively until the optimal solution for the data at hand is found. Especially with more complex models, however, it can happen that this optimum is not reached even after many iterations; the Maximum-Likelihood algorithm will then stop and output the parameter values it has approximated so far. Those values are then very likely to be **incorrect** and should **not be trusted**. The `OpenMx status` for my model is `0`, which indicates that the Maximum-Likelihood estimation worked fine. Had the status been anything other than `0` or `1`, it would have been necessary for me to rerun the fitting process with this code:
```{r, eval=FALSE}
rerun(m.mv)
```
In the output, the two pooled effect sizes are shown as `Intercept1` and `Intercept2`. The effect sizes are numbered by the order in which we inserted them into our call to `meta()`. We can see that the pooled effect sizes are $SMD_{D} = 0.50$ for Depression and $SMD_{A} = 0.46$ for Anxiety. Both effect sizes are significant. Under `Heterogeneity indices`, we can also see the values of $I^2$ (see [Chapter 6](#heterogeneity)), which are $I^{2}_D = 61\%$ and $I^{2}_A = 72\%$, indicating substantial between-study heterogeneity.
The direct estimates of the between-study heterogeneity $\tau^2$ are also provided. We see that there are not two, but three estimates. To understand what this means, we can extract the `"random"` values from the `m.mv` object first.
```{r, last}
tau.coefs <- coef(m.mv, select = "random")
```
Then, we can use the `vec2symMat()` function on the coefficients to create a matrix. We then give the matrix rows and columns the names of our variables, Depression and Anxiety.
```{r}
tc.mat <- metaSEM::vec2symMat(tau.coefs)
dimnames(tc.mat)[[1]] <- dimnames(tc.mat)[[2]] <- c("Depression", "Anxiety")
tc.mat
```
We now understand better what the three $\tau^2$ values mean: the diagonal elements of the matrix contain the **between-study variances (heterogeneity)** of the two outcomes, while the two off-diagonal fields contain the estimated covariance between Depression and Anxiety. Given that a covariance is just an unstandardized version of a **correlation**, we can also calculate the correlations using the `cov2cor()` function.
```{r}
cov2cor(tc.mat)
```
We see that, quite logically, the correlations in the diagonal elements of the matrix are 1. The correlation between effect sizes on Depression and effect sizes on Anxiety is $r_{D,A} = 0.48$. This is an interesting finding: it shows that there is a **positive association between a treatment's effect on Depression and its effect on Anxiety**. Treatments which have high effects on Depression are very likely to have high effects on Anxiety too.
It is of note that the **confidence intervals** presented in the output are so-called **Wald-type** intervals. Such Wald-type intervals have been found to be prone to inaccuracy, especially in small samples [@diciccio1996bootstrap]. It may thus be valuable to construct the confidence intervals in another way, using so-called **likelihood-based** confidence intervals. We can get these CIs by rerunning the `meta()` function and additionally specifying `intervals.type = "LB"`:
```{r, eval=FALSE}
m.mv <- meta(y = cbind(Depression, Anxiety),
v = cbind(Depression_var, Covariance, Anxiety_var),
data = dat.da,
intervals.type = "LB")
```
We have already seen that the output for our `m.mv` contains estimates of the **between-study heterogeneity** $\tau^2$. Recalling our knowledge of the meta-analysis model, we can therefore conclude that the model we just fitted is a **random-effects model**. The `meta()` function uses a random-effects model by default, because the fixed-effect model is only a special case of the random-effects model in which $\tau^2 = 0$. From the $I^2$ values in our output, we can conclude that the random-effects model is indeed indicated. However, if we wanted to fit a fixed-effect model, we could redo the analysis, this time adding the parameter `RE.constraints = matrix(0, nrow=2, ncol=2)`, which creates a matrix of zeros with the same structure as the `tc.mat` matrix, constraining all `tau2` values to zero:
```{r, eval=FALSE}
m.mv <- meta(y = cbind(Depression, Anxiety),
v = cbind(Depression_var, Covariance, Anxiety_var),
data = dat.da,
RE.constraints = matrix(0, nrow=2, ncol=2))
```
### Plotting the Model
To plot the multivariate meta-analysis model, we can simply use the `plot()` function. I will make some additional specifications here to change the appearance of the plot slightly. If you want to see all styling options, you can paste `?metaSEM::plot.meta()` into the Console and then hit Enter.
```{r, fig.height=7, fig.width=7, fig.align="center", eval=FALSE}
plot(m.mv,
axis.labels = c("Depression", "Anxiety"),
randeff.ellipse.col = "black",
univariate.arrows.col = "black",
univariate.polygon.col = "blue")
```
```{r, fig.height=7, fig.width=7, fig.align="center", echo=FALSE}
metaSEM::plot.meta(m.mv,
axis.labels = c("Depression", "Anxiety"),
randeff.ellipse.col = "black",
univariate.arrows.col = "black",
univariate.polygon.col = "blue")
```
**Let us go through the components of the plot one by one:**
* We see that the plot has two axes: the x-axis, displaying the **effect size on Depression**, and the y-axis displaying the **effect size on Anxiety**.
* We also see the pooled effect and its 95% CI for both outcomes, symbolized by the blue diamond and the black arrows.
* In the middle of the plot, the pooled effect for both variables is represented by another blue diamond.
* The red ellipse represents the **95% confidence ellipse** for our pooled effect sizes.
* The black ellipse represents the space in which we expect 95% of all studies to fall based on the random-effects model.
* The black dots represent the **individual studies**, and the dashed lines are the 95% CIs of the individual studies.
$$~$$
## Confirmatory Factor Analysis
![](_figs/sem_tree.jpg)
```{block, type='rmdinfo'}
**Confirmatory Factor Analysis** (**CFA**) is a popular SEM method in which one specifies how observed variables relate to assumed latent variables [@thompson2004exploratory]. CFA is often used to evaluate the psychometric properties of questionnaires or other assessments. It allows researchers to determine if the variables they assess indeed measure one or more latent variables, and how these latent variables relate to each other. For frequently used questionnaires, there are often many studies which report the correlations between the different items of the questionnaires. Such data can be used for meta-analytic SEM, which allows us to **evaluate which latent variable (or factor) structure** is the **most appropriate** based on all available evidence.
```
In this example, let us assume that we want to confirm the latent factor structure of a **questionnaire for sleep problems**. The questionnaire is assumed to measure two distinct latent variables, which together characterize sleep problems: **insomnia** and **lassitude** (in fact @koffel2009two argue that sleep complaints do indeed follow this structure). For our questionnaire, we have created a fictitious dataset, `dat.cfa` (based on the `Digman97` data in the `metaSEM` package) which contains 14 independent studies which report the correlations between the different symptoms of sleep complaints that our questionnaire measures: **sleep quality**, **sleep latency**, **sleep efficiency**, **daytime dysfunction** and **hypersomnia** (i.e., sleeping too much; the dataset can be downloaded [here](https://github.com/MathiasHarrer/Doing-Meta-Analysis-in-R/blob/master/dat_da.rda)). We assume that the first three symptoms are related, because they all measure insomnia as a latent variable, whereas daytime dysfunction and hypersomnia are related because they are symptoms of the lassitude factor.
**The proposed structure represented as a graphical model looks like this** (please note that the labels are somewhat idiosyncratic to make identifying the relevant components of the model easier later on):
![](_figs/CFA_Graph.png)
### Setting up our data {#matrix_structure}
Let us first have a look at the data we want to use for model fitting. The `dat.cfa` dataset I will use here has a **special structure**: it is a so-called `list`, containing (1) another `list` of matrices and (2) a vector. A list is a very versatile *R* object, which allows us to bind together different objects in one single big object. Lists can be accessed like `data.frame`s by using the `$` operator. By using the `names()` function, we can see the names of the objects in the list.
```{r, echo=FALSE}
load("_data/dat_cfa.rda")
```
```{r}
names(dat.cfa)
```
We see that the list contains two elements, our `data` and the `n` (sample size) of each study. The `data` object is itself a list, so we can get the names of its contents using the `names()` function, or can display single contents of it through the `$` operator.
```{r}
names(dat.cfa$data)
dat.cfa$data$`Coleman et al. (2003)`
```
We see that the `data` list contains 14 elements, one for each of the 14 included studies. A closer look at the `Coleman et al. (2003)` study reveals that the data are stored as **correlation matrices** for our five observed variables. The Coleman et al. (2003) study contains correlation data for all variable combinations; however, studies with missing values (coded as `NA`) in some of the fields are also allowed, **because the meta-analytic SEM approach can handle missing data**, at least to some extent.
Before we proceed, let us quickly show how you can construct such a list yourself. Let us assume that you have the correlation matrices stored in separate `data.frame`s, which you [imported into R](#importing) (Chapter 3). The important part here is that the **column names** and their **order** should already be **the same** in all of the `data.frame`s. Let us say I have two data frames containing the correlation data, `df1` and `df2`, which look like this:
```{r, echo=FALSE}
df1 <- as.data.frame(dat.cfa$data$`Coleman et al. (2003)`)
rownames(df1) <- 1:5
df2 <- as.data.frame(dat.cfa$data$`Salazar et al. (2008)`)
rownames(df2) <- 1:5
```
```{r}
df1
df2
```
I can transform these `data.frame`s into matrices using the `as.matrix()` function. Because we want the rows and columns to contain the names of the variables, I have to specify the **dimension names** (`dimnames`) of the matrices. The first dimension represents the rows, the second the columns. I can do this in *R* like this:
```{r, collapse=TRUE}
# Convert to matrices
mat1 <- as.matrix(df1)
mat2 <- as.matrix(df2)
# Set the dimension names
dimnames(mat1)[[1]] <- dimnames(mat1)[[2]] <- c("Quality", "Latency", "Efficiency", "DTDysf", "HypSomnia")
dimnames(mat2)[[1]] <- dimnames(mat2)[[2]] <- c("Quality", "Latency", "Efficiency", "DTDysf", "HypSomnia")
# Print the matrices
mat1
mat2
```
We can then bind these matrices together in a `list`, and then give the list elements a name using the `names()` function.
```{r}
matrix.list <- list(mat1, mat2)
names(matrix.list) <- c("Study1", "Study2")
```
To do the modeling, we also need the total sample size $N$ of each study. We only have to create a numeric vector which contains the sample sizes in the **same order** as the objects in our list. We can then create one big list, containing both the list with our correlation matrix data, and the sample sizes for each study, again using the `list()` function.
```{r}
n <- c(205, 830)
all.data <- list(matrix.list, n)
```
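Such a list can then be supplied directly to the `tssem1()` function we will introduce in the next section (a sketch; note that with only two studies, a fixed-effects model is more realistic than a random-effects model):
```{r, eval=FALSE}
tssem1(Cov = all.data[[1]], n = all.data[[2]], method = "FEM")
```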
That is it! We can now proceed to specifying our model for the `dat.cfa` data.
### Model Specification
To specify our CFA model, we will have to use the **RAM specification** and the **TSSEM** procedure we mentioned [before](#sem_about) (Chapter 13.1). The `metaSEM` package directly follows the TSSEM approach; in fact, it even has two different functions for the two stages, `tssem1` and `tssem2`. The first pools our correlation matrices across all studies, and the second fits the proposed model to the data.
#### Stage 1 {#stage1}
At the first stage, we pool the correlation matrices using the `tssem1()` function. There are four important parameters we have to specify in the function.
```{r,echo=FALSE, warning=FALSE, message=FALSE}
load("_data/cfa1.rda")
library(knitr)
Code<-c("Cov", "n", "method", "RE.type")
Description<-c("A list of correlation matrices we want to pool. Please note that all correlation matrices in the list need to have an identical structure.",
"A numeric vector containing the sample sizes of each study, in the same order as the matrices contained in Cov.",
"This specifies if we want to use a fixed-effect model ('FEM') or random-effects model ('REM').",
"When a random-effects model is used, this specifies which of the random-effects should be estimated. The default is 'Symm', which estimates all tau-squared values, including the covariances. When set to 'Diag', only the diagonal elements of the random-effects matrix are estimated. This means that we assume that the random effects are independent. Although using 'Diag' is a simplification of our model, it is often preferable, because less parameters have to be estimated. This particularly makes sense when the number of variables is high and/or the number of studies is low.")
m<-data.frame(Code, Description)
names<-c("Parameter","Description")
colnames(m)<-names
kable(m, escape = FALSE)
set.seed(123)
```
For this step, I will assume a random-effects model and use `RE.type = "Diag"`. I will save the model as `cfa1`, and then call the `summary()` function on it to retrieve the output. Here is the code for that:
```{r, eval=FALSE}
cfa1 <- tssem1(dat.cfa$data,
dat.cfa$n,
method="REM",
RE.type = "Diag")
summary(cfa1)
```
```
Call:
meta(y = ES, v = acovR, RE.constraints = Diag(paste0(RE.startvalues,
"*Tau2_", 1:no.es, "_", 1:no.es)), RE.lbound = RE.lbound,
I2 = I2, model.name = model.name, suppressWarnings = TRUE,
silent = silent, run = run)
95% confidence intervals: z statistic approximation
Coefficients:
Estimate Std.Error lbound ubound z value Pr(>|z|)
Intercept1 0.38971904 0.05429257 0.28330757 0.49613052 7.1781 7.068e-13 ***
Intercept2 0.43265876 0.04145136 0.35141559 0.51390194 10.4377 < 2.2e-16 ***
Intercept3 0.04945626 0.06071079 -0.06953470 0.16844722 0.8146 0.41529
Intercept4 0.09603706 0.04478711 0.00825593 0.18381818 2.1443 0.03201 *
Intercept5 0.42724237 0.03911621 0.35057601 0.50390873 10.9224 < 2.2e-16 ***
Intercept6 0.11929310 0.04106204 0.03881299 0.19977321 2.9052 0.00367 **
Intercept7 0.19292421 0.04757961 0.09966988 0.28617853 4.0548 5.018e-05 ***
Intercept8 0.22690159 0.03240893 0.16338126 0.29042192 7.0012 2.538e-12 ***
Intercept9 0.18105563 0.04258855 0.09758361 0.26452765 4.2513 2.126e-05 ***
Intercept10 0.43614968 0.03205961 0.37331400 0.49898536 13.6043 < 2.2e-16 ***
Tau2_1_1 0.03648373 0.01513055 0.00682838 0.06613907 2.4113 0.01590 *
Tau2_2_2 0.01963097 0.00859789 0.00277942 0.03648253 2.2832 0.02242 *
Tau2_3_3 0.04571437 0.01952285 0.00745030 0.08397845 2.3416 0.01920 *
Tau2_4_4 0.02236122 0.00995083 0.00285794 0.04186449 2.2472 0.02463 *
Tau2_5_5 0.01729073 0.00796404 0.00168149 0.03289996 2.1711 0.02992 *
Tau2_6_6 0.01815482 0.00895896 0.00059557 0.03571407 2.0264 0.04272 *
Tau2_7_7 0.02604880 0.01130265 0.00389601 0.04820160 2.3047 0.02119 *
Tau2_8_8 0.00988761 0.00513713 -0.00018098 0.01995620 1.9247 0.05426 .
Tau2_9_9 0.01988243 0.00895052 0.00233973 0.03742514 2.2214 0.02633 *
Tau2_10_10 0.01049222 0.00501578 0.00066148 0.02032297 2.0918 0.03645 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Q statistic on the homogeneity of effect sizes: 1220.333
Degrees of freedom of the Q statistic: 130
P value of the Q statistic: 0
Heterogeneity indices (based on the estimated Tau2):
Estimate
Intercept1: I2 (Q statistic) 0.9326
Intercept2: I2 (Q statistic) 0.8872
Intercept3: I2 (Q statistic) 0.9325
Intercept4: I2 (Q statistic) 0.8703
Intercept5: I2 (Q statistic) 0.8797
Intercept6: I2 (Q statistic) 0.8480
Intercept7: I2 (Q statistic) 0.8887
Intercept8: I2 (Q statistic) 0.7669
Intercept9: I2 (Q statistic) 0.8590
Intercept10: I2 (Q statistic) 0.8193
Number of studies (or clusters): 14
Number of observed statistics: 140
Number of estimated parameters: 20
Degrees of freedom: 120
-2 log likelihood: -112.4196
OpenMx status1: 0 ("0" or "1": The optimization is considered fine.
Other values may indicate problems.)
```
A look at the `OpenMx status1` shows that the model estimates are trustworthy. To make the results more easily digestible, we can extract the **fixed effects** (our estimated pooled correlations) using the `coef()` function. We then make a symmetric matrix out of the coefficients using `vec2symMat()`, and add the dimension names for easier interpretation.
```{r}
# Extract the fixed coefficients (correlations)
fixed.coefs <- coef(cfa1, "fixed")
# Make a symmetric matrix
fc.mat <- vec2symMat(fixed.coefs, diag = FALSE)
# Label rows and columns
dimnames(fc.mat)[[1]] <- dimnames(fc.mat)[[2]] <- c("Quality", "Latency", "Efficiency", "DTDysf", "HypSomnia")
fc.mat
```
We can now see the **pooled correlation matrix** for our variables. Looking back at the model output, we also see that all but one of the correlation coefficients are **significant** ($p<0.05$): the **correlation between sleep quality and daytime dysfunction** is not. From the perspective of our model, this makes sense, because we expect these variables to load on different factors. We also find that the $I^2$ values for the different estimates are very large ($76\%-93\%$). We may therefore also examine whether fitting the model in different subclusters of studies reduces the amount of heterogeneity we find, and whether this translates to different SEM fits at stage 2.
#### Stage 2
After pooling the correlation matrices, it is now time to determine if our proposed factor model does indeed fit the data. To specify our model, we have to resort to the **RAM** formulation this time, and specify the $\mathbf{A}$, $\mathbf{S}$ and $\mathbf{F}$ matrices. To fill the fields, it is often easier to construct an empty matrix first. In the rows and columns, the matrices will not only contain the observed variables, but also the **latent variables** we want to estimate, `f_Insomnia` and `f_Lassitude`. Here is how we can create a zero matrix as a blueprint:
```{r}
# Create vector of column/row names
dims <- c("Quality", "Latency", "Efficiency", "DTDysf", "HypSomnia", "f_Insomnia", "f_Lassitude")
# Create 7x7 matrix of zeros
mat <- matrix(rep(0, 7*7), nrow = 7, ncol = 7)
# Label the rows and columns
dimnames(mat)[[1]] <- dimnames(mat)[[2]] <- dims
mat
```
$$~$$
**$\mathbf{A}$ Matrix**
In the $\mathbf{A}$ matrix, we specify the asymmetrical (i.e., single) arrows in our model. The logic, again, is that the arrow starts at the **column variable** and ends where the column meets with the entry of the **row variable**. All other fields which do not represent arrows are filled with `0`. We specify that an arrow has to be "estimated" by providing a character string. This character string starts with a **starting value** for the optimization procedure (usually somewhere between $0.1$ and $0.3$), followed by `*`. After the `*` symbol, we specify a **label** for the value. If two fields in the $\mathbf{A}$ matrix have the same label, this means that we assume that the fields have the **same value**. We assume a starting value of $0.3$ for all estimated arrows, and label the fields according to the graph from above. We can do this as follows:
```{r}
A <- matrix(c(0, 0, 0, 0, 0, "0.3*Ins_Q", 0 ,
0, 0, 0, 0, 0, "0.3*Ins_L", 0 ,
0, 0, 0, 0, 0, "0.3*Ins_E", 0 ,
0, 0, 0, 0, 0, 0 , "0.3*Las_D",
0, 0, 0, 0, 0, 0 , "0.3*Las_H",
0, 0, 0, 0, 0, 0 , 0 ,
0, 0, 0, 0, 0, 0 , 0
), nrow = 7, ncol = 7, byrow=TRUE)
# Label columns and rows
dimnames(A)[[1]] <- dimnames(A)[[2]] <- dims
A
```
Looks good so far. The last step is to plug the `A` matrix into the `as.mxMatrix()` function to make it usable for the later step.
```{r}
A <- as.mxMatrix(A)
```
$$~$$
**$\mathbf{S}$ Matrix**
In the $\mathbf{S}$ matrix, we specify the variances we want to estimate. We want to estimate the variance of all observed variables, as well as the correlation between our latent variables. We fix the variances of the latent variables to $1$, so that their covariance can be interpreted as a correlation. We use a starting value of $0.2$ for the variances of the observed variables, and $0.3$ for the correlation. Here is how we construct the matrix:
```{r}
# Make a diagonal matrix for the variances
Vars <- Diag(c("0.2*var_Q", "0.2*var_L", "0.2*var_E", "0.2*var_D", "0.2*var_H"))
# Make the matrix for the latent variables
Cors <- matrix(c(1, "0.3*cor_InsLas",
"0.3*cor_InsLas", 1),
nrow=2, ncol=2)
# Combine
S <- bdiagMat(list(Vars, Cors))
# Label columns and rows
dimnames(S)[[1]] <- dimnames(S)[[2]] <- dims
S
```
And again, we transform the matrix using `as.mxMatrix()`.
```{r}
S <- as.mxMatrix(S)
```
$$~$$
**$\mathbf{F}$ Matrix**
The $\mathbf{F}$ matrix is quite easily specified: in the diagonal elements of **observed variables**, we fill in `1`. Elsewhere, we specify `0`. From the $\mathbf{F}$ matrix, we **only select the rows in which at least one element is not zero**.
```{r}
# Construct diagonal matrix
F <- Diag(c(1, 1, 1, 1, 1, 0, 0))
# Only select non-null rows
F <- F[1:5,]
# Specify row and column labels
dimnames(F)[[1]] <- dims[1:5]
dimnames(F)[[2]] <- dims
F
```
```{r}
F <- as.mxMatrix(F)
```
$$~$$
**Model fitting**
Now, it is time to fit our proposed model to the data. To do this, we use the `tssem2()` function. We only have to provide the **stage 1 model** `cfa1`, the three matrices, and specify `diag.constraints=FALSE`, because we are not fitting a mediation model. I save the resulting object as `cfa2` and then access it using `summary()`.
```{r}
cfa2 <- tssem2(cfa1,
Amatrix = A,
Smatrix = S,
Fmatrix = F,
diag.constraints = FALSE)
summary(cfa2)
```
We again see that the `OpenMx status1` is `0`, meaning that the optimization worked fine. We get the estimates for the different paths between the latent variables and the observed symptoms, such as $0.68$ for $Lassitude \rightarrow Daytime Dysfunction$ (`Las_D`). The important part, however, is to check **how well the assumed model fits our data**. We can have a look at the second, third, and fourth rows of the `Goodness-of-fit indices`, where we see that the **goodness of fit** test is $\chi^{2}_{4,4496} = 7.82$, which is **not significant** ($p=0.098$), meaning that the hypothesis of a good fit is not rejected. In line with this, the **Root Mean Square Error of Approximation** (**RMSEA**; @steiger1980statistically) value is $RMSEA=0.0146$; as a rule of thumb, values below $0.05$ indicate a good fit, while values above $0.10$ indicate a poor fit. Both indices therefore suggest that the proposed two-factor model fits the data well. Given the very large $I^2$ values we found at stage 1, however, it may still be worthwhile to conduct a **subgroup analysis**, in which the model is fitted to subgroups of studies which share some kind of characteristic (e.g., age group, country of origin, etc.), to see if there are subclusters in which the model fits better than in others.
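Here is a minimal sketch of how such a subgroup analysis could be set up, assuming a hypothetical logical vector `young` indicating for each of the 14 studies whether a young sample was used (stage 2 would then be rerun on each stage 1 model):
```{r, eval=FALSE}
# 'young' is hypothetical; TRUE = study used a young sample
cfa1.young <- tssem1(dat.cfa$data[young], dat.cfa$n[young],
                     method = "REM", RE.type = "Diag")
cfa1.old   <- tssem1(dat.cfa$data[!young], dat.cfa$n[!young],
                     method = "REM", RE.type = "Diag")
```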
```{block, type='rmdachtung'}
Please be aware that a common problem in SEM studies is that researchers often only focus on their **one proposed model**, and if it fits the data well. If it is found that the model has a close fit for the data, it is often directly assumed that the data prove the underlying structure or theory. This is **problematic**, because for the same data, more than one model can achieve a good fit at the same time. It is therefore necessary to also check for **alternative model hypotheses** and structures. If the alternative model fits the data well too, it becomes less clear if our proposed structure really is the "correct" one.
```
#### Plotting the Model
After we fit the model, `metaSEM` makes it quite easy for us to visualize it graphically. To do this, we first have to install and load the `semPlot` package [@semplot].
```{r, message=FALSE}
library(semPlot)
```
To plot the model, we have to **convert** it into a format that `semPlot` can use. We can use the `meta2semPlot()` function to do this.
```{r}
cfa.plot <- meta2semPlot(cfa2)
```
We can then use the `semPaths()` function in `semPlot` to generate the plot. This function has many parameters, which you can access by typing `?semPaths` into the Console, and then hitting Enter. Here is what my code looks like, along with the resulting plot:
```{r}
# Create Plot labels (left to right, bottom to top)
labels <- c("Sleep\nQuality","Sleep\nLatency","Sleep\nEfficiency","Daytime\nDysfunction",
"Hyper-\nsomnia","Insomnia", "Lassitude")
# Plot
semPaths(cfa.plot,
whatLabels = "est",
edge.color = "black",
nodeLabels = labels)
```
## Mediation
![](_figs/mediation_title.jpg)
```{block, type='rmdinfo'}
In mediation models [@baron1986moderator], we want to examine if a **direct effect** from one variable to another is **mediated** by an **intervening** or **mediator variable**. If a total mediation exists, we assume that a variable $X$ has an effect on a variable $Y$ only because $X$ influences a mediating variable $Z$, which itself affects $Y$.
Mediation is used in many fields, especially when we are interested in the **mechanisms** behind some relationship between two variables. However, it is important to mention that mediation models should always be based on a **theoretically and logically sound rationale** for why some variable $Z$ mediates the relationship between $X$ and $Y$.
```
Using meta-analytic SEM, we can synthesize evidence from original studies to determine if a proposed mediation is indeed backed by all available evidence. In the following, we will show you an example of how this can be done in *R* using the `metaSEM` package.
### Model Specification
For this example, let us assume we want to disentangle why there is a relationship between (low) **psychological resilience** (see @fletcher2013psychological for a definition of this concept) and elevated levels of **depressive symptoms**. Based on the literature, we assume that there are two mediating variables: **emotion regulation** capabilities and **dysfunctional coping styles**. Both mediating variables are influenced by resilience, while low emotion regulation capabilities also influence dysfunctional coping. Both emotion regulation and coping style then influence the level of depressive symptoms a person experiences. One may represent the proposed model graphically like this (again using idiosyncratic notation to facilitate the model specification later on):
![](_figs/MED_SEM.png)
For this example, we will use the fictitious `dat.med` dataset, which was adapted from the `Hunter83` dataset in `metaSEM`. The data can be downloaded [here](https://github.com/MathiasHarrer/Doing-Meta-Analysis-in-R/blob/master/dat_med.rda). The dataset is again a list containing (1) 14 correlation matrices for resilience, emotion regulation, dysfunctional coping, and depressive symptoms extracted from 14 independent studies and (2) the $N$ of each study (see the [previous chapter](#matrix_structure) for more details on the required dataset structure). Let us have a look at the matrix for the fictitious Devegvar et al. (1992) study:
```{r, echo=FALSE}
load("_data/dat_med.rda")
```
```{r}
dat.med$data$`Devegvar et al. (1992)`
```
We see that this study has some **missings**, because the variable **Resilience** was not assessed. To see the overall missing data pattern, we can use the `pattern.na()` function.
```{r}
pattern.na(dat.med$data)
```
We see that the correlation $r_{EmotReg,Coping}$ has the most missings in our data, with four studies not providing data for it. Given that we have $k=14$ studies overall, this amount of missing data may be acceptable. However, we have to check if the matrices are **positive definite**, because this is a requirement for the later processing steps. We can do this with the `is.pd()` function.
```{r}
is.pd(dat.med$data)
```
We get `TRUE` for all studies, so everything is fine and we can continue.
### Stage 1
Now let us proceed to **pooling the correlation matrices in the first step**. For a more detailed description of this step, please refer to the [previous subchapter](#stage1). This time, we use a fixed-effects model.
```{r, chunk-tssem-med, echo=FALSE}
set.seed(13)
```
```{r}
med1 <- tssem1(dat.med$data,
dat.med$n,
method = "FEM")
summary(med1)
```
The optimization status is `0`, so the estimates are trustworthy. If you do not get a status of `0` or `1`, plug the model into the `rerun()` function to try fitting it again.
### Stage 2
Now that we have the pooled correlation matrix available in `med1`, we can proceed by specifying our proposed mediation model. Again, we specify the $\mathbf{A}$ and $\mathbf{S}$ matrices. The $\mathbf{F}$ matrix is not specified, because all of the variables in our model are **observed** (i.e., there are no latent variables). We will omit the details behind the matrix specification here; for more information, please refer to the [first subchapter](#sem_about) for the general structure of the matrices, and to the Confirmatory Factor Analysis example above for how the matrices are specified in *R*.
$$~$$
**$\mathbf{A}$ Matrix**
We use starting values of $0.2$.
```{r}
# Build matrix
A <- matrix(c(0 , 0 , 0 , 0,
"0.2*Res_EmR", 0 , 0 , 0,
"0.2*Res_Cop", "0.2*EmR_Cop", 0 , 0,
0 , "0.2*EmR_Dep", "0.2*Cop_Dep", 0),
ncol = 4, nrow=4, byrow=TRUE)
# Set column and row labels
dimnames(A)[[1]] <- dimnames(A)[[2]] <- c("Resilience", "EmotReg", "Coping", "Depression")
A
```
```{r}
A <- as.mxMatrix(A)
```
$$~$$
**$\mathbf{S}$ Matrix**
We use starting values of $0.1$.
```{r}
# Build matrix
S <- Diag(c(1, "0.1*ErrVarE", "0.1*ErrVarC", "0.1*ErrVarD"))
# Set column and row labels
dimnames(S)[[1]] <- dimnames(S)[[2]] <- c("Resilience", "EmotReg", "Coping", "Depression")
S
```
```{r}
S <- as.mxMatrix(S)
```
$$~$$
**Model Fitting**
We can now proceed to fitting the model. In a mediation model, we also want to **estimate the indirect effect** from resilience to depression, taking all mediation paths into account. To do this, we multiply the path coefficients along each mediation pathway, and then add all pathways together. In our model, this looks like this:
$$\beta_{indirect_{Res-Dep}} = (\beta_{Res-Cop} \times \beta_{Cop-Dep}) + (\beta_{Res-EmR} \times \beta_{EmR-Cop} \times \beta_{Cop-Dep}) + (\beta_{Res-EmR} \times \beta_{EmR-Dep})$$
We can add this quantity to our model so that it provides us with a 95% confidence interval around the indirect effect when likelihood-based intervals are used. To do this, we define the effect as a `list` containing an `mxAlgebra` object in *R*. Here is the code, using the labels we defined in the $\mathbf{A}$ matrix above:
```{r, eval=FALSE}
list(indirectEffect = mxAlgebra(Res_Cop*Cop_Dep + Res_EmR*EmR_Cop*Cop_Dep + Res_EmR*EmR_Dep,
name="indirectEffect"))
```
We can use this code as the argument for the `mx.algebra` parameter in our call to `tssem2()`. Because this is a mediation model, we also have to specify `diag.constraints = TRUE`. Here is the code:
```{r, echo=FALSE}
load("_data/med1.rda")
```
```{r, message=FALSE, warning=FALSE, eval=FALSE}
med2 <- tssem2(med1,
Amatrix = A,
Smatrix = S,
intervals.type = "LB",
diag.constraints = TRUE,
mx.algebras = list(indirectEffect = mxAlgebra(Res_Cop*Cop_Dep +
Res_EmR*EmR_Cop*Cop_Dep +
Res_EmR*EmR_Dep,
name="indirectEffect")))
# Rerun
med2 <- rerun(med2)
summary(med2)
```
```
Call:
wls(Cov = coef.tssem1FEM(tssem1.obj), aCov = vcov.tssem1FEM(tssem1.obj),
n = sum(tssem1.obj$n), Amatrix = Amatrix, Smatrix = Smatrix,
Fmatrix = Fmatrix, diag.constraints = diag.constraints, cor.analysis = tssem1.obj$cor.analysis,
intervals.type = intervals.type, mx.algebras = mx.algebras,
model.name = model.name, suppressWarnings = suppressWarnings,
silent = silent, run = run)
95% confidence intervals: Likelihood-based statistic
Coefficients:
Estimate Std.Error lbound ubound z value Pr(>|z|)
EmR_Cop 0.411161 NA 0.377782 0.444676 NA NA
Res_Cop 0.217913 NA 0.183661 0.252118 NA NA
Cop_Dep 0.131068 NA 0.091942 0.170226 NA NA
EmR_Dep 0.218838 NA 0.180424 0.257309 NA NA
Res_EmR 0.513365 NA 0.488406 0.538309 NA NA
ErrVarC 0.691468 NA 0.664472 0.717400 NA NA
ErrVarD 0.904928 NA 0.885399 0.922669 NA NA
ErrVarE 0.736457 NA 0.710223 0.761460 NA NA
mxAlgebras objects (and their 95% likelihood-based CIs):
lbound Estimate ubound
indirectEffect[1,1] 0.1498335 0.1685701 0.1878938
Goodness-of-fit indices:
Value
Sample size 3975.0000
Chi-square of target model 9.4087
DF of target model 1.0000
p value of target model 0.0022
Number of constraints imposed on "Smatrix" 3.0000
DF manually adjusted 0.0000
Chi-square of independence model 2697.6496
DF of independence model 6.0000
RMSEA 0.0460
RMSEA lower 95% CI 0.0226
RMSEA upper 95% CI 0.0748
SRMR 0.0161
TLI 0.9813
CFI 0.9969
AIC 7.4087
BIC 1.1209
OpenMx status1: 0 ("0" or "1": The optimization is considered fine.
Other values indicate problems.)
```
Because we told the `tssem2()` function to use **likelihood-based** intervals, the Wald-type standard errors and $p$-values are not displayed. The $RMSEA = 0.046$ is smaller than $0.05$, indicating a close fit, although the $\chi^2$ test is significant ($\chi^{2}_{1,3975} = 9.41$, $p=0.002$); with $N=3975$, however, the $\chi^2$ test is highly sensitive to even small amounts of misfit. Please note that we used the **fixed-effect model** in stage 1 to pool the correlations, which may not be appropriate if the between-study heterogeneity is substantial. The estimate of the indirect effect from resilience to depression under our mediation model is $0.17$, which is significant because its likelihood-based confidence interval does not include zero ($95\%CI: 0.15-0.19$).
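As a quick sanity check, we can reproduce the `indirectEffect` estimate by hand from the path coefficients in the output, using the formula from above:
```{r, eval=FALSE}
0.217913*0.131068 + 0.513365*0.411161*0.131068 + 0.513365*0.218838
# approximately 0.1686, matching indirectEffect[1,1] up to rounding
```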
### Plotting the Model
Again, we can plot our model using the `semPaths()` function in the `semPlot` package.
```{r, echo=FALSE}
load("_data/med2.rda")
```
```{r, chunk3}
# Convert to semPlot
sem.plot <- meta2semPlot(med2)
# Create Labels (left to right, bottom to top)
labels <- c("Resilience","Emotion\nRegulation","Coping","Depres-\nsion")
# Plot
semPaths(sem.plot,
whatLabels = "est",
edge.color = "black",
layout="tree2",
rotation=2,
nodeLabels = labels)
```