forked from cjvanlissa/TCSM
-
Notifications
You must be signed in to change notification settings - Fork 0
/
08-Week3_class.Rmd
218 lines (145 loc) · 8.86 KB
/
08-Week3_class.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
# Week 3 - Class
This week, we will analyze the data from the European social survey, and the paper by Kestilä for the last time. Last week, you first replicated the results by Kestilä, and then ran your own set of factor analyses. Hopefully you experienced yourself that it matters quite a bit what type of factor analysis method you choose; both for the interpretation of your factors, and analyses that incorporate factor scores.
Instead of doing Exploratory Factor Analysis, another way of analyzing the data from the European Social Survey would be to use Confirmatory Factor Analysis. During this practical, you will conduct a CFA and compare your results to earlier EFA and PCA results and the article by Kestilä.
### Question 1
Load the ESS data into R, into an object called `df`. The data are in the file `"week3_class.csv"`. You can load it into R using the syntax:
```{r, echo = FALSE, eval = TRUE}
if(FALSE){
load("essround1-c.RData")
data <- essround1_c
data[c(7:44, 48, 49, 50)] <- lapply(data[c(7:44, 48, 49, 50)], as.numeric)
data$yrbrn <- as.numeric(as.character(data$yrbrn))
df <- data[data$cntry %in% c("Austria", "Belgium", "Denmark", "Finland", "Germany", "Italy", "Netherlands", "Norway", "Sweden"), ]
df$cntry <- factor(df$cntry)
df <- df[, -c(which(names(data) == "filter_."):which(names(data)=="immieco"))]
write.csv(df, "TCSM_student/week3_class.csv", row.names = FALSE)
}
df <- read.csv("TCSM_student/week3_class.csv", stringsAsFactors = TRUE)
```
```{r, eval = FALSE}
df <- read.csv("week3_class.csv", stringsAsFactors = TRUE)
```
Furthermore, you will have to load the package `lavaan` before starting with the exercises.
<details>
<summary>Click for explanation</summary>
```{r, warning = FALSE, message = FALSE}
library(lavaan)
```
\details
Furthermore, we want to work with numeric variables instead of factors once again, and with the countries of interest only.
<details>
<summary>Click for explanation</summary>
```{r}
```
\details
### Question 2
First, review your EFA-results for the 'trust in politics' items, as well as the question wordings of the items. How many factors do you expect?
### Question 3
Build a CFA model for the trust in politics items by means of the R-package `lavaan`. A tutorial example is available here: http://lavaan.ugent.be/tutorial/cfa.html
Make sure to ask for model fit statistics. What do you find for the value of Chi-square, df, RMSEA and CFI? Any idea why you find this Chi-square value? Does the model fit the data?
<details>
<summary>Click for explanation</summary>
```{r, message = FALSE}
trust_model_3f <- 'trustpol =~ pltcare + pltinvt + trstplt
satcntry =~ stfeco + stfgov + stfdem + stfedu + stfhlth
trustinst =~ trstlgl + trstplc + trstun + trstprl'
fit_trust_model_3f <- cfa(trust_model_3f, data = df)
summary(fit_trust_model_3f, fit.measures = TRUE)
```
\details
### Question 4
As an alternative model, build a 1-factor model, with the same items as you used before, and one trust in politics factor. Evaluate the Chi-square, df, RMSEA and CFI again. Does the one factor-model fit better or worse than the factor model you previously estimated?
*Note, there is also a formal way to test whether a difference between two chi-square values is significant; more on that in the practicals of week 4.*
<details>
<summary>Click for explanation</summary>
```{r, message = FALSE}
trust_model_1f <- 'political_trust =~ pltcare + pltinvt + trstprl + trstplt + stfeco + stfgov + stfdem + stfedu + stfhlth + trstlgl + trstplc + trstun + trstep'
fit_trust_model_1f <- sem(trust_model_1f, data = df)
summary(fit_trust_model_1f, fit.measures = TRUE)
```
\details
### Question 5
Similarly, can you think of a 2-factor model that would explain political trust? Build this model as well, and compare Chi-square, df, RMSEA and CFI to both the 1-factor and the 3-factor model. Which of the models is the best in your opinion?
By now, you should be able to perform your own two-factor CFA, based on substantive grounds. If you do not know how to do this immediately, please have a look at question 3.
**Note: None of the models fit really well. In practice, this would mean that you would have to change the model. For now, stick with the best model you have.**
### Question 6
Choose your best model, and ask for the standardized estimates by means of the addition `standardized = TRUE` in the summary command. Which item is the best predictor of the first factor?
<details>
<summary>Click for explanation</summary>
```{r}
summary(fit_trust_model_3f, fit.measures = TRUE, standardized = TRUE)
standardizedsolution(fit_trust_model_3f)
```
\details
### Question 7
Byrne (2005) states that under certain conditions, a second order CFA can be specified. Would the political trust model qualify for a second order factor model?
### Question 8
Specify a second-order factor model and run this model. What do you conclude when you evaluate model fit? Is this model better than your model that you selected in question 6?
<details>
<summary>Click for explanation</summary>
To run a second-order factor model, you can simply add an additional line within the single quotes containing the factors that you want in the second-order factor model, like in the example below.
```{r, message = FALSE}
trust_model_3f2o <- 'trustpol =~ pltcare + pltinvt + trstprl + trstplt
satcntry =~ stfeco + stfgov + stfdem + stfedu + stfhlth
trustinst =~ trstlgl + trstplc + trstun + trstep
trust =~ trustpol + trustinst'
fit_trust_model_3f2o <- sem(trust_model_3f2o, data = df)
summary(fit_trust_model_3f2o, fit.measures = TRUE)
```
\details
### Question 9
Build a new one-factor model, only using the 3 items that ask about the respondents trust in institutions with 1) trust in the legal system, 2) trust in the police and 3) trust in the UN (see below). You can also take the full model you specified in either question 6 and question 9, but you might experience that the model becomes complicated due to the large amount of arrows.
After doing this, add as predictors of the latent factor: gender, age, education in years, political interest and self-placement on the left right scale.
![](week3class1.png)
Estimate the model. The model doesn't fit very well, but for now, we will stick with this model. Write down the regression coefficients (standardized and unstandardized) and relevant test statistics.
*Hint: Once you add predictions to your model, you should use* `sem()` *instead of* `cfa()`.
<details>
<summary>Click for explanation</summary>
We will first have to do some recoding again. You might want to make an age variable instead of a yearborn variable, which eases the interpretation. Furthermore, a dummy for gender is more informative than just the variable `gndr`, although this will be treated as a dummy. You will also have to recode the variables `eduyrs` and `lrscale` to numerical variables, and you will have to make a two-category dummy for the variable `polintr`.
```{r}
df$age <- 2002 - df$yrbrn
model_q9 <- 'trustinst =~ trstlgl + trstplc + trstun
trustinst ~ gndr + age + eduyrs + polintr + lrscale'
fit_model_q9 <- sem(model_q9, data = df)
summary(fit_model_q9, fit.measures = TRUE, standardized = TRUE)
```
```{r, echo = FALSE, eval = FALSE}
df$age <- 2002 - as.numeric(as.character(df$yrbrn))
df$female <- as.numeric(df$gndr)
df$female <- as.numeric(df$female == 2)
df$female <- factor(df$female,
levels = c(0,1),
labels = c("Male",
"Female"))
df$eduyrs <- as.numeric(df$eduyrs)
df$polintr_dummy <- as.numeric(df$polintr)
df$polintr_dummy <- as.numeric(df$polintr_dummy <= 2)
df$polintr_dummy <- factor(df$polintr_dummy,
levels = c(0,1),
labels = c("Not or hardly interested",
"Quite or very interested"))
df$lrscale <- as.numeric(df$lrscale)
model_q9 <- 'trustinst =~ trstlgl + trstplc + trstun
trustinst ~ female + age + eduyrs + polintr_dummy + lrscale'
fit_model_q9 <- sem(model_q9, data = df)
summary(fit_model_q9, fit.measures = TRUE, standardized = TRUE)
```
\details
### Optional
You can plot the resulting model using `semPaths()`:
```{r}
library(semPlot)
semPaths(fit_model_q9, whatLabels = "est", rotation = 4)
```
### Question 10
Now, replace the latent score "trust in institutions" with the EFA factor score 'trust in institutions'. Delete the separate indicators, so you end up with the model below.
![](week3class2.png)
<details>
<summary>Click for explanation</summary>
```{r}
model_q10 <- 'trustinstEFA ~ gndr + age + eduyrs + polintr + lrscale'
fit_model_q10 <- sem(model_q10, data = df)
summary(fit_model_q10, fit.measures = TRUE, standardized = TRUE)
```
\details
Compare the results. Does it matter whether we use a CFA or EFA to predict trust in institutions?