# Week 3 - Home

Last week, you worked on the data used by Kestilä in a paper that discussed two possible
reasons why there is no Radical Right party in Finland. You attempted to 1) replicate her
study using Principal Component Analysis and 2) conduct an exploratory factor analysis of the same
data.

This week you also learned that it is possible to do Confirmatory Factor Analysis within the structural
equation modeling (SEM) framework. We use the R package `lavaan` to fit these kinds of models.
Before we analyze the Kestilä data, you first need to learn some of the basic principles of doing
analyses with `lavaan`. Using syntax, you tell `lavaan` exactly
what kind of model you want it to estimate. This opens up many more possibilities to do Theory Construction and then
subsequently test your theory using Statistical Modeling.

To prepare for the next practical, work your way through this tutorial (part of which consists of the official `lavaan` tutorial). You will find that `lavaan` is a very user-friendly software package.

### Get started with lavaan

To get started with `lavaan`, work through the following two chapters of the official `lavaan` tutorial:

* [Installing lavaan](http://lavaan.ugent.be/tutorial/install.html) (read it and run the code)
* [Lavaan syntax](http://lavaan.ugent.be/tutorial/syntax1.html) (you only have to read this one)

### Regression models in lavaan

Download the data file *Hamilton.csv* or *Hamilton.xlsx* [here](https://github.com/cjvanlissa/TCSM_student). Hamilton (1990) provided several measurements on each of 21 states. Three of these measurements are used in this tutorial:

1. Average SAT score
2. Per capita income expressed in $1,000 units
3. Median education for residents 25 years of age or older

Load the data from the .csv or .xlsx file into R.

*Hint: Use* `read.csv()` *or* `readxl::read_excel()`.

<details>
<summary>Click for explanation</summary>

```{r, message=FALSE, eval=FALSE}
library(readxl)
# Read the first worksheet (sheet = 1) of the Excel file
df <- read_excel("Hamilton.xlsx", sheet = 1)
```

```{r, message=FALSE, echo=FALSE}
library(readxl)
df <- read_excel("TCSM_student/Hamilton.xlsx", 1)
```

Or

```{r, eval=FALSE}
df <- read.csv("Hamilton.csv")
```

</details>

### Conceptual model

The following path diagram shows a model for these data:

![](AMOS_path.png)

This is a simple regression model in which one observed variable, SAT, is predicted as a
linear combination of the other two observed variables, Education and Income. As with
nearly all empirical data, the prediction will not be perfect. The variable Other
represents variables other than Education and Income that affect SAT.

Each single-headed arrow represents a regression weight. The number 1 in the
figure specifies that Other has a fixed weight of 1 in the prediction of SAT. Fixing this weight gives Other a scale, so that its variance can be estimated as the residual (unexplained) variance of SAT. `lavaan` imposes this constraint by default.
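
Written as an equation, the path diagram corresponds roughly to the model below. This is a sketch: $b_1$ and $b_2$ are our own labels for the two regression weights, and the intercept is left out for simplicity.

$$
\text{SAT} = b_1 \cdot \text{Education} + b_2 \cdot \text{Income} + 1 \cdot \text{Other}
$$

Here $b_1$ and $b_2$ are the weights that `lavaan` estimates, while the weight of Other is the fixed 1 from the diagram.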

### Lavaan syntax

Based on the `lavaan` tutorial, write down (just as text) the model syntax that describes the model in the picture. How many regressions are there? How many covariances?

<details>
<summary>Click for explanation</summary>

The syntax for this model is:

```{r, eval = FALSE}
"SAT ~ Income + Education
Income ~~ Education"
```

Or, equivalently:

```{r, eval = FALSE}
"SAT ~ Income
SAT ~ Education
Income ~~ Education"
```

This syntax specifies two regression paths (SAT on Income and SAT on Education) and one covariance (between Income and Education). However, `lavaan` includes three more parameters by default:

1. The **residual** (unexplained) variance of SAT
2. The variance of Income
3. The variance of Education

So, if you don't want to rely on the default settings, you can specify all of these parameters explicitly:

```{r, eval = FALSE}
"SAT ~ Income + Education
Income ~~ Education
SAT ~~ SAT  # residual variance, because SAT is predicted
Income ~~ Income
Education ~~ Education"
```

</details>
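
As a quick check on these counts, you can compare the number of parameters with the number of unique variances and covariances among the $p = 3$ observed variables, which is $p(p + 1)/2$. The sketch below does this arithmetic for the fully specified model above; the object names are only illustrative.

```{r}
p <- 3                            # observed variables: SAT, Income, Education
n_moments <- p * (p + 1) / 2      # unique variances and covariances: 6
n_params <- 2 + 1 + 3             # 2 regression weights, 1 covariance, 3 variances
n_moments - n_params              # degrees of freedom: 0
```

With zero degrees of freedom the model is just-identified (saturated): it reproduces the observed variances and covariances exactly, so its fit cannot be tested. The number of model parameters that `lavaan` reports may be smaller, because by default (`fixed.x = TRUE`) it fixes the variances and covariance of exogenous predictors to their sample values instead of estimating them; the degrees of freedom are 0 either way.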

### Performing the analysis

In `lavaan`, models are fit using the `sem()` function. Run the command `?sem` to open the help file for this function. Try to figure out how to fit the model syntax you wrote for the previous question to the Hamilton data.

<details>
<summary>Click for explanation</summary>

```{r}
# Load the lavaan package
library(lavaan)
# Fit the model to df, and store the result in an object called 'fit'
fit <- sem(model = "SAT ~ Income + Education
                    Income ~~ Education",
           data = df)
```

This will result in a warning about the variances: some observed variances are much larger than others, because the variables are measured on very different scales. You can ignore this warning for now.

</details>
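
If you want to see where the warning comes from, `lavaan`'s `varTable()` function lists the observed variables together with their variances. The sketch below is self-contained and uses simulated data that only mimics the situation behind the warning (the numbers are made up, not the Hamilton data); on the real analysis you would run `varTable(fit)` instead.

```{r}
library(lavaan)

# Simulated stand-in for the Hamilton data: made-up numbers, NOT the real data.
# SAT gets a much larger spread than Education, which is what triggers the warning.
set.seed(123)
n <- 21
sim <- data.frame(Income    = rnorm(n, mean = 15, sd = 3),
                  Education = rnorm(n, mean = 12.5, sd = 0.2))
sim$SAT <- 500 + 5 * sim$Income + 30 * sim$Education + rnorm(n, sd = 30)

# One row per observed variable; the 'var' column holds the sample variance
vt <- varTable(sim)
vt[, c("name", "var")]

# Ratio of the largest to the smallest variance
max(vt$var) / min(vt$var)
```

If such a ratio bothers you, one option is to rescale a variable (for example, dividing SAT by 100). This changes the units of the corresponding regression weights but not the substance of the model.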

### Viewing the output

Most of the relevant output of a `lavaan` analysis can be extracted using the `summary()` function. Get a summary of the analysis now. Does either of the predictors have a significant effect on SAT? By specifying the option `rsquare = TRUE` in the `summary()` function, you additionally get the squared multiple correlations ($R^2$) for the dependent variables.

<details>
<summary>Click for explanation</summary>

```{r}
summary(fit, rsquare = TRUE)
```

</details>
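
Because this path model contains only observed variables, it is an ordinary multiple regression. As a sketch of that point, the self-contained example below fits the model with `lavaan` and with `lm()` to simulated data (made-up numbers standing in for the Hamilton data), and pulls the regression weights out of `parameterEstimates()`. The two sets of regression weights should agree; to try this on the real data, use `df` instead of `sim`.

```{r}
library(lavaan)

# Simulated stand-in for the Hamilton data: made-up numbers, NOT the real data
set.seed(123)
n <- 21
sim <- data.frame(Income    = rnorm(n, mean = 15, sd = 3),
                  Education = rnorm(n, mean = 12.5, sd = 0.2))
sim$SAT <- 500 + 5 * sim$Income + 30 * sim$Education + rnorm(n, sd = 30)

# The path model from this tutorial, estimated by maximum likelihood
fit_sim <- sem(model = "SAT ~ Income + Education
                        Income ~~ Education",
               data = sim)

# All parameters as a data frame; keep the rows with the regression operator "~"
pe <- parameterEstimates(fit_sim)
pe_reg <- pe[pe$op == "~", c("lhs", "op", "rhs", "est", "se", "pvalue")]
pe_reg

# The same regression estimated by ordinary least squares
fit_lm <- lm(SAT ~ Income + Education, data = sim)
coef(fit_lm)[c("Income", "Education")]
```

The standard errors and p-values can differ somewhat: maximum likelihood divides the residual variance by $N$ rather than by $N - 3$ as `lm()` does, and `lavaan` tests the weights with z-statistics instead of t-statistics.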

### Plotting the output

The package `semPlot` can automatically plot simple SEM models, like path models and CFA models. To visualize this SEM model, install the `semPlot` package and use the function `semPaths()`:

```{r, eval = FALSE}
install.packages("semPlot")
library(semPlot)
semPaths(fit)
```

```{r, echo = FALSE}
library(semPlot)
semPaths(fit)
```

The default plot can be improved, for example by plotting the parameter estimates onto the paths and rotating it to match the conceptual model at the start of this tutorial:

```{r}
# whatLabels = "est" prints the estimates on the paths;
# rotation = 2 places the exogenous variables on the left
semPaths(fit, whatLabels = "est", rotation = 2)
```