-
Notifications
You must be signed in to change notification settings - Fork 16
/
cosponsor.Rmd
139 lines (116 loc) · 5.32 KB
/
cosponsor.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
# Cosponsorship: computing auxiliary quantities from MCMC output {#cosponsorship}
```{r cosponsor_setup,cache=FALSE,message=FALSE}
library("tidyverse")
library("rstan")
library("rstanarm")
```
Typically, MCMC output consists of samples from the posterior density of model
parameters. But note that other quantities can be generated as well, say,
imputations for missing data points, predictions, residuals, or goodness-of-fit
summary statistics. In fact, any function of the parameters can be calculated
and output.
I demonstrate these ideas in the context of a generalized linear model for
binary data. The specific application is @Krehbiel1995a, a study of
legislative behavior. [^cosponsor-data] [^cosponsor-src] The response variable
is a binary indicator, which is equal to 1 if the member of the U.S. House of
Representatives chose to cosponsor bill [H.R.
3266](https://www.congress.gov/bill/103rd-congress/house-bill/3266), and 0
otherwise. Of the 434 legislators for which data is available, 228 legislators
cosponsored this bill. H.R.3266 was a wide-ranging spending bill designed to
circumvent the usual budget-making process, that was considered by the [103rd
House of
Representatives](https://en.wikipedia.org/wiki/103rd_United_States_Congress) in
1993--1994. Seven covariates are used in the analysis:
- liberalism as measured by the interest group Americans for Democratic Action (ADA)
- fiscal conservatism published by the National Taxpayers' Union (NTU)
- Democratic Party membership
- Congressional seniority, measured by years since first election
- the electoral margin of the member
- membership of the House Appropriations Committee
- membership of the House Budget Committee.
@Krehbiel1995a finds that, conditional on legislators' policy preferences, as measured with the ADA and NTU scores, Democrats were more likely to support H.R. 3266 than Republicans.
Seniority is also a key predictor; junior were members more likely to cosponsor this legislation, conditional on the other covariates.
## Model
$$
\begin{aligned}[t]
y_i &= \mathsf{Bernoulli}(\mu_i) \\
\mu_i &= \mathsf{Logit}^{-1}(x_i \beta)
\end{aligned}
$$
Several auxiliary quantities are estimated in the following Stan program:
the *percent correctly predicted (PCP)*,
$$
\mathrm{PCP} = \frac{1}{N} \sum_{i = 1}^N \left( y_i (\pi_i >= 0.5) + (1 - y_i) (\pi_i < 0.5) \right) ,
$$
and the expected percent correctly predicted (ePCP)* [@Herron1999a],
$$
\mathrm{PCP} = \frac{1}{N} \sum_{i = 1}^N \left( y_i \pi_i + (1 - y_i) (1 - \pi_i) \right) .
$$
Uncertainty in the parameters, and propagates to uncertainty in these quantities, and thus it is easy to calculate posterior distributions for these values.
```{r a2z}
data("a2z", package = "bayesjackman")
```
```{r cosponsor_formula}
cosponsor_formula <- cosp ~ ada + ntu + democrat + firstelected + margin + appromember + budgetmember
```
```{r cosponsor_data}
cosponsor_data <- within(list(), {
y <- a2z$cosp
N <- length(y)
X <- model.matrix(update(cosponsor_formula, . ~ 0 + .), data = a2z) %>% scale()
K <- ncol(X)
alpha_loc <- 0
alpha_scale <- 10
beta_loc <- rep(0, K)
beta_scale <- rep(2.5, K)
})
```
```{r cosposor_mod_logit2,results='hide',cache.extra=tools::md5sum("stan/logit2.stan")}
mod_logit2 <- stan_model("stan/logit2.stan")
```
```{r echo=FALSE,results='asis'}
mod_logit2
```
Sample from this model:
```{r results='hide'}
cosponsor_fit <- sampling(mod_logit2, data = cosponsor_data)
```
```{r}
summary(cosponsor_fit, par = c("beta", "alpha", "PCP", "ePCP"))$summary
```
This model can be fit using the **rstanarm** function `stan_glm`.
This is a binomial model, so it uses `family = binomial()`:
```{r cosponsor_fit2,results='hide'}
cosponsor_fit2 <- stan_glm(cosponsor_formula, data = a2z, family = binomial())
```
```{r}
cosponsor_fit2
```
- probability of co-sponsorship for a representative member (median x values)
```
## clarify calculations
## probability of co-sponsorship for "average" member (median x vals)
probit(pbar) <- beta[1] + beta[2]*0.55 + beta[3]*0.32
+ beta[4] + beta[5]*86 + beta[6]*0.26;
```
- party affiliation - holding other values at their representative values
```
## party affiliation, effect size
probit(p.dem) <- beta[1] + beta[2]*0.55 + beta[3]*0.32
+ beta[4] + beta[5]*86 + beta[6]*0.26;
probit(p.rep) <- beta[1] + beta[2]*0.55 + beta[3]*0.32
+ beta[5]*86 + beta[6]*0.26;
d.party <- p.dem - p.rep;
```
- party *attributable effect* from all sources
```
## "attributable effect", all sources, due to party affiliation
probit(p.dem.all) <- beta[1] + beta[2]*0.8 + beta[3]*0.2
+ beta[4] + beta[5]*86 + beta[6]*0.28;
probit(p.rep.all) <- beta[1] + beta[2]*0.1 + beta[3]*0.74
+ beta[5]*86 + beta[6]*0.24;
d.party.all <- p.dem.all - p.rep.all
```
TODO: Calculate latent residuals [@GelmanGoegebeurTuerlinckxEtAl2000a; @AlbertChib1995a]. Do they make sense and have any meaning outside of Gibbs sampling? They don't seem useful relative to LOO measures of outliers.
[^cosponsor-data]: Replication data from @Herron2010a.
[^cosponsor-src]: Example derived from Simon Jackman, "[Cosponsorship: computing auxiliary quantities from MCMC output](https://web-beta.archive.org/web/20040501194620/http://jackman.stanford.edu:80/mcmc/kk.odc)", 2004-05-01.