---
title: "Small network statistics for the network science of teams\\footnote{Contact: \\url{[email protected]}. We thank members of our MURI research team, USC's Center for Applied Network Analysis, Andrew Slaughter, and attendees of the NASN 2018 conference for their comments.}"
author:
- George G. Vega Yon, MS
- Kayla de la Haye, PhD
date: "NetSciX 2019, SCL \\linebreak[4]January 3, 2019"
output:
beamer_presentation:
slide_level: 2 # revealjs::revealjs_presentation
highlight: espresso
latex_engine: xelatex
includes:
in_header: notation-def.tex
aspectratio: 169
---
```{r setup, include=FALSE}
knitr::knit_hooks$set(smallsize = function(before, options, envir) {
if (before) {
"\\footnotesize\n\n"
} else {
"\n\\normalsize\n\n"
}
})
knitr::opts_chunk$set(echo = TRUE, smallsize=TRUE)
```
## Funding Acknowledgement
\begincols
\begincol{.2\linewidth}
\includegraphics[width=.8\linewidth]{fig/ARO_logo.png}
\endcol
\begincol{.79\linewidth}
This material is based upon work supported by, or in part by, the U.S. Army Research
Laboratory and the U.S. Army Research Office under grant number W911NF-15-1-0577.
\endcol
\endcols
\begincols
\begincol{.79\linewidth}
Computation for the work described in this paper was supported by the University
of Southern California’s Center for High-Performance Computing (hpc.usc.edu).
\endcol
\begincol{.2\linewidth}
\includegraphics[width = 1\linewidth]{fig/usc.pdf}
\endcol
\endcols
\begin{figure}
\centering
\includegraphics[width = .85\linewidth]{fig/muriteams.png}
\end{figure}
## Context: A tale about social abilities and team performance
Recruited \pause
- 42 mixed-gender groups of 3 to 5 participants (previously unknown to one another)
- Eligibility: (1) 18+ years, (2) Native English speaker \pause
2 hour group session \pause
- Group tasks (2 sets of tasks x 30 minutes each)
- Measure group social networks and individual social intelligence (SI) \pause
Study motivation \pause
- Overall, only a very limited set of SI domains has been tested as predictors of social networks
- Very little research exists on the emergence of networks in teams.
## Context (cont'd)
\begin{figure}
\centering
\includegraphics[width = .65\linewidth]{fig/plot-graph-4-1.pdf}
\end{figure}
\pause
How can we go beyond descriptive statistics?
## Small networks and Exponential Random Graph Models
Exponential Random Graph Models: *What are the structures that give rise to a given observed graph?*
```{r ergm-terms, echo=FALSE, fig.align='center', out.width=".7\\linewidth"}
knitr::include_graphics("fig/ergm-terms.png")
```
(In general, ties are not IID; moreover, the entire graph is a single observation.)
## Small networks and Exponential Random Graph Models (Cont'd)
* Estimating ERGMs is a complex problem. \pause The likelihood function has
$2^{n(n-1)}$ terms, which means that for a graph of size 6 we have about
1 billion terms to compute (see the quick calculation at the end of this slide).\pause
* Current approaches to estimate ERGMs rely on simulation methods, for example
MCMC-MLE\pause
When trying to estimate ERGMs in little networks: \pause
* MCMC fails to converge when trying to estimate a block-diagonal (structural
zeros) model,\pause
* The same happens when trying to estimate an ERGM for a single (little) graph, \pause
* Even if it converges, model degeneracy, *i.e.*, a bad fit, arises too often.
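
As a quick check of the $2^{n(n-1)}$ figure above, a back-of-the-envelope calculation in base R (nothing `lergm`-specific):

```{r num-terms, echo=TRUE}
# Number of possible directed graphs -- i.e., terms in the
# normalizing constant -- for networks of size 3 through 6
n <- 3:6
data.frame(size = n, terms = 2^(n * (n - 1)))
```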
## Rethinking the problem
\pause
* 1st step: Forget about MCMC-MLE estimation; take advantage of the small
sample space and use exact statistics for the MLE: \pause
$$
\Pr\left(\mathbf{Y}=\mathbf{y}|\theta, \mathcal{Y}\right) = \frac{\exp\left\{\theta^{\mbox{T}}\mathbf{g}(\mathbf{y})\right\}}{\kappa\left(\theta, \mathcal{Y}\right)},\quad\mathbf{y}\in\mathcal{Y}
$$
where $\mathbf{g}(\mathbf{y})$ is a vector of sufficient statistics, $\theta \in \Theta$ is a vector of model parameters, and $\kappa\left(\theta, \mathcal{Y}\right)$ is the normalizing constant (a summation with $2^{n(n-1)}$ terms).
* This solves the problem of being able to estimate a small ERGM (a brute-force sketch of the idea follows below). \pause
* For this we started working on the `lergm` R package
(available at https://github.com/muriteams/lergm).
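
To make the exact-likelihood idea concrete, here is a minimal brute-force sketch in base R; it is illustrative only, not the `lergm` implementation, and it assumes the edge count and the number of mutual dyads as the sufficient statistics:

```{r exact-loglik-sketch, echo=TRUE, eval=FALSE}
# Sufficient statistics: edge count and number of mutual dyads
suff_stats <- function(A)
  c(edges = sum(A), mutual = sum(A * t(A) * upper.tri(A)))

# Exact log-likelihood: enumerate all 2^(n(n-1)) directed graphs of
# size n to compute the normalizing constant kappa(theta, Y)
exact_loglik <- function(A_obs, theta) {
  n     <- nrow(A_obs)
  idx   <- which(diag(n) == 0)  # positions of the possible ties
  combs <- expand.grid(rep(list(0:1), length(idx)))
  kappa <- sum(apply(combs, 1, function(ties) {
    A <- matrix(0, n, n)
    A[idx] <- ties
    exp(sum(theta * suff_stats(A)))
  }))
  sum(theta * suff_stats(A_obs)) - log(kappa)
}

# Maximizing this directly yields the exact MLE for a size-3 graph
A <- sna::rgraph(3)
optim(c(0, 0), function(p) -exact_loglik(A, p))$par
```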
## Example 1
Let's start by trying to estimate an ERGM for a single graph of size 4
```{r lergm1, echo=TRUE}
library(lergm)
set.seed(12)
# A random directed graph of size 4
x <- sna::rgraph(4)
# Exact-statistics ERGM fit for this single small graph
lergm(x ~ edges + balance + mutual)
```
----------
* Cool, we are able to estimate ERGMs for little networks! (we actually call
them ~~lergms~~ ERGMitos[^barnett]),\pause
* By going directly to the MLE, we avoid the degeneracy problem.\pause
* Moreover, due to the size of the networks, we can actually go further and
estimate pooled ERGMs.
[^barnett]: Thanks to George Barnett for suggesting the name!
## Solution
* When estimating a block-diagonal ERGM we were essentially assuming
independence across networks.\pause
* This means that we can do the same with the exact-statistics approach
to calculate a joint likelihood (sketched at the end of this slide):\pause
$$
\Pr\left(\mathbf{Y}=\{\mathbf{y}_{\color{cyan} i}\}|\theta, \left\{\mathcal{Y}_{\color{cyan}i}\right\}\right) = {\color{cyan} \prod_i} \frac{\exp\left\{\theta^{\mbox{T}}\mathbf{g}(\mathbf{y}_{\color{cyan} i})\right\}}{\kappa_{\color{cyan} i}\left(\theta, \mathcal{Y}_{\color{cyan}i}\right)}
$$
\pause
* By estimating a pooled version of the ERGM we can increase the power of our
MLEs.\pause
* We implemented this in the `lergm` package.
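
Under this independence assumption the pooled log-likelihood is just the sum of the per-network exact log-likelihoods. A minimal sketch, reusing the hypothetical `exact_loglik()` from the earlier slide (again, not the `lergm` internals):

```{r pooled-loglik-sketch, echo=TRUE, eval=FALSE}
# Pooled exact log-likelihood: each network i contributes its own
# normalizing constant kappa_i, and the log-likelihoods add up
pooled_loglik <- function(graphs, theta)
  sum(vapply(graphs, exact_loglik, numeric(1), theta = theta))

# Joint MLE across graphs of different sizes
graphs <- list(sna::rgraph(4), sna::rgraph(5), sna::rgraph(5))
optim(c(0, 0), function(p) -pooled_loglik(graphs, p))$par
```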
## Example 2
Suppose that we have 3 little graphs of sizes 4, 5, and 5:
```{r lergm2, echo=TRUE}
library(lergm)
set.seed(12)
x1 <- sna::rgraph(4)
x2 <- sna::rgraph(5)
x3 <- sna::rgraph(5)
lergm(list(x1, x2, x3) ~ edges + balance + mutual)
```
<!-- ## Convergence diagnostics -->
<!-- ```{r diagnostics2} -->
<!-- plot(ans) -->
<!-- ``` -->
## Simulation study
### Scenario A
1. Draw parameters for _edges_ and _mutual_ from a uniform(-3, 3).
2. Using those parameters, sample $n\sim\mbox{Poisson}(30)$ networks of size 4.
3. Estimate the pooled ERGM using both the MLE and the bootstrap version (a sketch of one replicate appears at the end of this slide).
\pause
### Scenario B
1. Idem.
2. Using those parameters, sample $n_1\sim\mbox{Poisson}(15), n_2\sim\mbox{Poisson}(15)$
networks of sizes 3 and 4, respectively.
3. Idem.
\pause
(If anyone asks, I just ran about 3 million ERGMs... :))
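
For reference, a minimal sketch of one Scenario A replicate; this is not the exact simulation code we used, and it assumes the `ergm` package's `simulate()` method to draw the networks, which are then passed to `lergm()` as adjacency matrices, as in Example 2:

```{r sim-sketch, echo=TRUE, eval=FALSE}
library(ergm)

theta  <- runif(2, min = -3, max = 3)  # true edges and mutual parameters
nnets  <- rpois(1, lambda = 30)        # number of networks in this replicate
empty4 <- network::network.initialize(4, directed = TRUE)

# Simulate nnets size-4 networks from the true model and coerce them
# to adjacency matrices
nets <- simulate(empty4 ~ edges + mutual, coef = theta, nsim = nnets)
nets <- lapply(nets, as.matrix)

# Fit the pooled ERGM with exact statistics
lergm(nets ~ edges + mutual)
```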
## Simulation study: Scenario A
### Empirical Bias
\begin{figure}
\centering
\includegraphics[width=.65\linewidth]{fig/bias-01-fixed-sizes-4.pdf}
\end{figure}
## Simulation study: Scenario A
### Power
\begin{figure}
\centering
\includegraphics[width=.65\linewidth]{fig/power-01-fixed-sizes-4.pdf}
\end{figure}
## Simulation study: Scenario B
### Empirical Bias
\begin{figure}
\centering
\includegraphics[width=.65\linewidth]{fig/bias-02-various-sizes-3-4.pdf}
\end{figure}
## Simulation study: Scenario B
### Power
\begin{figure}
\centering
\includegraphics[width=.65\linewidth]{fig/power-02-various-sizes-3-4.pdf}
\end{figure}
## Other approaches
\begin{figure}
\centering
\includegraphics[height=.7\textheight]{fig/similr.png}
\end{figure}
## Discussion
* The first set of results from the simulation study is encouraging\pause
* We need to conduct more simulations using _nodal_ attributes and networks of size 5
(right now we are having problems building the DGP).\pause
* Small structures imply a smaller pool of parameters (which is OK), but
the models become more useful once nodal attributes are included.\pause
* When estimating the pooled version, we are essentially hand-waving the fact
that parameter estimates implicitly encode the size of the graph, i.e.
> Does an estimate of `edges = 0.1` have the same meaning for a network of
size 3 as for one of size 5? (though perhaps this is not such a big deal)
\pause
* Finally, this work can be extended to other types of small networks,
including families, ego-nets, etc., and to other methods, such as TERGMs.
##
### Thank you!
\maketitle
## What have we got so far?
\footnotesize
```r
lergm(networks ~ mutual + edges + triangle + nodematch("male") +
diff("Empathy") + nodematch("nonwhite"))
```
```{r muri-pre, cache=TRUE, echo=FALSE}
ans <- confint(readRDS("muri_estimates.rds"))
ans_less3 <- confint(readRDS("muri_estimates_less3.rds"))
library(magrittr)
# kableExtra::usepackage_latex("booktabs")
knitr::kable(
x = cbind(ans, ans_less3),
digits = 2,
format = "latex",
booktabs = TRUE,
caption = "Preliminary results with our small teams data. The table shows 95\\% confidence intervals for the parameter estimates using the pooled ERGM model."
) %>%
kableExtra::add_header_above(c(" " = 1, "All (42)" = 2, "All but size 3 (35)" = 2)) %>%
kableExtra::row_spec(c(2, 3, 5), background = "lightgray")
```