Commit

Merge pull request #112 from dlab-berkeley/kz_module2
Kz module2
kaseyzapatka authored Mar 7, 2024
2 parents 161cb42 + e4ca46d commit 9155663
Showing 4 changed files with 57 additions and 32 deletions.
@@ -409,6 +409,11 @@ How does this table relate to the different causal types? Let's consider this sa
# ----------
df_freq_obs_type <-
df %>% # pass data
mutate(type = case_when(type == 1 ~ "doomed",
type == 2 ~ "causal",
type == 3 ~ "preventive",
type == 4 ~ "immune",
)) %>%
group_by(A, Y_obs, type) %>% # group by the three variables we want
count() # count
@@ -688,10 +693,14 @@ ATE_bloc
```

**<span style="color:blue;">ANSWER 12:</span>** Yet again, the ATE obtained via block assignment (`r ATE_bloc`) is much closer to the true `ATE`(`r ATE`) than the observed estimate.
**<span style="color:blue;">ANSWER 12:</span>** Yet again, the ATE obtained via block assignment (`r ATE_bloc`) is much closer to the true `ATE` (`r ATE`) than the observed estimate (`r ATE_obs`).

Note again that this example simply blocks individuals by their location in our dataframe, so individuals in a given block are not statistically more similar to each other than they are to individuals in other blocks. In practice, individuals within a block are usually more similar to each other than to individuals in other blocks (for example, each block might represent a geographical region).
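
As a rough illustration of the note above, blocking on a meaningful covariate could look like the sketch below. It is not part of the lab's own code: the data frame `df_demo` and the columns `region` and `A_block` are hypothetical names made up for this example. Within each block, exactly half of the individuals are assigned to treatment.

```r
library(dplyr)

set.seed(1)  # only so this illustration is reproducible

# Hypothetical data: 12 individuals spread evenly over 3 regions
df_demo <- data.frame(
  id     = 1:12,
  region = rep(c("north", "south", "west"), each = 4)
)

df_blocked <- df_demo %>%
  group_by(region) %>%                                          # one block per region
  mutate(A_block = sample(rep(c(0, 1), length.out = n()))) %>%  # shuffle a balanced 0/1 vector within each block
  ungroup()

# Check the balance of treatment within each block
df_blocked %>% count(region, A_block)
```

Because the treatment vector is shuffled separately inside each region, every region contributes the same number of treated and untreated individuals, which is the property that distinguishes block randomization from complete randomization.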


So to summarize, our true ATE is **`r ATE`**, which we arrive at under a hypothetical situation where we know outcomes under both treatment and non-treatment. However, we can closely approximate that effect with complete randomization (**`r ATE_comp`**), with cluster randomization (**`r ATE_clus`**), and with block randomization (**`r ATE_bloc`**). Importantly, all three of these randomization methods come closer than simply taking the difference in average outcomes by treatment status in the absence of randomization: **`r ATE_obs`**.
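
To make the comparison concrete, the gap between each estimate and the true ATE can be computed directly. The sketch below hard-codes the values that appear in the knitted HTML for this commit; the object names `ate_true`, `estimates`, and `abs_gap` are made up for this example and are not defined in the lab.

```r
# Values copied from the rendered output of this commit
ate_true  <- -0.430923
estimates <- c(
  complete = -0.4318155,  # complete randomization
  cluster  = -0.4312927,  # cluster randomization
  block    = -0.431206,   # block randomization
  observed = -0.4586686   # no randomization
)

# Absolute distance of each estimate from the true ATE, smallest first
abs_gap <- abs(estimates - ate_true)
sort(abs_gap)
```

With these numbers, the three randomized designs each land within about 0.001 of the true ATE, while the observed estimate is off by roughly 0.028.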


# Statistical Tests of Difference

A common statistical question (in fact, often the actual research question of interest) is whether some variable in our dataset (usually the dependent or outcome variable) varies by one or more other (usually independent) variables.

@@ -344,6 +344,11 @@ How does this table relate to the different causal types? Let's consider this sa
# ----------
df_freq_obs_type <-
df %>% # pass data
mutate(type = case_when(type == 1 ~ "doomed",
type == 2 ~ "causal",
type == 3 ~ "preventive",
type == 4 ~ "immune",
)) %>%
group_by(A, Y_obs, type) %>% # group by the three variables we want
count() # count

@@ -23,7 +23,7 @@
<meta name="author" content />


<meta name="date" content="2024-03-06" />
<meta name="date" content="2024-03-07" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
@@ -3026,7 +3026,7 @@ <h1>
<div id="header">
<h1 class="title">6-2 Randomized Experiments - Solutions</h1>
<p class="author"><em></em></p>
<p class="date"><em>March 06, 2024</em></p>
<p class="date"><em>March 07, 2024</em></p>
</div>
<p>In this lab, we are going to discuss Randomized Experiments. Causal inference methods can be used for observational data, but it is easier to first consider them in the context of randomized experiments. To begin, we are going to create simulated data of a kind we’d be unlikely to encounter in the real world, in which we both give the same individual the treatment and do NOT give them the treatment. We’ll then calculate the “true” <strong>A</strong>verage <strong>T</strong>reatment <strong>E</strong>ffect (<strong>ATE</strong>) and show how different techniques of applying randomization give us estimates very close to it.</p>
<p>We will be leaning heavily on the <code>dplyr</code> library, so I’d encourage you to refer to the <a href="https://rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf">dplyr cheat sheet</a> to refresh your memory and push your knowledge of how to use the library.</p>
@@ -3495,22 +3495,27 @@ <h1><span class="header-section-number">3</span> Experimental Designs<a href="#e
<span id="cb28-2"><a href="#cb28-2" tabindex="-1"></a><span class="co"># ----------</span></span>
<span id="cb28-3"><a href="#cb28-3" tabindex="-1"></a>df_freq_obs_type <span class="ot">&lt;-</span> </span>
<span id="cb28-4"><a href="#cb28-4" tabindex="-1"></a> df <span class="sc">%&gt;%</span> <span class="co"># pass data</span></span>
<span id="cb28-5"><a href="#cb28-5" tabindex="-1"></a> <span class="fu">group_by</span>(A, Y_obs, type) <span class="sc">%&gt;%</span> <span class="co"># group by the three variable we want</span></span>
<span id="cb28-6"><a href="#cb28-6" tabindex="-1"></a> <span class="fu">count</span>() <span class="co"># count</span></span>
<span id="cb28-7"><a href="#cb28-7" tabindex="-1"></a></span>
<span id="cb28-8"><a href="#cb28-8" tabindex="-1"></a>df_freq_obs_type</span></code></pre></div>
<span id="cb28-5"><a href="#cb28-5" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">type =</span> <span class="fu">case_when</span>(type <span class="sc">==</span> <span class="dv">1</span> <span class="sc">~</span> <span class="st">&quot;doomed&quot;</span>, </span>
<span id="cb28-6"><a href="#cb28-6" tabindex="-1"></a> type <span class="sc">==</span> <span class="dv">2</span> <span class="sc">~</span> <span class="st">&quot;causal&quot;</span>, </span>
<span id="cb28-7"><a href="#cb28-7" tabindex="-1"></a> type <span class="sc">==</span> <span class="dv">3</span> <span class="sc">~</span> <span class="st">&quot;preventive&quot;</span>, </span>
<span id="cb28-8"><a href="#cb28-8" tabindex="-1"></a> type <span class="sc">==</span> <span class="dv">4</span> <span class="sc">~</span> <span class="st">&quot;immune&quot;</span>, </span>
<span id="cb28-9"><a href="#cb28-9" tabindex="-1"></a> )) <span class="sc">%&gt;%</span> </span>
<span id="cb28-10"><a href="#cb28-10" tabindex="-1"></a> <span class="fu">group_by</span>(A, Y_obs, type) <span class="sc">%&gt;%</span> <span class="co"># group by the three variables we want</span></span>
<span id="cb28-11"><a href="#cb28-11" tabindex="-1"></a> <span class="fu">count</span>() <span class="co"># count</span></span>
<span id="cb28-12"><a href="#cb28-12" tabindex="-1"></a></span>
<span id="cb28-13"><a href="#cb28-13" tabindex="-1"></a>df_freq_obs_type</span></code></pre></div>
<pre><code>## # A tibble: 8 × 4
## # Groups: A, Y_obs, type [8]
## A Y_obs type n
## &lt;dbl&gt; &lt;dbl&gt; &lt;fct&gt; &lt;int&gt;
## 1 0 0 2 13861
## 2 0 0 4 15210
## 3 0 1 1 162327
## 4 0 1 3 147119
## 5 1 0 3 323909
## 6 1 0 4 36299
## 7 1 1 1 275031
## 8 1 1 2 26244</code></pre>
## A Y_obs type n
## &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;int&gt;
## 1 0 0 causal 13861
## 2 0 0 immune 15210
## 3 0 1 doomed 162327
## 4 0 1 preventive 147119
## 5 1 0 immune 36299
## 6 1 0 preventive 323909
## 7 1 1 causal 26244
## 8 1 1 doomed 275031</code></pre>
<p>Thus, we can see that each of the cells in the original table is actually composed of a mixture of two different causal types each. Conversely, each causal type makes up part of the cell count for two different arrangements of <code>A</code> and <code>Y_obs</code>. In other words:</p>
<ul>
<li>Those that DID NOT take AspiTyleCedrin and DID NOT experience a migraine could be either <em>“causal”</em> or <em>“immune”</em>.</li>
@@ -3721,8 +3726,9 @@ <h2><span class="header-section-number">3.4</span> Block Randomized Designs<a hr
<span id="cb55-9"><a href="#cb55-9" tabindex="-1"></a>ATE_bloc <span class="ot">&lt;-</span> est_bloc<span class="sc">$</span>mean[<span class="dv">2</span>] <span class="sc">-</span> est_bloc<span class="sc">$</span>mean[<span class="dv">1</span>]</span>
<span id="cb55-10"><a href="#cb55-10" tabindex="-1"></a>ATE_bloc</span></code></pre></div>
<pre><code>## [1] -0.431206</code></pre>
<p><strong><span style="color:blue;">ANSWER 12:</span></strong> Yet again, the ATE obtained via block assignment (-0.431206) is much closer to the true <code>ATE</code>(-0.430923) than the observed estimate.</p>
<p><strong><span style="color:blue;">ANSWER 12:</span></strong> Yet again, the ATE obtained via block assignment (-0.431206) is much closer to the true <code>ATE</code> (-0.430923) than the observed estimate (-0.4586686).</p>
<p>Note again that this example simply blocks individuals by their location in our dataframe, so individuals in a given block are not statistically more similar to each other than they are to individuals in other blocks. In practice, individuals within a block are usually more similar to each other than to individuals in other blocks (for example, each block might represent a geographical region).</p>
<p>So to summarize, our true ATE is <strong>-0.430923</strong>, which we arrive at under a hypothetical situation where we know outcomes under both treatment and non-treatment. However, we can closely approximate that effect with complete randomization (<strong>-0.4318155</strong>), with cluster randomization (<strong>-0.4312927</strong>), and with block randomization (<strong>-0.431206</strong>). Importantly, all three of these randomization methods come closer than simply taking the difference in average outcomes by treatment status in the absence of randomization: <strong>-0.4586686</strong>.</p>
</div>
</div>
<div id="statistical-tests-of-difference" class="section level1 hasAnchor" number="4">

@@ -3358,22 +3358,27 @@ <h1><span class="header-section-number">3</span> Experimental Designs<a href="#e
<span id="cb14-2"><a href="#cb14-2" tabindex="-1"></a><span class="co"># ----------</span></span>
<span id="cb14-3"><a href="#cb14-3" tabindex="-1"></a>df_freq_obs_type <span class="ot">&lt;-</span> </span>
<span id="cb14-4"><a href="#cb14-4" tabindex="-1"></a> df <span class="sc">%&gt;%</span> <span class="co"># pass data</span></span>
<span id="cb14-5"><a href="#cb14-5" tabindex="-1"></a> <span class="fu">group_by</span>(A, Y_obs, type) <span class="sc">%&gt;%</span> <span class="co"># group by the three variable we want</span></span>
<span id="cb14-6"><a href="#cb14-6" tabindex="-1"></a> <span class="fu">count</span>() <span class="co"># count</span></span>
<span id="cb14-7"><a href="#cb14-7" tabindex="-1"></a></span>
<span id="cb14-8"><a href="#cb14-8" tabindex="-1"></a>df_freq_obs_type</span></code></pre></div>
<span id="cb14-5"><a href="#cb14-5" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">type =</span> <span class="fu">case_when</span>(type <span class="sc">==</span> <span class="dv">1</span> <span class="sc">~</span> <span class="st">&quot;doomed&quot;</span>, </span>
<span id="cb14-6"><a href="#cb14-6" tabindex="-1"></a> type <span class="sc">==</span> <span class="dv">2</span> <span class="sc">~</span> <span class="st">&quot;causal&quot;</span>, </span>
<span id="cb14-7"><a href="#cb14-7" tabindex="-1"></a> type <span class="sc">==</span> <span class="dv">3</span> <span class="sc">~</span> <span class="st">&quot;preventive&quot;</span>, </span>
<span id="cb14-8"><a href="#cb14-8" tabindex="-1"></a> type <span class="sc">==</span> <span class="dv">4</span> <span class="sc">~</span> <span class="st">&quot;immune&quot;</span>, </span>
<span id="cb14-9"><a href="#cb14-9" tabindex="-1"></a> )) <span class="sc">%&gt;%</span> </span>
<span id="cb14-10"><a href="#cb14-10" tabindex="-1"></a> <span class="fu">group_by</span>(A, Y_obs, type) <span class="sc">%&gt;%</span> <span class="co"># group by the three variables we want</span></span>
<span id="cb14-11"><a href="#cb14-11" tabindex="-1"></a> <span class="fu">count</span>() <span class="co"># count</span></span>
<span id="cb14-12"><a href="#cb14-12" tabindex="-1"></a></span>
<span id="cb14-13"><a href="#cb14-13" tabindex="-1"></a>df_freq_obs_type</span></code></pre></div>
<pre><code>## # A tibble: 8 × 4
## # Groups: A, Y_obs, type [8]
## A Y_obs type n
## &lt;dbl&gt; &lt;dbl&gt; &lt;fct&gt; &lt;int&gt;
## 1 0 0 2 13861
## 2 0 0 4 15210
## 3 0 1 1 162327
## 4 0 1 3 147119
## 5 1 0 3 323909
## 6 1 0 4 36299
## 7 1 1 1 275031
## 8 1 1 2 26244</code></pre>
## A Y_obs type n
## &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;int&gt;
## 1 0 0 causal 13861
## 2 0 0 immune 15210
## 3 0 1 doomed 162327
## 4 0 1 preventive 147119
## 5 1 0 immune 36299
## 6 1 0 preventive 323909
## 7 1 1 causal 26244
## 8 1 1 doomed 275031</code></pre>
<p>Thus, we can see that each of the cells in the original table is actually composed of a mixture of two different causal types each. Conversely, each causal type makes up part of the cell count for two different arrangements of <code>A</code> and <code>Y_obs</code>. In other words:</p>
<ul>
<li>Those that DID NOT take AspiTyleCedrin and DID NOT experience a migraine could be either <em>“causal”</em> or <em>“immune”</em>.</li>