2-Educators-Analysis.Rmd

---
output: html_document
---


## Source Code for _In-person schooling and associated COVID-19 risk in the United States over Spring Semester 2021_ 
#### Kirsten E. Wiens, Claire P. Smith, Elena Badillo-Goicoechea, Kyra H. Grantz, M. Kate Grabowski, Andrew S. Azman, Elizabeth A. Stuart, Justin Lessler
 _________
 
Code is provided to reproduce the primary in-person schooling (`1-Main-Schooling-Analysis.Rmd`) and occupation/work outside home (`2-Educators-Analysis.Rmd`) analyses using a synthetic dataset. Hence, results produced when knitting these documents will **not** match those presented in the accompanying paper. 

Data to reproduce paper figures/tables are freely available from the CMU Delphi Research Group to researchers at universities and non-profits as detailed at Getting Data Access - Delphi Epidata API (cmu-delphi.github.io).

Please reach out to justin[at]jhu.edu if you have any questions.

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, message = FALSE, warning = FALSE, error=TRUE)
```

```{r loading-pkgs}
  ##Source code and library stuff
  library(tidyverse)
  library(anytime)
  library(survey)
  library(srvyr)
  library(maps)
  library(R.utils)
  library(data.table)
  library(extrafont)
  library(viridis)
  library(gtsummary)
  library(janitor)
  library(pheatmap)
  library(RColorBrewer)
  library(RcppRoll)
  library(gganimate)
  library(transformr)
  library(ggcorrplot)
  library(cdlTools)

  source("HelperFuncs.R")
  source("MainTextFigures.R")
```

```{r warning-message}

  knitr::knit_exit("**WARNING: This document will prepare tables and figures exploring the effects of in-person schooling using a synthetic dataset. Hence, results produced when knitting this document will not match those presented in the accompanying paper. Data to reproduce paper figures/tables are freely available from the CMU Delphi Research Group to researchers at universities and non-profits as detailed at Getting Data Access - Delphi Epidata API (cmu-delphi.github.io). To proceed with running this document with the synthetic dataset, please comment out or delete this line (ln 47 in the RMD).**")

```

```{r load data}
# load full dataset
df_ws <- readRDS("data/DATA_FOR_TESTING_ONLY_DOES_NOT_REPRODUCE_RESULTS_df_full_spring2021.rds")

# recode, without dropping HH with only 1 individual
df_ws <- standard_recode(df_ws, min_HH=1) %>%
  filter(!is.na(State))

# add county data
cnty_dat<- read_csv("data/county_factors.csv")%>%
  mutate(fips=str_pad(FIPS,width=5,pad="0"))%>%
  select(-FIPS, -X1)
  
df_ws <- left_join(df_ws, cnty_dat)
  
rm(cnty_dat)
```

```{r edtype-recode}
  ##Make and occupation K12Educator
  df_ws <- df_ws%>%
    mutate(OccEdRecode = OccupationRed,
           OccEdRecode = ifelse(OccEdRecode!="Education", 
                                as.character(OccEdRecode),
                                ifelse(educationjob_type%in%
                                         c("elemid_teacher",
                                           "preKteacher",
                                           "second_teacher"),
                                       "K12Teacher","OtherEductation")))
  
  
  ## Just to simplify the analysis, lets group small categories into 'Other'
  ## NOTE: in paper, threshold used to decide grouping was n<100,000
  other_cats <- c("Installation, maintenance, and repair", "Cleaning and maintenance",
                  "Arts/entertainment", "Comm/social service", "Production",
                  "Personal care and service", "Construction and extraction",
                  "Transportation and material moving/delivery", "Protective service")
  df_ws <- df_ws%>%
    group_by(OccEdRecode)%>%
    mutate(OccEdRecode = ifelse(OccEdRecode %in% other_cats, "Other", OccEdRecode))%>%
    ungroup()
  
  ##Make factor with Office and admin support as the
  ##intercept
  df_ws <- df_ws%>%
      mutate(OccEdRecode =factor(OccEdRecode),
            OccEdRecode = relevel(OccEdRecode, ref="Office and admin support"))

  # setting up the outcomes to explore and object for output
  outcomes <- c("tst_pos","cli_pos", "cli2_pos")
  outcome_nms <- c("Test+","Covid like illness","Loss taste/smell")
  occ_reg_tbl <- NULL
  
  ## Just keep columns we absolutely need to save on memory
  df_ws <- df_ws%>%
     select(c('fips', 'State', 'weight', 
              'Date', 'wk', 'period',  outcomes, 
              'OccEdRecode', 'work_outside_paid',
              'IsMale', 
              'Age', 
              'OccupationRed',
              'Educational_Level',
              'HH_sz', 'num_kid_recode', 
              'Population', 
              'White', 
              'GINI', 
              'Poverty', 
              'Description',
              'mask_level', 
              'trav_out_state',
              'bar_rest_cafe_activ',
              'large_event_activ',
              'pub_transit_activ',
              'any_vaccine'))

  ## add csse attack rates
  csse <- read_csv("data/JHUCSSE.csv")
  df_ws <- merge_csse(df_ws, csse)
    
  rm(csse)
```

```{r edu-text}
# aggregate occupation and work outside the home data to national data
occ_all <- aggregate_national(df_ws, variables = 'work_outside_paid', 
                             aggregate_by = c('OccEdRecode'))

# aggregate occupation and work outside the home data to national-monthly data
occ_mo <- aggregate_national(df_ws, variables = 'work_outside_paid', 
                             aggregate_by = c('month', 'OccEdRecode'))

# aggregate occupation and work outside the home data to national-weekly data
occ_wk <- aggregate_national(df_ws, variables = 'work_outside_paid', 
                             aggregate_by = c('wk', 'OccEdRecode'))
```

## Figure S7

Changes in paid work outside the home by occupation. Percent of all survey respondents in each occupation category that reported working outside of the home for pay in January and June, weighted to account for survey design. Respondents missing occupation designations are excluded. Occupations with fewer than 100,000 respondents are grouped in “Other”.
  
```{r figS7, fig.width=6, fig.height=3.5}

# make plot
occ_plot <- occ_mo %>%
  filter(!is.na(OccEdRecode),
         OccEdRecode != 'Not Employed') %>% 
  mutate(
    OccEdRecode=factor(OccEdRecode,
                            levels = c('Office and admin support', 'K12Teacher',
                                       'Food service', 'Healthcare', 'Sales',
                                       'OtherEductation', 'Other')),
         OccEdRecode=recode(OccEdRecode,
                            'Office and admin support'='Office/admin support',
                            'K12Teacher'='K-12 teacher',
                            'OtherEductation'='Other education')) %>%
  filter(month %in% c(1, 6)) %>%
  ggplot(aes(x = OccEdRecode, y = mean*100, fill = as.factor(month))) + 
        geom_bar(stat = 'identity', position = position_dodge(), width=0.8) +
        labs(y = 'Work outside  home (%)', x = NULL) +
        theme_bw()+
        scale_fill_manual(values = c('#A6611A', '#80CDC1'),
                          name = NULL, labels = c('January', 'June'))+
        theme(axis.text.x = element_text( angle = 45, hjust=1),
              legend.background = element_blank())

occ_plot
```
  
```{r outside-work-setup-models}

  # exclude food service because less than 5% of respondents report working exclusively at home
  df_ws <- df_ws%>%
    filter(OccEdRecode!='Food service')

  # set time periods
  time_periods <- list(c('jan_feb', 'mar_apr', 'may_jun'),
                       'jan_feb', 
                       'mar_apr', 
                       'may_jun')

  # running models for each outcome with occupation exposure
  # adjusted, with interactions b/w occupation and work outside home 
  # or vaccination status and work outside home
  for (t in time_periods) {
    message(t)

    for (i in 1:length(outcomes)) {
      message(i)
      f_lst <- NULL
      
      ##overall pop, adjusted, interaction with occupation
      f_lst <- bind_rows(f_lst,c(
        f=sprintf("%s~OccEdRecode+
                  OccEdRecode:work_outside_paid+
                  rel_avg_biwk_AR+
                    IsMale+Age+
                    Educational_Level+HH_sz+
                    Population+
                    White+
                    GINI+
                    Poverty+
                    Description+
                    mask_level+trav_out_state+
                    bar_rest_cafe_activ +
                    large_event_activ+
                    pub_transit_activ+
                    any_vaccine",outcomes[i]),
        adjusted=TRUE,
        interaction=TRUE,
        subset=FALSE))
      
      ##subset to just teachers, adjusted, interaction with vaccination
      f_lst <- bind_rows(f_lst,c(
        f=sprintf("%s~any_vaccine+
                    any_vaccine:work_outside_paid+
                    rel_avg_biwk_AR+
                    IsMale+Age+
                    Educational_Level+HH_sz+
                    Population+
                    White+
                    GINI+
                    Poverty+
                    Description+
                    mask_level+trav_out_state+
                    bar_rest_cafe_activ +
                    large_event_activ+
                    pub_transit_activ",outcomes[i]),
        adjusted=TRUE,
        interaction=TRUE,
        subset=TRUE))
      
      # loop over models to run
      for(j in 1:nrow(f_lst)) {

        reg <- survey_reg(df_ws%>%
                            filter(if (f_lst$subset[j]==TRUE) {
                              # subset to just respondents who are K-12 teachers
                              # as well as period for analysis
                              OccEdRecode == 'K12Teacher' & period %in% t
                              } else {
                                # otherwise, include all data within the period for analysis
                                period %in% t
                              }),
                          formula(f_lst$f[j]))
        # run regressions set up above
        occ_reg_tbl <-  gtsummary::tbl_regression(reg)$table_body%>%
          # save regression results with informative labels
          mutate(adjusted=f_lst$adjusted[j],
                 interaction=f_lst$interaction[j],
                 subset=f_lst$subset[j],
                 outcome=outcome_nms[i],
                 period=ifelse(length(t)>1, 'Jan-Jun',
                           ifelse(t=='jan_feb', 'Jan-Feb',
                                  ifelse(t=='mar_apr', 'Mar-Apr', 'May-Jun'))))%>%
          bind_rows(occ_reg_tbl)
      }
    }
  }

```

## Figure 9

Risk by occupation and paid work outside the home. Odds ratio of COVID-19-related outcomes, contrasting office workers not reporting extra-household work for pay to those in other employment categories not reporting work for pay outside the home (top row), and to those reporting work for pay outside the home (bottom row). The middle row shows the odds ratio (i.e., increased risk) within each category associated with working outside the home compared to no work outside the home. Food service workers are excluded from this analysis because less than 5% reported working exclusively from home (Fig. S7). Results are shown across the study period from Jan. 12 to Jun. 12.

```{r figS8-fig9-outside-work}

# periods to plot
plot_periods <- list(
  c('Jan-Feb', 'Mar-Apr', 'May-Jun'),
  c('Jan-Jun')
)

for (p in plot_periods) {

  occ_reg_tbl_9 <- occ_reg_tbl%>%
        filter(subset == FALSE) %>%
        mutate(main_term = ifelse(var_type=="interaction",
                                str_extract(label,"(.+)(?= \\*)" ),
                                label))


  # reformatting object for plotting
  tmp <- occ_reg_tbl_9 %>%
    mutate(adjusted=adjusted=="TRUE",interaction=interaction=="TRUE")%>%
    filter(adjusted,interaction,
           str_detect(variable,"OccEdRecode"),
           label!="OccEdRecode",
           label!="Office and admin support",
           row_type!="label")%>%
    select(period, label, main_term, estimate, 
           std.error,
           outcome, adjusted, interaction)%>%
    group_by(period, main_term, outcome)%>%
    summarize(estimate=sum(estimate),
              std.error=sqrt(sum(std.error^2)))%>%
    ungroup()%>%mutate(inter_term="outside work*baseline")
    
  tmp <-  occ_reg_tbl_9 %>%
    mutate(adjusted=adjusted=="TRUE",interaction=interaction=="TRUE")%>%
    filter(adjusted,interaction,
           str_detect(variable,"OccEdRecode"),
           label!="OccEdRecode",
           label!="Office and admin support",
           row_type!="label")%>%
    filter(var_type=="interaction")%>%
    select(period, main_term, estimate, 
           std.error,outcome)%>%
    mutate(inter_term="outside work")%>%
    bind_rows(tmp)
  
  
  tmp <-  occ_reg_tbl_9 %>%
    mutate(adjusted=adjusted=="TRUE",interaction=interaction=="TRUE")%>%
    filter(adjusted,interaction,
           str_detect(variable,"OccEdRecode"),
           label!="OccEdRecode",
           label!="Office and admin support",
           row_type!="label")%>%
    filter(var_type=="categorical")%>%
    select(period, main_term, estimate, 
           std.error,outcome)%>%
    mutate(inter_term="baseline")%>%
    bind_rows(tmp)
  
  overall_interact_plt <-tmp %>%
    filter(period %in% p) %>%
    mutate(main_term=factor(main_term,
                            levels = c('Office and admin support', 'K12Teacher', 
                                       'Food service', 'Healthcare', 'Sales',
                                       'OtherEductation', 'Other')),
           main_term=recode(main_term,
                            'Office and admin support'='Office/admin support',
                            'K12Teacher'='K-12 teacher',
                            'OtherEductation'='Other education'),
           outcome=recode(outcome,
                          'Test+'='Overall Test+'))%>%
    mutate(conf.low=exp(estimate-1.96*std.error),
           conf.high=exp(estimate+1.96*std.error),
           estimate=exp(estimate))%>%
    ggplot(aes(x=main_term, y=estimate, ymin=conf.low, ymax=conf.high,
               color=outcome))+
    geom_pointrange(position=position_dodge(width=.5))+
    scale_y_log10()+
    scale_color_brewer(type="qual", palette = "Dark2", name=NULL)+
    theme_bw()+
    theme(legend.position="bottom",
          axis.text.x = element_text( angle = 45, hjust=1))+
    geom_hline(yintercept=1) +
    labs(x=NULL, y='Odds ratio')
  
  if (length(p)>1) {
    overall_interact_plt1 <- overall_interact_plt +
      facet_grid(rows=vars(inter_term),
                 cols=vars(period),
                 scales="free_y")
  
  } else {
    overall_interact_plt2 <- overall_interact_plt +
      facet_grid(rows=vars(inter_term),
                 scales="free_y")
  }
  
}
```

```{r fig9, fig.width=4.5, fig.height=6}
overall_interact_plt2
```

## Figure S8

Risk by occupation and paid work outside the home. Odds ratio of COVID-19-related outcomes, contrasting office workers not reporting extra-household work for pay to those in other employment categories not reporting work for pay outside the home (top row), and to those reporting work for pay outside the home (bottom row). The middle row shows the odds ratio (i.e., increased risk) within each category associated with working outside the home compared to no work outside the home. Food service workers are excluded from this analysis because less than 5% reported working exclusively from home (Fig. S7). Results are shown stratified by time periods Jan. 12 to Feb. 28 (Jan-Feb), Mar. 1 to Apr. 30 (Mar-Apr), and May 1 to Jun. 12 (May-Jun)).

```{r figS8, fig.width=8, fig.height=8}
overall_interact_plt1
```

## Figure S9

COVID-19 risk among K-12 Teachers by vaccination status and paid work outside the home. Odds ratio of COVID-19-related outcomes, contrasting unvaccinated K-12 Teachers not reporting extra-household work for pay to vaccinated (any doses) K-12 teachers not reporting work for pay outside the home (top), and to those reporting work for pay outside the home (bottom). The middle row shows the odds ratio (i.e., increased risk) within each vaccination category associated with working outside the home compared to no work outside the home.

```{r figS9, fig.width=4, fig.height=5.5}

# periods to plot
plot_periods <- list(
  c('Jan-Jun')
)

for (p in plot_periods) {

  occ_reg_tbl_S9 <- occ_reg_tbl%>%
        filter(subset == TRUE) %>%
        mutate(main_term = ifelse(var_type=="interaction",
                                str_extract(label,"(.+)(?= \\*)" ),
                                label))


  # reformatting object for plotting
  tmp <- occ_reg_tbl_S9 %>%
    mutate(adjusted=adjusted=="TRUE",interaction=interaction=="TRUE")%>%
    filter(adjusted,interaction,
           str_detect(variable,"any_vaccine"),
           label!="any_vaccine",
           row_type!="label")%>%
    select(period, label, main_term, estimate, 
           std.error,
           outcome, adjusted, interaction)%>%
    group_by(period, main_term, outcome)%>%
    summarize(estimate=sum(estimate),
              std.error=sqrt(sum(std.error^2)))%>%
    ungroup()%>%mutate(inter_term="outside work*baseline")
    
  tmp <-  occ_reg_tbl_S9 %>%
    mutate(adjusted=adjusted=="TRUE",interaction=interaction=="TRUE")%>%
    filter(adjusted,interaction,
           str_detect(variable,"any_vaccine"),
           label!="any_vaccine",
           row_type!="label")%>%
    filter(var_type=="interaction")%>%
    select(period, main_term, estimate, 
           std.error,outcome)%>%
    mutate(inter_term="outside work")%>%
    bind_rows(tmp)
  
  
  tmp <-  occ_reg_tbl_S9 %>%
    mutate(adjusted=adjusted=="TRUE",interaction=interaction=="TRUE")%>%
    filter(adjusted,interaction,
           str_detect(variable,"any_vaccine"),
           label!="any_vaccine",
           row_type!="label")%>%
    filter(var_type=="categorical")%>%
    select(period, main_term, estimate, 
           std.error,outcome)%>%
    mutate(inter_term="baseline")%>%
    bind_rows(tmp)
  
  overall_interact_plt <-tmp %>%
    filter(period %in% p) %>%
    mutate(main_term=ifelse(main_term=='NA', 'any_vaccineFALSE', main_term),
           main_term=factor(main_term,
                            levels = c('any_vaccineFALSE', 
                                       'any_vaccineTRUE')),
           main_term=recode(main_term,
                            'any_vaccineFALSE'='Unvaccinated',
                            'any_vaccineTRUE'='Vaccinated'),
           outcome=recode(outcome,
                          'Test+'='Overall Test+'))%>%
    mutate(conf.low=exp(estimate-1.96*std.error),
           conf.high=exp(estimate+1.96*std.error),
           estimate=exp(estimate))%>%
    ggplot(aes(x=main_term, y=estimate, ymin=conf.low, ymax=conf.high,
               color=outcome))+
    geom_pointrange(position=position_dodge(width=.5))+
    scale_y_log10()+
    scale_color_brewer(type="qual", palette = "Dark2", name=NULL)+
    theme_bw()+
    theme(legend.position="bottom")+
    geom_hline(yintercept=1) +
    labs(x=NULL, y='Odds ratio')
  
  overall_interact_plt <- overall_interact_plt +
    facet_grid(rows=vars(inter_term),
               cols=vars(period),
               scales="free_y")

}

overall_interact_plt
```

## Table S8

Relative odds of COVID-19-related outcomes among respondents who did, and did not, report any paid work outside the home during different time periods compared to office and administrative staff not working outside of the home for pay.

```{r tableS8}
# prep to make table
occ_reg_tbl_9 <- occ_reg_tbl_9 %>%
  filter(subset == FALSE,
         str_detect(variable,"OccEdRecode"))

# make pretty table
occ_reg_disp_tbl <- reg_tbl_period_wrangle(occ_reg_tbl_9, 'Occupation')

# print result
occ_reg_disp_tbl

# save
gt::gtsave(occ_reg_disp_tbl, paste0('tables/tableS8.html'))
```

## Table S9

Relative odds of COVID-19-related outcomes among K-12 teachers who were vaccinated with any number of COVID-19 vaccine doses and who did, or did not, report any paid work outside the home during different time periods compared to K-12 teachers who were unvaccinated and not working outside of the home for pay.

```{r tableS9}
# prep to make table
occ_reg_tbl_S9 <- occ_reg_tbl_S9 %>%
    filter(subset == TRUE,
           str_detect(variable,"any_vaccine")) %>%
    mutate(label = ifelse(term == 'any_vaccineFALSE:work_outside_paidTRUE', 
                          'unvaccinated * NA', label),
           label = gsub('any_vaccineTRUE', 'vaccinated', label))

# make pretty table
occ_reg_disp_tbl <- reg_tbl_period_wrangle(occ_reg_tbl_S9, 'Vaccine status')

# print result
occ_reg_disp_tbl

# save
gt::gtsave(occ_reg_disp_tbl, paste0('tables/tableS9.html'))
```