docs(epi_df.Rmd): immediate ungrouping + discuss completion effects

It's probably best to ungroup immediately after performing grouped operations in our documentation, since accidentally leaving things grouped is a source of errors. At some point we should consider an overhaul to use `by =` and `.by =` where appropriate (sorting effects not needed) and available (not all operations support this syntax yet). The example data already contained 0s, so describe the effects of completion in words, and note one potential surprise in other applications.
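
For illustration, here is a minimal sketch of the two points above. It assumes dplyr 1.1.0 or later (for `.by` in `slice()`) and a recent tidyr; the `toy` tibble and its values are hypothetical stand-ins for the vignette's `edf`, and it is a plain tibble rather than an `epi_df`. As far as I know, `complete()` does not accept `.by`, so the sketch keeps the explicit `group_by()`/`ungroup()` pair there.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 2020-03-03 is implicitly missing for both states,
# and "ca" also has one explicit NA in `cases`.
toy <- tibble(
  geo_value = rep(c("ca", "tx"), each = 3),
  time_value = rep(as.Date("2020-03-01") + c(0, 1, 3), times = 2),
  cases = c(1, NA, 3, 4, 5, 6)
)

# Per-operation grouping with `.by`: the result comes back ungrouped,
# so there is nothing to forget to ungroup afterwards.
first_two <- toy %>% slice(1:2, .by = geo_value)
is_grouped_df(first_two)

# Group, complete, and ungroup right away. With the default `explicit = TRUE`,
# `fill` replaces the explicit NA as well as the newly added rows.
filled <- toy %>%
  group_by(geo_value) %>%
  complete(
    time_value = seq.Date(min(time_value), max(time_value), by = "day"),
    fill = list(cases = 0)
  ) %>%
  ungroup()

filled
```

If explicit `NA`s should be left alone, passing `explicit = FALSE` to `complete()` is one option in tidyr versions that provide that argument.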
cmu-delphi · Dec 19, 2024 · d32b605
1 parent 97495b9
Showing 1 changed file with 7 additions and 1 deletion.
diff --git a/vignettes/epi_df.Rmd b/vignettes/epi_df.Rmd
@@ -336,7 +336,8 @@ First, let's create a data set with some missing data. We will reuse the dataset
 edf_missing <- edf %>%
   filter(geo_value %in% c("ca", "tx")) %>%
   group_by(geo_value) %>%
-  slice(1:3, 5:6)
+  slice(1:3, 5:6) %>%
+  ungroup()
 
 edf_missing %>%
   print(n = 10)
@@ -346,12 +347,17 @@ Now let's fill in the missing data with explicit zeros:
 
 ```{r}
 edf_missing %>%
+  group_by(geo_value) %>%
   complete(
     time_value = seq.Date(min(time_value), max(time_value), by = "day"),
     fill = list(cases = 0)
   ) %>%
+  ungroup() %>%
   print(n = 12)
 ```
+We see that rows have been added for the missing `time_value` 2020-03-04 for
+both of the states, with `cases` set to `0`. If there were explicit `NA`s in the
+`cases` column, those would have been replaced by `0` as well.
 
 ### Detecting and filling time gaps with `tsibble`