docs(epi_df.Rmd): immediate ungrouping + discuss completion effects
It's probably best to immediately ungroup after performing grouped operations in
our documentation, as leaving things grouped accidentally is a source of errors.
At some point we should consider an overhaul to use `by =` and `.by =` where
appropriate (sorting effects not needed) and available (not all operations
support this syntax yet).

The example data already contained 0s, so describe the effects of completion in
words, and note one potential surprise in other applications.
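
The grouping pitfall described in the first paragraph of the message can be sketched on a toy example. This is a minimal illustration, not code from the vignette: `toy` and its values are made up, a plain tibble stands in for an `epi_df`, and the `.by` variant assumes dplyr 1.1.0 or later. Whether every `epi_df` method accepts `.by` is not claimed here.

```r
library(dplyr)

# Hypothetical stand-in for the vignette's data.
toy <- tibble(
  geo_value = rep(c("ca", "tx"), each = 3),
  time_value = rep(as.Date("2020-03-01") + 0:2, times = 2),
  cases = c(1, 2, 3, 10, 20, 30)
)

# Grouped slice without ungroup(): the result silently stays grouped.
still_grouped <- toy %>%
  group_by(geo_value) %>%
  slice(1:2)

# A later step intended to be global is computed per group instead.
per_group_total <- still_grouped %>%
  mutate(total = sum(cases)) %>%
  pull(total)
# 3 for the ca rows, 30 for the tx rows

# Ungrouping immediately gives the intended global total.
ungrouped <- still_grouped %>% ungroup()
global_total <- ungrouped %>%
  mutate(total = sum(cases)) %>%
  pull(total)
# 33 for every row

# Per-operation grouping: the result is never left grouped.
by_version <- toy %>% slice(1:2, .by = geo_value)
is_grouped_df(by_version) # FALSE
```

The `.by` form removes the need to remember `ungroup()`, but it only helps for verbs that support it, which is the limitation the message points out.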
brookslogan committed Dec 19, 2024
1 parent 97495b9 commit d32b605
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion vignettes/epi_df.Rmd
@@ -336,7 +336,8 @@ First, let's create a data set with some missing data. We will reuse the dataset
 edf_missing <- edf %>%
   filter(geo_value %in% c("ca", "tx")) %>%
   group_by(geo_value) %>%
-  slice(1:3, 5:6)
+  slice(1:3, 5:6) %>%
+  ungroup()
 edf_missing %>%
   print(n = 10)
@@ -346,12 +347,17 @@ Now let's fill in the missing data with explicit zeros:

 ```{r}
 edf_missing %>%
+  group_by(geo_value) %>%
   complete(
     time_value = seq.Date(min(time_value), max(time_value), by = "day"),
     fill = list(cases = 0)
   ) %>%
+  ungroup() %>%
   print(n = 12)
 ```
+We see that rows have been added for the missing `time_value` 2020-03-04 for
+both of the states, with `cases` set to `0`. If there were explicit `NA`s in the
+`cases` column, those would have been replaced by `0` as well.

 ### Detecting and filling time gaps with `tsibble`

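
The note added in the second hunk, that explicit `NA`s in `cases` are also replaced by `0`, can be checked on a small example. This is a sketch under stated assumptions: `toy` and its values are made up and stand in for `edf_missing`, a plain tibble is used rather than an `epi_df`, and the `explicit` argument assumes tidyr 1.2.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: ca has an explicit NA on 2020-03-02 and no row for
# 2020-03-03; tx has no row for 2020-03-02.
toy <- tibble(
  geo_value = c("ca", "ca", "ca", "tx", "tx", "tx"),
  time_value = as.Date(c(
    "2020-03-01", "2020-03-02", "2020-03-04",
    "2020-03-01", "2020-03-03", "2020-03-04"
  )),
  cases = c(5, NA, 7, 1, 2, 3)
)

# Default behaviour: both the newly added rows and the pre-existing
# explicit NA receive the fill value.
filled <- toy %>%
  group_by(geo_value) %>%
  complete(
    time_value = seq.Date(min(time_value), max(time_value), by = "day"),
    fill = list(cases = 0)
  ) %>%
  ungroup()
filled$cases
# 5 0 0 7 1 0 2 3

# explicit = FALSE: only the newly added rows are filled; the
# pre-existing NA is left alone.
kept <- toy %>%
  group_by(geo_value) %>%
  complete(
    time_value = seq.Date(min(time_value), max(time_value), by = "day"),
    fill = list(cases = 0),
    explicit = FALSE
  ) %>%
  ungroup()
kept$cases
# 5 NA 0 7 1 0 2 3
```

If an `NA` in the source data means "not reported" rather than "zero", `explicit = FALSE` is one way to avoid the surprise the commit message alludes to.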