docs(aggregation.Rmd): tweak readr & join usage
* Rename state_census -> state_naming.
* Provide col_types specs for all & only columns used; avoid message spam.
* Don't select unused cols; especially avoid the numeric state FIPS.
* Bump dependency on dplyr and update joins; avoid message spam.
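
As a rough illustration of the `col_types` bullet above, the sketch below reads a tiny inline CSV with an explicit column specification. The inline data is a hypothetical stand-in for `county_census.csv`, and passing literal data via `I()` assumes readr 2.0.0 or later.

```r
library(readr)

# Hypothetical inline CSV standing in for county_census.csv;
# I() tells readr to treat the string as literal data, not a path.
csv_text <- I(paste0(
  "FIPS,STNAME,CTYNAME\n",
  "25001,Massachusetts,Barnstable County\n",
  "01001,Alabama,Autauga County\n"
))

# Supplying col_types means readr does not print its
# column-specification message, and no column type is guessed.
mapping <- read_csv(
  csv_text,
  col_types = cols(
    FIPS = col_character(),
    STNAME = col_character(),
    CTYNAME = col_character()
  )
)

# FIPS codes are kept as strings, so leading zeros survive.
mapping$FIPS
```

Reading FIPS as character is what makes it safe to use as a join key later; a numeric FIPS column would lose the leading zero in codes such as `01001`.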
brookslogan committed Oct 4, 2024
1 parent 635ff4d commit 7463475
Showing 3 changed files with 24 additions and 11 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -35,7 +35,7 @@ Imports:
checkmate,
cli,
data.table,
-dplyr (>= 1.0.8),
+dplyr (>= 1.1.0),
genlasso,
ggplot2,
glue,
11 changes: 9 additions & 2 deletions data-raw/jhu_csse_county_level_subset.R
@@ -1,8 +1,15 @@
+library(readr)
library(epidatr)
library(epiprocess)
library(dplyr)

-y <- readr::read_csv("https://github.com/cmu-delphi/covidcast/raw/c89e4d295550ba1540d64d2cc991badf63ad04e5/Python-packages/covidcast-py/covidcast/geo_mappings/county_census.csv") %>% # nolint: line_length_linter
+y <- read_csv("https://github.com/cmu-delphi/covidcast/raw/c89e4d295550ba1540d64d2cc991badf63ad04e5/Python-packages/covidcast-py/covidcast/geo_mappings/county_census.csv", # nolint: line_length_linter
+  col_types = cols(
+    FIPS = col_character(),
+    STNAME = col_character(),
+    CTYNAME = col_character()
+  )
+) %>%
filter(STNAME %in% c("Massachusetts", "Vermont"), STNAME != CTYNAME) %>%
select(geo_value = FIPS, county_name = CTYNAME, state_name = STNAME)

@@ -16,7 +23,7 @@ jhu_csse_county_level_subset <- pub_covidcast(
time_values = epirange(20200601, 20211231),
) %>%
select(geo_value, time_value, cases = value) %>%
-full_join(y, by = "geo_value") %>%
+inner_join(y, by = "geo_value", relationship = "many-to-one", unmatched = "error") %>%
as_epi_df()

usethis::use_data(jhu_csse_county_level_subset, overwrite = TRUE)
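
The switch from `full_join()` to a checked `inner_join()` relies on the `relationship` and `unmatched` arguments, which were introduced in dplyr 1.1.0 (hence the dependency bump). A minimal sketch of how those checks behave, using hypothetical toy tables rather than the real JHU CSSE data:

```r
library(dplyr)

# Hypothetical toy data: several case rows per county, one name row per county.
cases <- tibble(
  geo_value = c("25001", "25001", "50001"),
  cases = c(3, 5, 1)
)
county_names <- tibble(
  geo_value = c("25001", "50001"),
  county_name = c("Barnstable County", "Addison County")
)

# Every key matches and the lookup has one row per key, so this succeeds.
joined <- inner_join(
  cases, county_names,
  by = "geo_value",
  relationship = "many-to-one",
  unmatched = "error"
)

# A case row whose key is missing from the lookup now raises an error
# instead of being silently dropped.
bad_cases <- bind_rows(cases, tibble(geo_value = "99999", cases = 2))
unmatched_result <- tryCatch(
  inner_join(
    bad_cases, county_names,
    by = "geo_value",
    relationship = "many-to-one",
    unmatched = "error"
  ),
  error = function(e) "error"
)
```

Passing `by` explicitly also avoids the "Joining with `by = ...`" message that dplyr prints when it has to infer the join columns.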
22 changes: 14 additions & 8 deletions vignettes/aggregation.Rmd
@@ -13,12 +13,19 @@ kinds of tasks with `epi_df` objects. We'll work with county-level reported
COVID-19 cases in MA and VT.

```{r, message = FALSE, eval= FALSE, warning= FALSE}
+library(readr)
library(epidatr)
library(epiprocess)
library(dplyr)
# Get mapping between FIPS codes and county&state names:
-y <- readr::read_csv("https://github.com/cmu-delphi/covidcast/raw/c89e4d295550ba1540d64d2cc991badf63ad04e5/Python-packages/covidcast-py/covidcast/geo_mappings/county_census.csv") %>% # nolint: line_length_linter
+y <- read_csv("https://github.com/cmu-delphi/covidcast/raw/c89e4d295550ba1540d64d2cc991badf63ad04e5/Python-packages/covidcast-py/covidcast/geo_mappings/county_census.csv", # nolint: line_length_linter
+  col_types = cols(
+    FIPS = col_character(),
+    CTYNAME = col_character(),
+    STNAME = col_character()
+  )
+) %>%
filter(STNAME %in% c("Massachusetts", "Vermont"), STNAME != CTYNAME) %>%
select(geo_value = FIPS, county_name = CTYNAME, state_name = STNAME)
@@ -39,6 +46,7 @@ x <- pub_covidcast(
The data contains 16,212 rows and 5 columns.

```{r, echo=FALSE, warning=FALSE, message=FALSE}
+library(readr)
library(epidatr)
library(epiprocess)
library(dplyr)
@@ -108,17 +116,15 @@ help avoid bugs in further downstream data processing tasks.
Let's first remove certain dates from our data set to create gaps:

```{r}
-state_census <- readr::read_csv("https://github.com/cmu-delphi/covidcast/raw/c89e4d295550ba1540d64d2cc991badf63ad04e5/Python-packages/covidcast-py/covidcast/geo_mappings/state_census.csv") %>% # nolint: line_length_linter
-  select(STATE, NAME, POPESTIMATE2019, ABBR) %>%
-  rename(abbr = ABBR, name = NAME, pop = POPESTIMATE2019, fips = STATE) %>%
-  mutate(abbr = tolower(abbr)) %>%
+state_naming <- read_csv("https://github.com/cmu-delphi/covidcast/raw/c89e4d295550ba1540d64d2cc991badf63ad04e5/Python-packages/covidcast-py/covidcast/geo_mappings/state_census.csv", # nolint: line_length_linter
+  col_types = cols(NAME = col_character(), ABBR = col_character())
+) %>%
+  transmute(state_name = NAME, abbr = tolower(ABBR)) %>%
as_tibble()
# First make geo value more readable for tables, plots, etc.
x <- x %>%
-  inner_join(
-    state_census %>% select(state_name = name, abbr)
-  ) %>%
+  inner_join(state_naming, by = "state_name", relationship = "many-to-one", unmatched = "error") %>%
mutate(geo_value = paste(substr(county_name, 1, nchar(county_name) - 7), state_name, sep = ", ")) %>%
select(geo_value, time_value, cases)