Skip to content

Commit

Permalink
refactor: step_epi_slide
Browse files Browse the repository at this point in the history
* validate_slide_fun now rejects formula f
* remove warning about optimized slide functions until that PR
* fix tests
* remove try_period and replace with epiprocess internal
* remove slider dependency
* update documentation
  • Loading branch information
dshemetov committed Aug 3, 2024
1 parent 0d7c001 commit 740d438
Show file tree
Hide file tree
Showing 14 changed files with 86 additions and 206 deletions.
5 changes: 2 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: epipredict
Title: Basic epidemiology forecasting methods
Version: 0.0.17
Version: 0.0.18
Authors@R: c(
person("Daniel", "McDonald", , "[email protected]", role = c("aut", "cre")),
person("Ryan", "Tibshirani", , "[email protected]", role = "aut"),
Expand Down Expand Up @@ -35,12 +35,10 @@ Imports:
ggplot2,
glue,
hardhat (>= 1.3.0),
lubridate,
magrittr,
quantreg,
recipes (>= 1.0.4),
rlang (>= 1.0.0),
slider,
smoothqr,
stats,
tibble,
Expand All @@ -55,6 +53,7 @@ Suggests:
epidatr (>= 1.0.0),
fs,
knitr,
lubridate,
poissonreg,
purrr,
ranger,
Expand Down
2 changes: 1 addition & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,4 +52,4 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicat
`...` args intended for `predict.model_fit()`
- `bake.epi_recipe()` will now re-infer the geo and time type in case baking the
steps has changed the appropriate values
- Add a step to produce generic sliding computations over an `epi_df`
- Add `step_epi_slide` to produce generic sliding computations over an `epi_df`
127 changes: 48 additions & 79 deletions R/step_epi_slide.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@
#' that will generate one or more new columns of derived data by "sliding"
#' a computation along existing data.
#'
#'
#' @inheritParams step_epi_lag
#' @param .f A function in one of the following formats:
#' 1. An unquoted function name with no arguments, e.g., `mean`
Expand All @@ -20,27 +19,15 @@
#' argument must be named `.x`. A common, though very difficult to debug
#' error is using something like `function(x) mean`. This will not work
#' because it returns the function mean, rather than `mean(x)`
#' @param before,after the size of the sliding window on the left and the right
#' of the center. Usually non-negative integers for data indexed by date, but
#' more restrictive in other cases (see [epiprocess::epi_slide()] for details).
#' @param prefix A character string that will be prefixed to the new column.
#' @param f_name a character string of at most 20 characters that describes
#' the function. This will be combined with `prefix` and the columns in `...`
#' to name the result using `{prefix}{f_name}_{column}`. By default it will be determined
#' automatically using `clean_f_name()`.
#' @param before,after non-negative integers.
#' How far `before` and `after` each `time_value` should
#' the sliding window extend? Any value provided for either
#' argument must be a single, non-`NA`, non-negative,
#' [integer-compatible][vctrs::vec_cast] number of time steps. Endpoints of
#' the window are inclusive. Common settings:
#' * For trailing/right-aligned windows from `time_value - time_step(k)` to
#' `time_value`, use `before=k, after=0`. This is the most likely use case
#' for the purposes of forecasting.
#' * For center-aligned windows from `time_value - time_step(k)` to
#' `time_value + time_step(k)`, use `before=k, after=k`.
#' * For leading/left-aligned windows from `time_value` to
#' `time_value + time_step(k)`, use `after=k, after=0`.
#'
#' You may also pass a [lubridate::period], like `lubridate::weeks(1)` or a
#' character string that is coercible to a [lubridate::period], like
#' `"2 weeks"`.
#' @template step-return
#'
#' @export
Expand Down Expand Up @@ -69,9 +56,8 @@ step_epi_slide <-
rlang::abort("This recipe step can only operate on an `epi_recipe`.")
}
.f <- validate_slide_fun(.f)
arg_is_scalar(before, after)
before <- try_period(before)
after <- try_period(after)
epiprocess:::validate_slide_window_arg(before, attributes(recipe$template)$metadata$time_type)
epiprocess:::validate_slide_window_arg(after, attributes(recipe$template)$metadata$time_type)
arg_is_chr_scalar(role, prefix, id)
arg_is_lgl_scalar(skip)

Expand Down Expand Up @@ -126,7 +112,6 @@ step_epi_slide_new <-
}



#' @export
prep.step_epi_slide <- function(x, training, info = NULL, ...) {
col_names <- recipes::recipes_eval_select(x$terms, data = training, info = info)
Expand All @@ -150,7 +135,6 @@ prep.step_epi_slide <- function(x, training, info = NULL, ...) {
}



#' @export
bake.step_epi_slide <- function(object, new_data, ...) {
recipes::check_new_data(names(object$columns), object, new_data)
Expand All @@ -170,12 +154,16 @@ bake.step_epi_slide <- function(object, new_data, ...) {
class = "epipredict__step__name_collision_error"
)
}
if (any(vapply(c(mean, sum), \(x) identical(x, object$.f), logical(1L)))) {
cli_warn(
c("There is an optimized version of both mean and sum. See `step_epi_slide_mean`, `step_epi_slide_sum`, or `step_epi_slide_opt`."),
class = "epipredict__step_epi_slide__optimized_version"
)
}
# TODO: Uncomment this whenever we make the optimized versions available.
# if (any(vapply(c(mean, sum), \(x) identical(x, object$.f), logical(1L)))) {
# cli_warn(
# c(
# "There is an optimized version of both mean and sum. See `step_epi_slide_mean`, `step_epi_slide_sum`,
# or `step_epi_slide_opt`."
# ),
# class = "epipredict__step_epi_slide__optimized_version"
# )
# }
epi_slide_wrapper(
new_data,
object$before,
Expand All @@ -187,48 +175,51 @@ bake.step_epi_slide <- function(object, new_data, ...) {
object$prefix
)
}
#' wrapper to handle epi_slide particulars


#' Wrapper to handle epi_slide particulars
#'
#' @description
#' This should simplify somewhat in the future when we can run `epi_slide` on
#' columns. Surprisingly, lapply is several orders of magnitude faster than
#' using roughly equivalent tidy select style.
#'
#' @param fns vector of functions, even if it's length 1.
#' @param group_keys the keys to group by. likely `epi_keys[-1]` (to remove time_value)
#'
#' @importFrom tidyr crossing
#' @importFrom dplyr bind_cols group_by ungroup
#' @importFrom epiprocess epi_slide
#' @keywords internal
epi_slide_wrapper <- function(new_data, before, after, columns, fns, fn_names, group_keys, name_prefix) {
cols_fns <- tidyr::crossing(col_name = columns, fn_name = fn_names, fn = fns)
# Iterate over the rows of cols_fns. For each row number, we will output a
# transformed column. The first result returns all the original columns along
# with the new column. The rest just return the new column.
seq_len(nrow(cols_fns)) %>%
lapply( # iterate over the rows of cols_fns
# takes in the row number, outputs the transformed column
function(comp_i) {
# extract values from the row
col_name <- cols_fns[[comp_i, "col_name"]]
fn_name <- cols_fns[[comp_i, "fn_name"]]
fn <- cols_fns[[comp_i, "fn"]][[1L]]
result_name <- paste(name_prefix, fn_name, col_name, sep = "_")
result <- new_data %>%
group_by(across(all_of(group_keys))) %>%
epi_slide(
before = before,
after = after,
new_col_name = result_name,
f = function(slice, geo_key, ref_time_value) {
fn(slice[[col_name]])
}
) %>%
ungroup()
# the first result needs to include all of the original columns
if (comp_i == 1L) {
result
} else {
# everything else just needs that column transformed
result[result_name]
}
lapply(function(comp_i) {
col_name <- cols_fns[[comp_i, "col_name"]]
fn_name <- cols_fns[[comp_i, "fn_name"]]
fn <- cols_fns[[comp_i, "fn"]][[1L]]
result_name <- paste(name_prefix, fn_name, col_name, sep = "_")
result <- new_data %>%
group_by(across(all_of(group_keys))) %>%
epi_slide(
before = before,
after = after,
new_col_name = result_name,
f = function(slice, geo_key, ref_time_value) {
fn(slice[[col_name]])
}
) %>%
ungroup()

if (comp_i == 1L) {
result
} else {
result[result_name]
}
) %>%
}) %>%
bind_cols()
}

Expand Down Expand Up @@ -286,33 +277,11 @@ validate_slide_fun <- function(.f) {
cli_abort("In, `step_epi_slide()`, `.f` may not be missing.")
}
if (rlang::is_formula(.f, scoped = TRUE)) {
if (!is.null(rlang::f_lhs(.f))) {
cli_abort("In, `step_epi_slide()`, `.f` must be a one-sided formula.")
}
cli_abort("In, `step_epi_slide()`, `.f` cannot be a formula.")
} else if (rlang::is_character(.f)) {
.f <- rlang::as_function(.f)
} else if (!rlang::is_function(.f)) {
cli_abort("In, `step_epi_slide()`, `.f` must be a function.")
}
.f
}

try_period <- function(x) {
err <- is.na(x)
if (!err) {
if (is.numeric(x)) {
err <- !rlang::is_integerish(x) || x < 0
} else {
x <- lubridate::as.period(x)
err <- is.na(x)
}
}
if (err) {
cli_abort(paste(
"The value supplied to `before` or `after` must be a non-negative integer",
"a {.cls lubridate::period} or a character scalar that can be coerced",
'as a {.cls lubridate::period}, e.g., `"1 week"`.'
), )
}
x
}
2 changes: 1 addition & 1 deletion man/add_epi_recipe.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/arx_classifier.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/arx_forecaster.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions man/cdc_baseline_forecaster.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/epi_slide_wrapper.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions man/flatline_forecaster.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/get_test_data.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/grad_employ_subset.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/predict-epi_workflow.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

24 changes: 4 additions & 20 deletions man/step_epi_slide.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 740d438

Please sign in to comment.