Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: main -> dev #376

Closed
wants to merge 37 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
468bfc0
drop arg specifying number of args to expect
nmdefries Jun 21, 2023
460fe63
update tests
nmdefries Jun 21, 2023
18ae844
move f arg check to as_slide_computation
nmdefries Jun 22, 2023
5ffb590
add more testing for .x and .data access
nmdefries Jun 22, 2023
3ae827d
make epi_slide data mask creation match epix_slide
nmdefries Jun 22, 2023
e188e2b
move quosure -> function creation to as_slide_computation
nmdefries Jun 26, 2023
54b744c
deduplicate function/formula and quosure slides
nmdefries Jun 27, 2023
0666c8a
build docs
nmdefries Jun 27, 2023
30ff151
comments
nmdefries Jun 27, 2023
6a594f1
comments
nmdefries Jun 28, 2023
a18be49
as_slide_computation documentation
nmdefries Jun 28, 2023
894ba23
drop env arg; as_slide_computation no longer handles functions-as-str…
nmdefries Jun 28, 2023
7a1e0ba
periods
nmdefries Jun 28, 2023
93e965f
document non-empty dots and update function imports
nmdefries Jun 28, 2023
a317594
make wrapper language clearer; add dots to ref_time_value
nmdefries Jun 28, 2023
d0acab4
rename as_slide_computation func input to avoid name collisions
nmdefries Jun 29, 2023
00e967d
move ref_time_value calculation to a wrapper in `epi_slide`
nmdefries Jun 29, 2023
9d3c01c
add rlang authors as contributors
nmdefries Jun 28, 2023
c400cd5
add rlang license to source and describe our changes
nmdefries Jun 28, 2023
dbd5b18
add copyright file and "person" in description
nmdefries Jun 28, 2023
19185f4
ci: stop building pkgdown on forks
dsweber2 Jul 13, 2023
54eacf3
fix documentation and DESCRIPTION typos
nmdefries Jul 24, 2023
398de24
unclear who main copyright holder is on our side, so omit for now
nmdefries Jul 24, 2023
e5ed4d0
redocument
nmdefries Jul 24, 2023
f441c44
ci: notate why we're checking the repo owner
dsweber2 Jul 26, 2023
c936387
remove duplicate tests and suppress warnings when testing for errors
nmdefries Jul 27, 2023
a9128e9
Merge pull request #336 from cmu-delphi/ndefries/drop-nargs-num-args-…
nmdefries Jul 27, 2023
655aa69
make sure all named args in as_slide_computation are used in calling …
nmdefries Jul 27, 2023
62d76aa
Improve `as_slide_computation()` unsupported class error message
brookslogan Aug 2, 2023
8c8bc52
docs(as_slide_computation): sync formula->comp special class rename
brookslogan Aug 2, 2023
2791f01
refactor(as_slide_computation): quo conversion needs no `...`
brookslogan Aug 3, 2023
81fb4bc
feat(as_slide_computation): abort nonempty ... with formula
brookslogan Aug 8, 2023
e43371f
`expect_warning` in some noisy warning+error tests
brookslogan Aug 8, 2023
7be2e66
Merge remote-tracking branch 'upstream/main' into ndefries/func-conve…
brookslogan Aug 8, 2023
1eceeb6
Merge pull request #337 from cmu-delphi/ndefries/func-conversion-expa…
nmdefries Aug 11, 2023
5c5b70e
Merge pull request #338 from cmu-delphi/ndefries/rlang-attribution
nmdefries Aug 11, 2023
9165bcc
Merge pull request #346 from dsweber2/patch-1
dshemetov Nov 15, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .github/workflows/pkgdown.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@ name: pkgdown

jobs:
pkgdown:
# only build docs on the main repository and not forks
if: github.repository_owner == 'cmu-delphi'
runs-on: ubuntu-latest
# Only restrict concurrency for non-PR jobs
concurrency:
Expand Down
6 changes: 5 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -14,13 +14,17 @@ Authors@R: c(
person("Quang", "Nguyen", role = "ctb"),
person("Evan", "Ray", role = "aut"),
person("Dmitry", "Shemetov", role = "ctb"),
person("Ryan", "Tibshirani", , "[email protected]", role = c("aut", "cre"))
person("Ryan", "Tibshirani", , "[email protected]", role = c("aut", "cre")),
person("Lionel", "Henry", role = "ctb", comment = "Author of included rlang fragments"),
person("Hadley", "Wickham", role = "ctb", comment = "Author of included rlang fragments"),
person("Posit", role = "cph", comment = "Copyright holder of included rlang fragments")
)
Description: This package introduces a common data structure for epidemiological
data reported by location and time, provides another data structure to
work with revisions to these data sets over time, and offers associated
utilities to perform basic signal processing tasks.
License: MIT + file LICENSE
Copyright: file inst/COPYRIGHTS
Imports:
cli,
data.table,
Expand Down
3 changes: 0 additions & 3 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -96,19 +96,16 @@ importFrom(rlang,.env)
importFrom(rlang,arg_match)
importFrom(rlang,caller_arg)
importFrom(rlang,caller_env)
importFrom(rlang,check_dots_empty0)
importFrom(rlang,enquo)
importFrom(rlang,enquos)
importFrom(rlang,env)
importFrom(rlang,f_env)
importFrom(rlang,f_rhs)
importFrom(rlang,global_env)
importFrom(rlang,is_environment)
importFrom(rlang,is_formula)
importFrom(rlang,is_function)
importFrom(rlang,is_missing)
importFrom(rlang,is_quosure)
importFrom(rlang,is_string)
importFrom(rlang,missing_arg)
importFrom(rlang,new_function)
importFrom(rlang,quo_is_missing)
Expand Down
204 changes: 62 additions & 142 deletions R/grouped_epi_archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,8 @@ grouped_epi_archive =
#' object. See the documentation for the wrapper function [`epix_slide()`] for
#' details.
#' @importFrom data.table key address
#' @importFrom rlang !! !!! enquo quo_is_missing enquos is_quosure sym syms env
#' @importFrom rlang !! !!! enquo quo_is_missing enquos is_quosure sym syms
#' env missing_arg
slide = function(f, ..., before, ref_time_values,
time_step, new_col_name = "slide_value",
as_list_col = FALSE, names_sep = "_",
Expand Down Expand Up @@ -229,11 +230,6 @@ grouped_epi_archive =
# implementation doesn't take advantage of it.
ref_time_values = sort(ref_time_values)
}

# Check that `f` takes enough args
if (!missing(f) && is.function(f)) {
assert_sufficient_f_args(f, ..., n_mandatory_f_args = 3L)
}

# Validate and pre-process `before`:
if (missing(before)) {
Expand Down Expand Up @@ -296,71 +292,8 @@ grouped_epi_archive =
!!new_col := .env$comp_value))
}

# If f is not missing, then just go ahead, slide by group
if (!missing(f)) {
if (rlang::is_formula(f)) f = as_slide_computation(f)
x = purrr::map_dfr(ref_time_values, function(ref_time_value) {
# Ungrouped as-of data; `epi_df` if `all_versions` is `FALSE`,
# `epi_archive` if `all_versions` is `TRUE`:
as_of_raw = private$ungrouped$as_of(ref_time_value, min_time_value = ref_time_value - before, all_versions = all_versions)

# Set:
# * `as_of_df`, the data.frame/tibble/epi_df/etc. that we will
# `group_modify` as the `.data` argument. Might or might not
# include version column.
# * `group_modify_fn`, the corresponding `.f` argument
if (!all_versions) {
as_of_df = as_of_raw
group_modify_fn = comp_one_grp
} else {
as_of_archive = as_of_raw
# We essentially want to `group_modify` the archive, but
# haven't implemented this method yet. Next best would be
# `group_modify` on its `$DT`, but that has different
# behavior based on whether or not `dtplyr` is loaded.
# Instead, go through an ordinary data frame, trying to avoid
# copies.
if (address(as_of_archive$DT) == address(private$ungrouped$DT)) {
# `as_of` aliased its the full `$DT`; copy before mutating:
as_of_archive$DT <- copy(as_of_archive$DT)
}
dt_key = data.table::key(as_of_archive$DT)
as_of_df = as_of_archive$DT
data.table::setDF(as_of_df)

# Convert each subgroup chunk to an archive before running the calculation.
group_modify_fn = function(.data_group, .group_key,
f, ...,
ref_time_value,
new_col) {
# .data_group is coming from as_of_df as a tibble, but we
# want to feed `comp_one_grp` an `epi_archive` backed by a
# DT; convert and wrap:
data.table::setattr(.data_group, "sorted", dt_key)
data.table::setDT(.data_group, key=dt_key)
.data_group_archive = as_of_archive$clone()
.data_group_archive$DT = .data_group
comp_one_grp(.data_group_archive, .group_key, f = f, ...,
ref_time_value = ref_time_value,
new_col = new_col
)
}
}

return(
dplyr::group_by(as_of_df, dplyr::across(tidyselect::all_of(private$vars)),
.drop=private$drop) %>%
dplyr::group_modify(group_modify_fn,
f = f, ...,
ref_time_value = ref_time_value,
new_col = new_col,
.keep = TRUE)
)
})
}

# Else interpret ... as an expression for tidy evaluation
else {
# If `f` is missing, interpret ... as an expression for tidy evaluation
if (missing(f)) {
quos = enquos(...)
if (length(quos) == 0) {
Abort("If `f` is missing then a computation must be specified via `...`.")
Expand All @@ -369,83 +302,70 @@ grouped_epi_archive =
Abort("If `f` is missing then only a single computation can be specified via `...`.")
}

quo = quos[[1]]
f = function(.x, .group_key, .ref_time_value, quo, ...) {
# Convert to environment to standardize between tibble and R6
# based inputs. In both cases, we should get a simple
# environment with the empty environment as its parent.
data_env = rlang::as_environment(.x)
data_mask = rlang::new_data_mask(bottom = data_env, top = data_env)
data_mask$.data <- rlang::as_data_pronoun(data_mask)
# We'll also install `.x` directly, not as an
# `rlang_data_pronoun`, so that we can, e.g., use more dplyr and
# epiprocess operations.
data_mask$.x = .x
data_mask$.group_key = .group_key
data_mask$.ref_time_value = .ref_time_value
rlang::eval_tidy(quo, data_mask)
}
f = quos[[1]]
new_col = sym(names(rlang::quos_auto_name(quos)))
... = missing_arg() # magic value that passes zero args as dots in calls below
}

x = purrr::map_dfr(ref_time_values, function(ref_time_value) {
# Ungrouped as-of data; `epi_df` if `all_versions` is `FALSE`,
# `epi_archive` if `all_versions` is `TRUE`:
as_of_raw = private$ungrouped$as_of(ref_time_value, min_time_value = ref_time_value - before, all_versions = all_versions)
f = as_slide_computation(f, ...)
x = purrr::map_dfr(ref_time_values, function(ref_time_value) {
# Ungrouped as-of data; `epi_df` if `all_versions` is `FALSE`,
# `epi_archive` if `all_versions` is `TRUE`:
as_of_raw = private$ungrouped$as_of(ref_time_value, min_time_value = ref_time_value - before, all_versions = all_versions)

# Set:
# * `as_of_df`, the data.frame/tibble/epi_df/etc. that we will
# `group_modify` as the `.data` argument. Might or might not
# include version column.
# * `group_modify_fn`, the corresponding `.f` argument
if (!all_versions) {
as_of_df = as_of_raw
group_modify_fn = comp_one_grp
} else {
as_of_archive = as_of_raw
# We essentially want to `group_modify` the archive, but don't
# provide an implementation yet. Next best would be
# `group_modify` on its `$DT`, but that has different behavior
# based on whether or not `dtplyr` is loaded. Instead, go
# through an ordinary data frame, trying to avoid copies.
if (address(as_of_archive$DT) == address(private$ungrouped$DT)) {
# `as_of` aliased its the full `$DT`; copy before mutating:
as_of_archive$DT <- copy(as_of_archive$DT)
}
dt_key = data.table::key(as_of_archive$DT)
as_of_df = as_of_archive$DT
data.table::setDF(as_of_df)
# Set:
# * `as_of_df`, the data.frame/tibble/epi_df/etc. that we will
# `group_modify` as the `.data` argument. Might or might not
# include version column.
# * `group_modify_fn`, the corresponding `.f` argument
if (!all_versions) {
as_of_df = as_of_raw
group_modify_fn = comp_one_grp
} else {
as_of_archive = as_of_raw
# We essentially want to `group_modify` the archive, but
# haven't implemented this method yet. Next best would be
# `group_modify` on its `$DT`, but that has different
# behavior based on whether or not `dtplyr` is loaded.
# Instead, go through an ordinary data frame, trying to avoid
# copies.
if (address(as_of_archive$DT) == address(private$ungrouped$DT)) {
# `as_of` aliased its the full `$DT`; copy before mutating:
as_of_archive$DT <- copy(as_of_archive$DT)
}
dt_key = data.table::key(as_of_archive$DT)
as_of_df = as_of_archive$DT
data.table::setDF(as_of_df)

# Convert each subgroup chunk to an archive before running the calculation.
group_modify_fn = function(.data_group, .group_key,
f, ...,
ref_time_value,
new_col) {
# .data_group is coming from as_of_df as a tibble, but we
# want to feed `comp_one_grp` an `epi_archive` backed by a
# DT; convert and wrap:
data.table::setattr(.data_group, "sorted", dt_key)
data.table::setDT(.data_group, key=dt_key)
.data_group_archive = as_of_archive$clone()
.data_group_archive$DT = .data_group
comp_one_grp(.data_group_archive, .group_key, f = f, quo = quo,
ref_time_value = ref_time_value,
new_col = new_col
)
}
# Convert each subgroup chunk to an archive before running the calculation.
group_modify_fn = function(.data_group, .group_key,
f, ...,
ref_time_value,
new_col) {
# .data_group is coming from as_of_df as a tibble, but we
# want to feed `comp_one_grp` an `epi_archive` backed by a
# DT; convert and wrap:
data.table::setattr(.data_group, "sorted", dt_key)
data.table::setDT(.data_group, key=dt_key)
.data_group_archive = as_of_archive$clone()
.data_group_archive$DT = .data_group
comp_one_grp(.data_group_archive, .group_key, f = f, ...,
ref_time_value = ref_time_value,
new_col = new_col
)
}
}

return(
dplyr::group_by(as_of_df, dplyr::across(tidyselect::all_of(private$vars)),
.drop=private$drop) %>%
dplyr::group_modify(group_modify_fn,
f = f, quo = quo,
ref_time_value = ref_time_value,
comp_effective_key_vars = comp_effective_key_vars,
new_col = new_col,
.keep = TRUE)
)
})
}
return(
dplyr::group_by(as_of_df, dplyr::across(tidyselect::all_of(private$vars)),
.drop=private$drop) %>%
dplyr::group_modify(group_modify_fn,
f = f, ...,
ref_time_value = ref_time_value,
new_col = new_col,
.keep = TRUE)
)
})

# Unchop/unnest if we need to
if (!as_list_col) {
Expand Down
75 changes: 24 additions & 51 deletions R/slide.R
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@
#'
#' @importFrom lubridate days weeks
#' @importFrom dplyr bind_rows group_vars filter select
#' @importFrom rlang .data .env !! enquo enquos sym env
#' @importFrom rlang .data .env !! enquo enquos sym env missing_arg
#' @export
#' @examples
#' # slide a 7-day trailing average formula on cases
Expand Down Expand Up @@ -167,11 +167,6 @@ epi_slide = function(x, f, ..., before, after, ref_time_values,

# Check we have an `epi_df` object
if (!inherits(x, "epi_df")) Abort("`x` must be of class `epi_df`.")

# Check that `f` takes enough args
if (!missing(f) && is.function(f)) {
assert_sufficient_f_args(f, ..., n_mandatory_f_args = 3L)
}

if (missing(ref_time_values)) {
ref_time_values = unique(x$time_value)
Expand Down Expand Up @@ -356,28 +351,8 @@ epi_slide = function(x, f, ..., before, after, ref_time_values,
return(mutate(.data_group, !!new_col := slide_values))
}

# If f is not missing, then just go ahead, slide by group
if (!missing(f)) {
if (rlang::is_formula(f)) f = as_slide_computation(f)
f_rtv_wrapper = function(x, g, ...) {
ref_time_value = min(x$time_value) + before
x <- x[x$.real,]
x$.real <- NULL
f(x, g, ref_time_value, ...)
}
x = x %>%
group_modify(slide_one_grp,
f = f_rtv_wrapper, ...,
starts = starts,
stops = stops,
time_values = ref_time_values,
all_rows = all_rows,
new_col = new_col,
.keep = FALSE)
}

# Else interpret ... as an expression for tidy evaluation
else {
# If `f` is missing, interpret ... as an expression for tidy evaluation
if (missing(f)) {
quos = enquos(...)
if (length(quos) == 0) {
Abort("If `f` is missing then a computation must be specified via `...`.")
Expand All @@ -386,31 +361,29 @@ epi_slide = function(x, f, ..., before, after, ref_time_values,
Abort("If `f` is missing then only a single computation can be specified via `...`.")
}

quo = quos[[1]]
f = function(.x, .group_key, quo, ...) {
.ref_time_value = min(.x$time_value) + before
.x <- .x[.x$.real,]
.x$.real <- NULL
data_mask = rlang::as_data_mask(.x)
# We'll also install `.x` directly, not as an `rlang_data_pronoun`, so
# that we can, e.g., use more dplyr and epiprocess operations.
data_mask$.x = .x
data_mask$.group_key = .group_key
data_mask$.ref_time_value = .ref_time_value
rlang::eval_tidy(quo, data_mask)
}
f = quos[[1]]
new_col = sym(names(rlang::quos_auto_name(quos)))

x = x %>%
group_modify(slide_one_grp,
f = f, quo = quo,
starts = starts,
stops = stops,
time_values = ref_time_values,
all_rows = all_rows,
new_col = new_col,
.keep = FALSE)
... = missing_arg() # magic value that passes zero args as dots in calls below
}

f = as_slide_computation(f, ...)
# Create a wrapper that calculates and passes `.ref_time_value` to the
# computation.
f_wrapper = function(.x, .group_key, ...) {
.ref_time_value = min(.x$time_value) + before
.x <- .x[.x$.real,]
.x$.real <- NULL
f(.x, .group_key, .ref_time_value, ...)
}
x = x %>%
group_modify(slide_one_grp,
f = f_wrapper, ...,
starts = starts,
stops = stops,
time_values = ref_time_values,
all_rows = all_rows,
new_col = new_col,
.keep = FALSE)

# Unnest if we need to, and return
if (!as_list_col) {
Expand Down
Loading
Loading