refactor(time_types): refactor time types
* guess time_type of "day", "week", "yearmonth", "integer" from time column and warn if not compatible
* restrict `before` and `after` to types compatible with time column
* deprecate geo_type and time_type constructor arguments, infer only
* improve documentation on geo and time types
* enforce time_value and version being same type in epi_archive
* update vignettes
* move arg validation from new_epi_df to as_epi_df.tbl_df to match epi_archive
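
A minimal base-R sketch of the `time_value`/`version` class check listed above. `check_same_time_class` is a hypothetical helper used only for illustration; it is not a function in epiprocess, and the commit itself performs this check with `cli_abort()` inside `validate_epi_archive()`.

```r
# Hypothetical helper (not part of epiprocess): errors when the
# `time_value` and `version` columns do not share the same class.
check_same_time_class <- function(x) {
  if (!identical(class(x[["time_value"]]), class(x[["version"]]))) {
    stop("`time_value` and `version` must have the same class.")
  }
  invisible(x)
}

# Both columns are `Date`, so the check passes silently.
ok <- data.frame(
  geo_value = "ca",
  time_value = as.Date("2024-01-01"),
  version = as.Date("2024-01-02")
)
check_same_time_class(ok)

# `version` is an integer while `time_value` is a `Date`, so the check errors.
bad <- data.frame(
  geo_value = "ca",
  time_value = as.Date("2024-01-01"),
  version = 20240102L
)
result <- tryCatch(
  {
    check_same_time_class(bad)
    "passed"
  },
  error = function(e) "rejected"
)
```

Comparing `class()` with `identical()` is deliberately strict: a `Date` column paired with an integer or character `version` column is rejected rather than silently coerced.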

dshemetov committed Jul 20, 2024
1 parent f79821e commit 7a50d9d
Showing 39 changed files with 1,326 additions and 1,797 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -102,6 +102,7 @@ importFrom(checkmate,check_atomic)
importFrom(checkmate,check_data_frame)
importFrom(checkmate,check_names)
importFrom(checkmate,expect_class)
importFrom(checkmate,test_int)
importFrom(checkmate,test_set_equal)
importFrom(checkmate,test_subset)
importFrom(checkmate,vname)
19 changes: 14 additions & 5 deletions NEWS.md
@@ -5,9 +5,10 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat
# epiprocess 0.8

## Breaking changes

- `detect_outlr_stl(seasonal_period = NULL)` is no longer accepted. Use
`detect_outlr_stl(seasonal_period = <value>, seasonal_as_residual = TRUE)`
instead. See `?detect_outlr_stl` for more details.
instead. See `?detect_outlr_stl` for more details.

## Improvements

@@ -49,15 +50,23 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat
output a huge number of `ref_time_values` spaced apart by mere seconds.

## Cleanup
- Resolved some linting messages in package checks (#468).

## Cleanup
- Resolved some linting messages in package checks (#468).
- Added optional `decay_to_tibble` attribute controlling `as_tibble()` behavior
of `epi_df`s to let `{epipredict}` work more easily with other libraries (#471).

## Cleanup
- Removed some external package dependencies.

## Breaking Changes

- `epi_df`s are now stricter about what types they allow in the time column.
Namely, we are explicit about only supporting `Date` at the daily and weekly
cadence and generic integer types (for yearly cadence).
- `epi_slide` `before` and `after` arguments now require the user to
specify time units in certain cases. The `time_step` argument has been
removed.
- `epix_slide` `before` argument now defaults to `Inf`, and requires the user to
specify units in some cases. The `time_step` argument has been removed.

# epiprocess 0.7.0

## Breaking changes:
114 changes: 54 additions & 60 deletions R/archive.R
@@ -170,11 +170,8 @@ NULL
#' The data table `DT` has key variables `geo_value`, `time_value`, `version`,
#' as well as any others (these can be specified when instantiating the
#' `epi_archive` object via the `other_keys` argument, and/or set by operating
#' on `DT` directly). Refer to the documentation for `as_epi_archive()` for
#' information and examples of relevant parameter names for an `epi_archive`
#' object. Note that there can only be a single row per unique combination of
#' key variables, and thus the key variables are critical for figuring out how
#' to generate a snapshot of data from the archive, as of a given version.
#' on `DT` directly). Note that there can only be a single row per unique
#' combination of key variables.
#'
#' @section Metadata:
#' The following pieces of metadata are included as fields in an `epi_archive`
@@ -184,18 +181,15 @@ NULL
#' * `time_type`: the type for the time values.
#' * `additional_metadata`: list of additional metadata for the data archive.
#'
#' Unlike an `epi_df` object, metadata for an `epi_archive` object `x` can be
#' accessed (and altered) directly, as in `x$geo_type` or `x$time_type`,
#' etc. Like an `epi_df` object, the `geo_type` and `time_type` fields in the
#' metadata of an `epi_archive` object are not currently used by any
#' downstream functions in the `epiprocess` package, and serve only as useful
#' bits of information to convey about the data set at hand.
#' While this metadata is not protected, it is generally recommended to treat it
#' as read-only, and to use the `epi_archive` methods to interact with the data
#' archive. Unexpected behavior may result from modifying the metadata
#' directly.
#'
#' @section Generating Snapshots:
#' An `epi_archive` object can be used to generate a snapshot of the data in
#' `epi_df` format, which represents the most up-to-date values of the signal
#' variables, as of the specified version. This is accomplished by calling
#' `epix_as_of()`.
#' `epi_df` format, which represents the most up-to-date time series values up
#' to a point in time. This is accomplished by calling `epix_as_of()`.
#'
#' @section Sliding Computations:
#' We can run a sliding computation over an `epi_archive` object, much like
@@ -208,19 +202,18 @@ NULL
#'
#' @param x A data.frame, data.table, or tibble, with columns `geo_value`,
#' `time_value`, `version`, and then any additional number of columns.
#' @param geo_type Type for the geo values. If missing, then the function will
#' attempt to infer it from the geo values present; if this fails, then it
#' will be set to "custom".
#' @param time_type Type for the time values. If missing, then the function will
#' attempt to infer it from the time values present; if this fails, then it
#' will be set to "custom".
#' @param geo_type DEPRECATED Has no effect. Geo value type is inferred from the
#' location column and set to "custom" if not recognized.
#' @param time_type DEPRECATED Has no effect. Time value type is inferred from the
#' time column and set to "custom" if not recognized. Unpredictable behavior may result
#' if the time type is not recognized.
#' @param other_keys Character vector specifying the names of variables in `x`
#' that should be considered key variables (in the language of `data.table`)
#' apart from "geo_value", "time_value", and "version".
#' @param additional_metadata List of additional metadata to attach to the
#' `epi_archive` object. The metadata will have `geo_type` and `time_type`
#' fields; named entries from the passed list will be included as well.
#' @param compactify Optional; Boolean or `NULL`. `TRUE` will remove some
#' `epi_archive` object. The metadata will have the `geo_type` field; named
#' entries from the passed list will be included as well.
#' @param compactify Optional; Boolean. `TRUE` will remove some
#' redundant rows, `FALSE` will not, and missing or `NULL` will remove
#' redundant rows, but issue a warning. See more information at `compactify`.
#' @param clobberable_versions_start Optional; `length`-1; either a value of the
@@ -269,10 +262,7 @@ NULL
#' value = rnorm(10, mean = 2, sd = 1)
#' )
#'
#' toy_epi_archive <- tib %>% as_epi_archive(
#' geo_type = "state",
#' time_type = "day"
#' )
#' toy_epi_archive <- tib %>% as_epi_archive()
#' toy_epi_archive
#'
#' # Ex. with an additional key for county
@@ -295,21 +285,17 @@ NULL
#' cases_rate = c(0.01, 0.02, 0.01, 0.05)
#' )
#'
#' x <- df %>% as_epi_archive(
#' geo_type = "state",
#' time_type = "day",
#' other_keys = "county"
#' )
#' x <- df %>% as_epi_archive(other_keys = "county")
#'
new_epi_archive <- function(
x,
geo_type = NULL,
time_type = NULL,
other_keys = NULL,
additional_metadata = NULL,
compactify = NULL,
clobberable_versions_start = NULL,
versions_end = NULL) {
geo_type,
time_type,
other_keys,
additional_metadata,
compactify,
clobberable_versions_start,
versions_end) {
# Create the data table; if x was an un-keyed data.table itself,
# then the call to as.data.table() will fail to set keys, so we
# need to check this, then do it manually if needed
@@ -398,13 +384,11 @@ new_epi_archive <- function(
#' @export
validate_epi_archive <- function(
x,
geo_type = NULL,
time_type = NULL,
other_keys = NULL,
additional_metadata = NULL,
compactify = NULL,
clobberable_versions_start = NULL,
versions_end = NULL) {
other_keys,
additional_metadata,
compactify,
clobberable_versions_start,
versions_end) {
# Finish off with small checks on keys variables and metadata
if (!test_subset(other_keys, names(x))) {
cli_abort("`other_keys` must be contained in the column names of `x`.")
@@ -413,12 +397,20 @@ validate_epi_archive <- function(
cli_abort("`other_keys` cannot contain \"geo_value\", \"time_value\", or \"version\".")
}
if (any(names(additional_metadata) %in% c("geo_type", "time_type"))) {
cli_warn("`additional_metadata` names overlap with existing metadata fields \"geo_type\", \"time_type\".")
cli_warn("`additional_metadata` names overlap with existing metadata fields \"geo_type\" or \"time_type\".")
}

# Conduct checks and apply defaults for `compactify`
assert_logical(compactify, len = 1, any.missing = FALSE, null.ok = TRUE)

# Make sure `time_value` and `version` have the same time type
if (!identical(class(x[["time_value"]]), class(x[["version"]]))) {
cli_abort(
"`time_value` and `version` must have the same class.",
class = "epiprocess__time_value_version_mismatch"
)
}

# Apply defaults and conduct checks for
# `clobberable_versions_start`, `versions_end`:
validate_version_bound(clobberable_versions_start, x, na_ok = TRUE)
@@ -453,13 +445,13 @@ validate_epi_archive <- function(
#' @export
as_epi_archive <- function(
x,
geo_type = NULL,
time_type = NULL,
other_keys = NULL,
additional_metadata = NULL,
geo_type = deprecated(),
time_type = deprecated(),
other_keys = character(0L),
additional_metadata = list(),
compactify = NULL,
clobberable_versions_start = NULL,
.versions_end = NULL, ...,
clobberable_versions_start = NA,
.versions_end = max_version_with_row_in(x), ...,
versions_end = .versions_end) {
assert_data_frame(x)
x <- rename(x, ...)
@@ -477,16 +469,18 @@ as_epi_archive <- function(
if (anyMissing(x$version)) {
cli_abort("Column `version` must not contain missing values.")
}
if (lifecycle::is_present(geo_type)) {
cli_warn("epi_archive constructor argument `geo_type` is now ignored. Consider removing.")
}
if (lifecycle::is_present(time_type)) {
cli_warn("epi_archive constructor argument `time_type` is now ignored. Consider removing.")
}

geo_type <- geo_type %||% guess_geo_type(x$geo_value)
time_type <- time_type %||% guess_time_type(x$time_value)
other_keys <- other_keys %||% character(0L)
additional_metadata <- additional_metadata %||% list()
clobberable_versions_start <- clobberable_versions_start %||% NA
versions_end <- versions_end %||% max_version_with_row_in(x)
geo_type <- guess_geo_type(x$geo_value)
time_type <- guess_time_type(x$time_value)

validate_epi_archive(
x, geo_type, time_type, other_keys, additional_metadata,
x, other_keys, additional_metadata,
compactify, clobberable_versions_start, versions_end
)
new_epi_archive(