Commit

wip doc: lots of improvements to the slide reference
dshemetov committed Oct 15, 2024
1 parent 68ff0ab commit dfeb3f0
Showing 36 changed files with 1,888 additions and 1,320 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -18,3 +18,4 @@ sandbox.R
*_cache/
vignettes/*.html
vignettes/*.R
!vignettes/_common.R
19 changes: 11 additions & 8 deletions R/archive.R
@@ -9,8 +9,9 @@

#' Validate a version bound arg
#'
#' Expected to be used on `clobberable_versions_start`, `versions_end`,
#' and similar arguments. Some additional context-specific checks may be needed.
#' Expected to be used on `clobberable_versions_start`, `versions_end`, and
#' similar arguments. Some additional context-specific checks may be needed.
#' Side effects: raises an error if version bound appears invalid.
#'
#' @param version_bound the version bound to validate
#' @param x a data frame containing a version column with which to check
@@ -20,8 +21,6 @@
#' @param version_bound_arg optional string; what to call the version bound in
#' error messages
#'
#' @section Side effects: raises an error if version bound appears invalid
#'
#' @keywords internal
validate_version_bound <- function(version_bound, x, na_ok = FALSE,
version_bound_arg = rlang::caller_arg(version_bound),
@@ -147,7 +146,8 @@ next_after.Date <- function(x) x + 1L
#' on `DT` directly). Note that there can only be a single row per unique
#' combination of key variables.
#'
#' @section Compactify:
#' ## Compactify
#'
#' This section describes the internals of how compactification works in an
#' `epi_archive()`. Compactification can potentially improve code speed or
#' memory usage, depending on your data.
@@ -169,7 +169,8 @@ next_after.Date <- function(x) x + 1L
#' version in which it was first released, or if no version of that
#' observation appears in the archive data at all.
#'
#' @section Metadata:
#' ## Metadata
#'
#' The following pieces of metadata are included as fields in an `epi_archive`
#' object:
#'
@@ -183,12 +184,14 @@ next_after.Date <- function(x) x + 1L
#' archive. Unexpected behavior may result from modifying the metadata
#' directly.
#'
#' @section Generating Snapshots:
#' ## Generating Snapshots
#'
#' An `epi_archive` object can be used to generate a snapshot of the data in
#' `epi_df` format, which represents the most up-to-date time series values up
#' to a point in time. This is accomplished by calling `epix_as_of()`.
#'
#' @section Sliding Computations:
#' ## Sliding Computations
#'
#' We can run a sliding computation over an `epi_archive` object, much like
#' `epi_slide()` does for an `epi_df` object. This is accomplished by calling
#' the `slide()` method for an `epi_archive` object, which works similarly to
6 changes: 4 additions & 2 deletions R/epi_df.R
@@ -51,7 +51,8 @@
#' generate `epi_df` objects, as data snapshots, from an `epi_archive`
#' object).
#'
#' @section Geo Types:
#' ## Geo Types
#'
#' The following geo types are recognized in an `epi_df`.
#'
#' * `"county"`: each observation corresponds to a U.S. county; coded by 5-digit
@@ -69,7 +70,8 @@
#'
#' An unrecognizable geo type is labeled "custom".
#'
#' @section Time Types:
#' ## Time Types
#'
#' The following time types are recognized in an `epi_df`.
#'
#' * `"day"`: each observation corresponds to a day; coded as a `Date` object,
9 changes: 6 additions & 3 deletions R/growth_rate.R
@@ -56,7 +56,8 @@
#' `genlasso::trendfilter()`, divided by the fitted value of the discrete
#' spline at `x0`.
#'
#' @section Log Scale:
#' ## Log Scale
#'
#' An alternative view for the growth rate of a function f in general is given
#' by defining g(t) = log(f(t)), and then observing that g'(t) = f'(t) /
#' f(t). Therefore, any method that estimates the derivative can be simply
@@ -65,7 +66,8 @@
#' "trend_filter") has a log scale analog, which can be used by setting
#' `log_scale = TRUE`.
#'
#' @section Sliding Windows:
#' ## Sliding Windows
#'
#' For the local methods, "rel_change" and "linear_reg", we use a sliding window
#' centered at the reference point of bandwidth `h`. In other words, the
#' sliding window consists of all points in `x` whose distance to the
@@ -75,7 +77,8 @@
#' sliding window contains all data in between January 1 and 14 (matching the
#' behavior of `epi_slide()` with `before = h - 1` and `after = h`).
#'
#' @section Additional Arguments:
#' ## Additional Arguments
#'
#' For the global methods, "smooth_spline" and "trend_filter", additional
#' arguments can be specified via `...` for the underlying estimation
#' function. For the smoothing spline case, these additional arguments are
13 changes: 5 additions & 8 deletions R/methods-epi_df.R
@@ -7,10 +7,8 @@
#' `as_tibble()` on `epi_df`s but you actually want them to remain `epi_df`s,
#' use `attr(your_epi_df, "decay_to_tibble") <- FALSE` beforehand.
#'
#' @template x
#'
#' @param x an `epi_df`
#' @inheritParams tibble::as_tibble
#'
#' @importFrom tibble as_tibble
#' @export
as_tibble.epi_df <- function(x, ...) {
@@ -34,7 +32,7 @@ as_tibble.epi_df <- function(x, ...) {
#' others in the `other_keys` field of the metadata, or else explicitly set.
#'
#' @method as_tsibble epi_df
#' @template x
#' @param x an `epi_df`
#' @param key Optional. Any additional keys (other than `geo_value`) to add to
#' the `tsibble`.
#' @param ... additional arguments passed on to `tsibble::as_tsibble()`
@@ -54,8 +52,7 @@ as_tsibble.epi_df <- function(x, key, ...) {
#'
#' Print and summary functions for an `epi_df` object.
#'
#' @template x
#'
#' @param x an `epi_df`
#' @method print epi_df
#' @param ... additional arguments to forward to `NextMethod()`, or unused
#' @export
@@ -261,9 +258,9 @@ group_modify.epi_df <- function(.data, .f, ..., .keep = FALSE) {

#' Complete epi_df
#'
#' A tidyr::complete() analogue for epi_df objects. This function
#' A `tidyr::complete()` analogue for `epi_df` objects. This function
#' can be used, for example, to add rows for missing combinations
#' of geo_value and time_value, filling other columns with `NA`s.
#' of `geo_value` and `time_value`, filling other columns with `NA`s.
#' See the examples for usage details.
#'
#' @param data an `epi_df`
27 changes: 13 additions & 14 deletions R/outliers.R
@@ -1,12 +1,15 @@
#' Detect outliers
#'
#' Applies one or more outlier detection methods to a given signal variable, and
#' @description Applies one or more outlier detection methods to a given signal variable, and
#' optionally aggregates the outputs to create a consensus result. See the
#' [outliers
#' vignette](https://cmu-delphi.github.io/epiprocess/articles/outliers.html) for
#' examples.
#'
#' @template x-y
#' @param x Design points corresponding to the signal values `y`. Default is
#' `seq_along(y)` (that is, equally-spaced points from 1 to the length of
#' `y`).
#' @param y Signal values.
#' @param methods A tibble specifying the method(s) to use for outlier
#' detection, with one row per method, and the following columns:
#' * `method`: Either "rm" or "stl", or a custom function for outlier
@@ -22,7 +25,9 @@
#' summarized results are calculated. Note that if the number of `methods`
#' (number of rows) is odd, then "median" is equivalent to a majority vote for
#' purposes of determining whether a given observation is an outlier.
#' @template detect-outlr-return
#' @return A tibble with number of rows equal to `length(y)` and columns
#' giving the outlier detection thresholds (`lower` and `upper`) and
#' replacement values from each detection method (`replacement`).
#'
#' @details Each outlier detection method, one per row of the passed `methods`
#' tibble, is a function that must take as its first two arguments `x` and
@@ -139,19 +144,16 @@ detect_outlr <- function(x = seq_along(y), y,
return(results)
}

#' Detect outliers based on a rolling median
#' @description `detect_outlr_rm` detects outliers based on a distance from the
#' rolling median specified in terms of multiples of the rolling interquartile
#' range (IQR).
#'
#' Detects outliers based on a distance from the rolling median specified in
#' terms of multiples of the rolling interquartile range (IQR).
#'
#' @template x-y
#' @param n Number of time steps to use in the rolling window. Default is 21.
#' This value is centrally aligned. When `n` is an odd number, the rolling
#' window extends from `(n-1)/2` time steps before each design point to `(n-1)/2`
#' time steps after. When `n` is even, then the rolling range extends from
#' `n/2-1` time steps before to `n/2` time steps after.
#' @template outlier-detection-options
#' @template detect-outlr-return
#'
#' @rdname detect_outlr
#' @export
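
Note on the `@param n` text in the hunk above: it states how the centrally aligned window is split around each design point for odd and even `n`. A minimal sketch of that arithmetic in R follows. It assumes a hypothetical helper, `rolling_window_extent()`, which is not part of epiprocess and only mirrors the documented rule; how `detect_outlr_rm()` computes the window internally is not shown in this diff.

```r
# Hypothetical helper (not an epiprocess function): returns how many time
# steps the documented rolling window extends before and after a design point.
rolling_window_extent <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1, n == round(n))
  if (n %% 2 == 1) {
    # Odd n: symmetric window, (n - 1) / 2 steps on each side.
    before <- (n - 1) / 2
    after <- (n - 1) / 2
  } else {
    # Even n: one fewer step before than after.
    before <- n / 2 - 1
    after <- n / 2
  }
  c(before = before, after = after)
}

rolling_window_extent(21) # default n: 10 steps before, 10 steps after
rolling_window_extent(4)  # even n: 1 step before, 2 steps after
```

In both cases `before + after + 1` equals `n`, so the window always covers exactly `n` time steps including the design point itself.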
@@ -210,11 +212,9 @@ detect_outlr_rm <- function(x = seq_along(y), y, n = 21,
return(z)
}

#' Detect outliers based on an STL decomposition
#'
#' Detects outliers based on a seasonal-trend decomposition using LOESS (STL).
#' @description `detect_outlr_stl` detects outliers based on a seasonal-trend
#' decomposition using LOESS (STL).
#'
#' @template x-y
#' @param n_trend Number of time steps to use in the rolling window for trend.
#' Default is 21.
#' @param n_seasonal Number of time steps to use in the rolling window for
@@ -235,7 +235,6 @@ detect_outlr_rm <- function(x = seq_along(y), y, n = 21,
#' `seasonal_period` will still have an impact on the result, though, by
#' impacting the estimation of the trend component.
#' @template outlier-detection-options
#' @template detect-outlr-return
#'
#' @details The STL decomposition is computed using [`stats::stl()`]. Once
#' computed, the outlier detection method is analogous to the rolling median