Skip to content

Commit

Permalink
Merge pull request #315 from cmu-delphi/djm/remove-fabletools
Browse files Browse the repository at this point in the history
Djm/remove fabletools
  • Loading branch information
brookslogan authored Jul 18, 2024
2 parents 5fb62f3 + 9524643 commit 8f25ec9
Show file tree
Hide file tree
Showing 6 changed files with 95 additions and 50 deletions.
3 changes: 0 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,6 @@ Imports:
cli,
data.table,
dplyr (>= 1.0.0),
fabletools,
feasts,
generics,
genlasso,
ggplot2,
lifecycle (>= 1.0.1),
Expand Down
8 changes: 8 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,11 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat

# epiprocess 0.8

## Breaking changes
- `detect_outlr_stl(seasonal_period = NULL)` is no longer accepted. Use
`detect_outlr_stl(seasonal_period = <value>, seasonal_as_residual = TRUE)`
instead. See `?detect_outlr_stl` for more details.

## Improvements

- `epi_slide` computations are now 2-4 times faster after changing how
Expand Down Expand Up @@ -43,6 +48,9 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat
- Added optional `decay_to_tibble` attribute controlling `as_tibble()` behavior
of `epi_df`s to let `{epipredict}` work more easily with other libraries (#471).

## Cleanup
- Removed some external package dependencies.

# epiprocess 0.7.0

## Breaking changes:
Expand Down
93 changes: 59 additions & 34 deletions R/outliers.R
Original file line number Diff line number Diff line change
Expand Up @@ -64,9 +64,10 @@
#' args = list(list(
#' detect_negatives = TRUE,
#' detection_multiplier = 2.5,
#' seasonal_period = NULL
#' seasonal_period = 7,
#' seasonal_as_residual = TRUE
#' )),
#' abbr = "stl_nonseasonal"
#' abbr = "stl_reseasonal"
#' )
#' )
#'
Expand Down Expand Up @@ -216,18 +217,28 @@ detect_outlr_rm <- function(x = seq_along(y), y, n = 21,
#' @param n_trend Number of time steps to use in the rolling window for trend.
#' Default is 21.
#' @param n_seasonal Number of time steps to use in the rolling window for
#' seasonality. Default is 21.
#' seasonality. Default is 21. Can also be the string "periodic". See
#' `s.window` in [`stats::stl`].
#' @param n_threshold Number of time steps to use in rolling window for the IQR
#' outlier thresholds.
#' @param seasonal_period Integer specifying period of seasonality. For example,
#' for daily data, a period 7 means weekly seasonality. The default is `NULL`,
#' meaning that no seasonal term will be included in the STL decomposition.
#' @param seasonal_period Integer specifying period of "seasonality". For
#' example, for daily data, a period 7 means weekly seasonality. It must be
#' strictly larger than 1. Also impacts the size of the low-pass filter
#' window; see `l.window` in [`stats::stl`].
#' @param seasonal_as_residual Boolean specifying whether the seasonal(/weekly)
#' component should be treated as part of the residual component instead of as
#' part of the predictions. The default, FALSE, treats them as part of the
#' predictions, so large seasonal(/weekly) components will not lead to
#' flagging points as outliers. `TRUE` may instead consider the extrema of
#' large seasonal variations to be outliers; `n_seasonal` and
#' `seasonal_period` will still have an impact on the result, though, by
#' impacting the estimation of the trend component.
#' @template outlier-detection-options
#' @template detect-outlr-return
#'
#' @details The STL decomposition is computed using the `feasts` package. Once
#' @details The STL decomposition is computed using [`stats::stl()`]. Once
#' computed, the outlier detection method is analogous to the rolling median
#' method in `detect_outlr_rm()`, except with the fitted values and residuals
#' method in [`detect_outlr_rm()`], except with the fitted values and residuals
#' from the STL decomposition taking the place of the rolling median and
#' residuals to the rolling median, respectively.
#'
Expand All @@ -252,12 +263,34 @@ detect_outlr_stl <- function(x = seq_along(y), y,
n_trend = 21,
n_seasonal = 21,
n_threshold = 21,
seasonal_period = NULL,
seasonal_period,
seasonal_as_residual = FALSE,
log_transform = FALSE,
detect_negatives = FALSE,
detection_multiplier = 2,
min_radius = 0,
replacement_multiplier = 0) {
if (dplyr::n_distinct(x) != length(y)) {
cli_abort("`x` contains duplicate values. (If being run on a column in an
`epi_df`, did you group by relevant key variables?)")
}
if (length(y) <= 1L) {
cli_abort("`y` has length {length(y)}; that's definitely too little for
STL. (If being run in a `mutate()` or `epi_slide()`, check
whether you grouped by too many variables; you should not be
grouping by `time_value` in particular.)")
}
distinct_x_skips <- unique(diff(x))
if (diff(range(distinct_x_skips)) > 1e-4 * mean(distinct_x_skips)) {
cli_abort("`x` does not appear to have regular spacing; consider filling in
gaps with imputed values (STL does not allow NAs).")
}
if (is.unsorted(x)) { # <- for performance in common (sorted) case
o <- order(x)
x <- x[o]
y <- y[o]
}

# Transform if requested
if (log_transform) {
# Replace all negative values with 0
Expand All @@ -266,32 +299,22 @@ detect_outlr_stl <- function(x = seq_along(y), y,
y <- log(y + offset)
}

# Make a tsibble for fabletools, setup and run STL
z_tsibble <- tsibble::tsibble(x = x, y = y, index = x)

stl_formula <- y ~ trend(window = n_trend) +
season(period = seasonal_period, window = n_seasonal)
assert_int(seasonal_period, lower = 2L)
assert_logical(seasonal_as_residual, len = 1L, any.missing = FALSE)

stl_components <- z_tsibble %>%
fabletools::model(feasts::STL(stl_formula, robust = TRUE)) %>%
generics::components() %>%
yts <- stats::ts(y, frequency = seasonal_period)
stl_comp <- stats::stl(yts,
t.window = n_trend, s.window = n_seasonal,
robust = TRUE
)$time.series %>%
tibble::as_tibble() %>%
dplyr::select(.data$trend:.data$remainder) %>% #
dplyr::rename_with(~"seasonal", tidyselect::starts_with("season")) %>%
dplyr::rename(resid = .data$remainder)

# Allocate the seasonal term from STL to either fitted or resid
if (!is.null(seasonal_period)) {
stl_components <- stl_components %>%
dplyr::mutate(
fitted = .data$trend + .data$seasonal
)
if (!seasonal_as_residual) {
stl_comp <- dplyr::mutate(stl_comp, fitted = .data$trend + .data$seasonal)
} else {
stl_components <- stl_components %>%
dplyr::mutate(
fitted = .data$trend,
resid = .data$seasonal + resid
)
stl_comp <- dplyr::mutate(stl_comp, fitted = .data$trend, resid = .data$seasonal + .data$resid)
}

# Detect negatives if requested
Expand All @@ -306,10 +329,7 @@ detect_outlr_stl <- function(x = seq_along(y), y,

# Calculate lower and upper thresholds and replacement value
z <- z %>%
dplyr::mutate(
fitted = stl_components$fitted,
resid = stl_components$resid
) %>%
dplyr::mutate(fitted = stl_comp$fitted, resid = stl_comp$resid) %>%
roll_iqr(
n = n_threshold,
detection_multiplier = detection_multiplier,
Expand Down Expand Up @@ -337,7 +357,12 @@ roll_iqr <- function(z, n, detection_multiplier, min_radius,
as_type <- as.numeric
}

epi_slide(z, roll_iqr = stats::IQR(resid), before = floor((n - 1) / 2), after = ceiling((n - 1) / 2)) %>%
z %>%
epi_slide(
roll_iqr = stats::IQR(resid),
before = floor((n - 1) / 2),
after = ceiling((n - 1) / 2)
) %>%
dplyr::mutate(
lower = pmax(
min_lower,
Expand Down
5 changes: 3 additions & 2 deletions man/detect_outlr.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 19 additions & 7 deletions man/detect_outlr_stl.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 6 additions & 4 deletions vignettes/outliers.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -74,8 +74,9 @@ methods.
2. Detection based on a seasonal-trend decomposition using LOESS (STL), using
`detect_outlr_stl()`, which is similar to the rolling median method but
replaces the rolling median with fitted values from STL.
3. Detection based on an STL decomposition, but without seasonality term, which
amounts to smoothing using LOESS.
3. Detection based on an STL decomposition, but subtracting out the seasonality
term from its predictions, which may result in the extrema of large seasonal
variations being considered as outliers.

The outlier detection methods are specified using a `tibble` that is passed to
`detect_outlr()`, with one row per method, and whose columms specify the
Expand Down Expand Up @@ -108,9 +109,10 @@ detection_methods <- bind_rows(
args = list(list(
detect_negatives = TRUE,
detection_multiplier = 2.5,
seasonal_period = NULL
seasonal_period = 7,
seasonal_as_residual = TRUE
)),
abbr = "stl_nonseasonal"
abbr = "stl_reseasonal"
)
)
Expand Down

0 comments on commit 8f25ec9

Please sign in to comment.