Merge pull request #299 from epiforecasts/make_check_forecasts_pipeable
scoringutils 1.2.0 release
nikosbosse authored Jul 26, 2023
2 parents 5f8f12a + 55a15df commit f74e08f
Showing 17 changed files with 353 additions and 80 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: scoringutils
Title: Utilities for Scoring and Assessing Predictions
-Version: 1.1.7
+Version: 1.2.0
Language: en-GB
Authors@R: c(
person(given = "Nikos",
1 change: 1 addition & 0 deletions NAMESPACE
@@ -103,6 +103,7 @@ importFrom(ggplot2,xlab)
importFrom(ggplot2,ylab)
importFrom(lifecycle,deprecated)
importFrom(methods,hasArg)
importFrom(methods,is)
importFrom(rlang,enexprs)
importFrom(rlang,warn)
importFrom(scoringRules,crps_sample)
60 changes: 53 additions & 7 deletions NEWS.md
@@ -1,3 +1,49 @@
# scoringutils 1.2.0

This major release contains a range of new features and bug fixes that have been introduced in minor releases since `1.1.0`. The most important changes are:

- Documentation updated to reflect changes since version 1.1.0, including new transform and workflow functions.
- New `set_forecast_unit()` function allows manual setting of forecast unit.
- `summarise_scores()` gains a new `across` argument for summarising across variables.
- New `transform_forecasts()` and `log_shift()` functions allow forecast transformations. See the documentation for `transform_forecasts()` for more details and an example use case.
- Input checks and test coverage improved for bias functions.
- Bug fix in `get_prediction_type()` for integer matrix input.
- Links to scoringutils paper and citation updates.
- Warning added in `interval_score()` for small interval ranges.
- Linting updates and improvements.

Thanks to @nikosbosse, @seabbs, and @sbfnk for code and review contributions. Thanks to @bisaloo for the suggestion to use a linting GitHub Action that only triggers on changes, and @adrian-lison for the suggestion to add a warning to `interval_score()` if the interval range is between 0 and 1.

## Package updates

- The documentation was updated to reflect the recent changes since `scoringutils 1.1.0`. In particular, usage of the functions `set_forecast_unit()`, `check_forecasts()` and `transform_forecasts()` is now documented in the vignettes. The introduction of these functions enhances the overall workflow and helps to make the code more readable. All functions are designed to be used together with the pipe operator. For example, one can now use something like the following:

```r
example_quantile |>
  set_forecast_unit(c("model", "location", "forecast_date", "horizon", "target_type")) |>
  check_forecasts() |>
  score()
```

Documentation for `transform_forecasts()` has also been extended. This function allows the user to easily add transformations of forecasts, as suggested in the paper ["Scoring epidemiological forecasts on transformed scales"](https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1). In an epidemiological context, for example, it may make sense to apply the natural logarithm before scoring forecasts, in order to obtain scores that reflect how well models are able to predict exponential growth rates, rather than absolute values. Users can now do something like the following to score a transformed version of the data in addition to the original one:

```r
data <- example_quantile[true_value > 0, ]
data |>
  transform_forecasts(fun = log_shift, offset = 1) |>
  score() |>
  summarise_scores(by = c("model", "scale"))
```

Here we use the `log_shift()` function to apply a logarithmic transformation to the forecasts. This function was introduced in `scoringutils 1.1.2` as a helper function that acts just like `log()`, but has an additional argument, `offset`, that adds a number to every prediction and observed value before applying the log transformation.
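
As a minimal sketch (assuming the usual signature where the numeric vector to transform comes first), `log_shift(x, offset = 1)` is equivalent to `log(x + 1)`:

```r
log_shift(c(0, 1, 10), offset = 1)
# same result as
log(c(0, 1, 10) + 1)
```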

## Feature updates

- Made `check_forecasts()` and `score()` pipeable (see issue #290). This means that
users can now directly use the output of `check_forecasts()` as input for
`score()`. As `score()` otherwise runs `check_forecasts()` internally anyway,
this simply makes the step explicit and helps to write clearer code.
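
For example:

```r
example_quantile |>
  check_forecasts() |>
  score()
```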

# scoringutils 1.1.7

Release by @seabbs in #305. Reviewed by @nikosbosse and @sbfnk.
Expand All @@ -21,21 +67,21 @@ Release by @seabbs in #305. Reviewed by @nikosbosse and @sbfnk.
# scoringutils 1.1.6

## Feature updates

- Added a new argument, `across`, to `summarise_scores()`. This argument allows the user to summarise scores across different forecast units as an alternative to specifying `by`. See the documentation for `summarise_scores()` for more details and an example use case.
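
A sketch of the intended use (a hypothetical illustration, assuming `across` names the forecast-unit columns to aggregate over, as an alternative to listing the remaining columns in `by`):

```r
scores <- score(example_quantile)
# assumption: columns listed in `across` are aggregated over, while all other
# forecast-unit columns (e.g. model and target type) are kept separate
summarise_scores(scores, across = c("location", "target_end_date"))
```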

# scoringutils 1.1.5

## Feature updates

- Added a new function, `set_forecast_unit()`, that allows the user to set the forecast unit manually. The function removes all columns that are not relevant for uniquely identifying a single forecast. If not done manually, `scoringutils` attempts to determine the unit of a single forecast automatically by simply assuming that all column names are relevant to determine the forecast unit. This can lead to unexpected behaviour, so setting the forecast unit explicitly can help make the code easier to debug and easier to read (see issue #268). When used as part of a workflow, `set_forecast_unit()` can be directly piped into `check_forecasts()` to check everything is in order.
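
For example:

```r
example_quantile |>
  set_forecast_unit(
    c("model", "location", "forecast_date", "horizon", "target_type")
  ) |>
  check_forecasts()
```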

# scoringutils 1.1.4

## Package updates

- Added links to the scoringutils paper [Evaluating Forecasts with scoringutils in R](https://arxiv.org/abs/2205.07090) to the package.
- Updated citation formatting to comply with newer standards.

# scoringutils 1.1.3

@@ -54,7 +100,7 @@ check everything is in order.
## Feature updates

- Added a new function, `transform_forecasts()` to make it easy to transform forecasts before scoring them, as suggested in Bosse et al. (2023), https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1.
- Added a function, `log_shift()`, that implements the default transformation function. The function allows the user to add an offset before applying the logarithm.

# scoringutils 1.1.1

11 changes: 6 additions & 5 deletions R/check_forecasts.R
@@ -162,11 +162,12 @@ check_forecasts <- function(data) {
  if (length(n) > 1) {
    warnings <- c(
      warnings,
      paste0(
-        "Some forecasts have different numbers of rows (e.g. quantiles or samples). ", # nolint
-        "scoringutils found: ", toString(n),
-        ". This is not necessarily a problem, but make sure this is intended."
+        "Some forecasts have different numbers of rows ",
+        "(e.g. quantiles or samples). ",
+        "scoringutils found: ", toString(n),
+        ". This may be a problem (it can potentially distort scores, ",
+        "making it more difficult to compare them), ",
+        "so make sure this is intended."
      )
    )
  }
  data[, InternalNumCheck := NULL]
6 changes: 2 additions & 4 deletions R/metrics_point_forecasts.R
@@ -24,17 +24,15 @@
#' @keywords metric

ae_median_sample <- function(true_values, predictions) {
  median_predictions <- apply(
    as.matrix(predictions), MARGIN = 1, FUN = median # this is rowwise
  )

  ae_median <- abs(true_values - median_predictions)

  return(ae_median)
}
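
A small worked illustration of the row-wise median (made-up numbers; two forecasts with five samples each):

```r
true_values <- c(1, 2)
predictions <- matrix(1:10, nrow = 2) # row medians are 5 and 6
ae_median_sample(true_values, predictions)
#> [1] 4 4
```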


#' @title Squared Error of the Mean (Sample-based Version)
#'
#' @description
14 changes: 7 additions & 7 deletions R/pairwise-comparisons.R
@@ -62,7 +62,7 @@
#' facet_wrap(~target_type)

pairwise_comparison <- function(scores,
                                by = "model",
                                metric = "auto",
                                baseline = NULL,
                                ...) {
@@ -93,12 +93,12 @@ pairwise_comparison <- function(scores,
}

  # check that all values of the chosen metric are positive
-  if (any(sign(scores[[metric]]) < 0)) {
-    if (any(sign(scores) > 0)) {
-      msg <- paste("To compute pairwise comparisons, all values of", metric,
-                   "must have the same sign.")
-      stop(msg)
-    }
+  if (any(sign(scores[[metric]]) < 0) && any(sign(scores) > 0)) {
+    msg <- paste(
+      "To compute pairwise comparisons, all values of", metric,
+      "must have the same sign."
+    )
+    stop(msg)
  }

# identify unit of single observation.
13 changes: 5 additions & 8 deletions R/plot.R
@@ -100,8 +100,7 @@ plot_score_table <- function(scores,
  # users can then pass in a factor and keep the ordering of that column intact
  if (length(y) > 1) {
    df[, identifCol := do.call(paste, c(.SD, sep = "_")),
      .SDcols = y[y %in% names(df)]]
  } else {
    setnames(df, old = eval(y), new = "identifCol")
  }
@@ -757,10 +756,9 @@ plot_pairwise_comparison <- function(comparison_result,
)]

    high_col <- "palegreen3"
    comparison_result[, var_of_interest := as.character(var_of_interest)]
    comparison_result[, var_of_interest := ifelse(var_of_interest == "0",
      "< 0.001", var_of_interest)]
  }

plot <- ggplot(
@@ -776,8 +774,7 @@
      width = 0.97, height = 0.97
    ) +
    geom_text(aes(label = var_of_interest),
              na.rm = TRUE) +
    scale_fill_gradient2(
      low = "steelblue", mid = "grey95",
      high = high_col,
15 changes: 14 additions & 1 deletion R/score.R
@@ -82,6 +82,15 @@
#'   add_coverage(by = c("model", "target_type")) %>%
#'   summarise_scores(by = c("model", "target_type"))
#'
#' # set forecast unit manually (to avoid issues with scoringutils trying to
#' # determine the forecast unit automatically), check forecasts before scoring
#' example_quantile %>%
#'   set_forecast_unit(
#'     c("location", "target_end_date", "target_type", "horizon", "model")
#'   ) %>%
#'   check_forecasts() %>%
#'   score()
#'
#' # forecast formats with different metrics
#' \dontrun{
#' score(example_binary)
@@ -106,7 +115,11 @@ score <- function(data,
...) {

  # preparations ---------------------------------------------------------------
-  check_data <- check_forecasts(data)
+  if (is_scoringutils_check(data)) {
+    check_data <- data
+  } else {
+    check_data <- check_forecasts(data)
+  }

  data <- check_data$cleaned_data
  prediction_type <- check_data$prediction_type
31 changes: 31 additions & 0 deletions R/utils.R
@@ -300,3 +300,34 @@ get_protected_columns <- function(data) {

return(protected_columns)
}


#' @title Check whether object has been checked with check_forecasts()
#'
#' @description Helper function to determine whether an object has been
#' checked by, and has passed, [check_forecasts()].
#'
#' @param data An object of class `scoringutils_check` as produced by
#' [check_forecasts()].
#'
#' @importFrom methods is
#'
#' @return Logical, either TRUE or FALSE
#'
#' @keywords internal

is_scoringutils_check <- function(data) {

  result <- is(data, "scoringutils_check")

  if (result &&
      any(is.null(data$cleaned_data), is.null(data$prediction_type),
          is.null(data$forecast_unit), is.null(data$target_type))) {
    stop(
      "Input seems to be an output of `check_forecasts()`, ",
      "but at least one of the required list items ",
      "'cleaned_data', 'prediction_type', 'forecast_unit', or ",
      "'target_type' is missing. Try running check_forecasts() again."
    )
  }

  return(result)
}
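
A quick sketch of the intended behaviour, using the bundled `example_quantile` data:

```r
checked <- check_forecasts(example_quantile)
is_scoringutils_check(checked)          # TRUE
is_scoringutils_check(example_quantile) # FALSE
```
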
25 changes: 13 additions & 12 deletions R/utils_data_handling.R
@@ -143,9 +143,10 @@ range_long_to_quantile <- function(data,
# a point forecast. This should probably be dealt with in the future
data <- data[!(range == 0 & boundary == "upper"), ]

  data[, quantile := ifelse(
    boundary == "lower",
    round((100 - range) / 200, 10),
    round((1 - (100 - range) / 200), 10)
  )]

if (!keep_range_col) {
@@ -176,9 +177,10 @@ quantile_to_range_long <- function(data,
data <- data.table::as.data.table(data)

data[, boundary := ifelse(quantile <= 0.5, "lower", "upper")]
  data[, range := ifelse(
    boundary == "lower",
    round((1 - 2 * quantile) * 100, 10),
    round((2 * quantile - 1) * 100, 10)
  )]

# add median quantile
@@ -232,14 +234,13 @@ sample_to_range_long <- function(data,
upper_quantiles <- 1 - lower_quantiles
quantiles <- sort(unique(c(lower_quantiles, upper_quantiles)))

  data <- sample_to_quantile(
    data,
    quantiles = quantiles,
    type = type
  )

  data <- quantile_to_range_long(data, keep_quantile_col = keep_quantile_col)

return(data[])
}
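
For reference, the quantile-to-interval-range mapping used above can be verified by hand; a brief sketch:

```r
# the 0.25 and 0.75 quantiles bound the central 50% prediction interval
q <- c(0.25, 0.75)
boundary <- ifelse(q <= 0.5, "lower", "upper")
ifelse(boundary == "lower", (1 - 2 * q) * 100, (2 * q - 1) * 100)
#> [1] 50 50
```
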
21 changes: 17 additions & 4 deletions README.Rmd
@@ -30,9 +30,9 @@ library(knitr)

The `scoringutils` package provides a collection of metrics and proper scoring rules and aims to make it simple to score probabilistic forecasts against the true observed values.

You can find additional information and examples in the papers [Evaluating Forecasts with scoringutils in R](https://arxiv.org/abs/2205.07090) and [Scoring epidemiological forecasts on transformed scales](https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1), as well as the vignettes ([Getting started](https://epiforecasts.io/scoringutils/articles/getting-started.html), [Details on the metrics implemented](https://epiforecasts.io/scoringutils/articles/metric-details.html) and [Scoring forecasts directly](https://epiforecasts.io/scoringutils/articles/scoring-forecasts-directly.html)).

The `scoringutils` package offers convenient automated forecast evaluation through the function `score()`. The function operates on data.frames (it uses `data.table` internally for speed and efficiency) and can easily be integrated into a workflow based on `dplyr` or `data.table`. It also provides experienced users with a set of reliable lower-level scoring metrics operating on vectors/matrices that they can build upon in other applications. In addition, it implements a wide range of flexible plots designed to cover many use cases.

Where available, `scoringutils` depends on functionality from `scoringRules`, which provides a comprehensive collection of proper scoring rules for predictive probability distributions represented as sample or parametric distributions. For some forecast types, such as quantile forecasts, `scoringutils` also implements additional metrics for evaluating forecasts. On top of providing an interface to the proper scoring rules implemented in `scoringRules` and natively, `scoringutils` also offers utilities for summarising and visualising forecasts and scores, and for obtaining relative scores between models, which may be useful for non-overlapping forecasts and forecasts across scales.

@@ -85,10 +85,12 @@ example_quantile %>%

### Scoring forecasts

Forecasts can be easily and quickly scored using the `score()` function. `score()` automatically tries to determine the `forecast_unit`, i.e. the set of columns that uniquely defines a single forecast, by taking all column names of the data into account. However, it is recommended to set the forecast unit manually using `set_forecast_unit()`, as this may help to avoid errors, especially when `scoringutils` is used in automated pipelines. The function `set_forecast_unit()` will simply drop unneeded columns. To verify everything is in order, the function `check_forecasts()` should be used. The result of that check can then be passed directly into `score()`. `score()` returns unsummarised scores, which in most cases is not what the user wants. Here we make use of additional functions from `scoringutils` to add empirical coverage levels (`add_coverage()`) and scores relative to a baseline model (here chosen to be the EuroCOVIDhub-ensemble model). See the getting started vignette for more details. Finally, we summarise these scores by model and target type.

```{r score-example}
example_quantile %>%
  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
  check_forecasts() %>%
  score() %>%
  add_coverage(ranges = c(50, 90), by = c("model", "target_type")) %>%
  summarise_scores(
@@ -103,7 +105,18 @@
kable()
```

`scoringutils` contains additional functionality to transform forecasts, to summarise scores at different levels, to visualise them, and to explore the forecasts themselves. See the package vignettes and function documentation for more information.

You may want to score forecasts based on transformations of the original data in order to obtain a more complete evaluation (see [this paper](https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1) for more information). This can be done using the function `transform_forecasts()`. In the following example, we truncate values at 0 and use the function `log_shift()` to add 1 to all values before applying the natural logarithm.

```{r}
example_quantile %>%
  .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
  transform_forecasts(append = TRUE, fun = log_shift, offset = 1) %>%
  score() %>%
  summarise_scores(by = c("model", "target_type", "scale")) %>%
  head()
```

## Citation
