Merge pull request #299 from epiforecasts/make_check_forecasts_pipeable
scoringutils 1.2.0 release
nikosbosse authored Jul 26, 2023
2 parents 5f8f12a + 55a15df commit f74e08f
Showing 17 changed files with 353 additions and 80 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: scoringutils
Title: Utilities for Scoring and Assessing Predictions
-Version: 1.1.7
+Version: 1.2.0
Language: en-GB
Authors@R: c(
person(given = "Nikos",
1 change: 1 addition & 0 deletions NAMESPACE
@@ -103,6 +103,7 @@ importFrom(ggplot2,xlab)
importFrom(ggplot2,ylab)
importFrom(lifecycle,deprecated)
importFrom(methods,hasArg)
importFrom(methods,is)
importFrom(rlang,enexprs)
importFrom(rlang,warn)
importFrom(scoringRules,crps_sample)
60 changes: 53 additions & 7 deletions NEWS.md
@@ -1,3 +1,49 @@
# scoringutils 1.2.0

This major release contains a range of new features and bug fixes that have been introduced in minor releases since `1.1.0`. The most important changes are:

- Documentation updated to reflect changes since version 1.1.0, including new transform and workflow functions.
- New `set_forecast_unit()` function allows manual setting of forecast unit.
- `summarise_scores()` gains a new `across` argument for summarising across variables.
- New `transform_forecasts()` and `log_shift()` functions allow forecast transformations. See the documentation for `transform_forecasts()` for more details and an example use case.
- Input checks and test coverage improved for bias functions.
- Bug fix in `get_prediction_type()` for integer matrix input.
- Links to scoringutils paper and citation updates.
- Warning added in `interval_score()` for small interval ranges.
- Linting updates and improvements.

Thanks to @nikosbosse, @seabbs, and @sbfnk for code and review contributions. Thanks to @bisaloo for the suggestion to use a linting GitHub Action that only triggers on changes, and @adrian-lison for the suggestion to add a warning to `interval_score()` if the interval range is between 0 and 1.

## Package updates

- The documentation was updated to reflect the recent changes since `scoringutils 1.1.0`. In particular, usage of the functions `set_forecast_unit()`, `check_forecasts()` and `transform_forecasts()` is now documented in the vignettes. The introduction of these functions enhances the overall workflow and helps to make the code more readable. All functions are designed to be used together with the pipe operator. For example, one can now use something like the following:

```r
example_quantile |>
  set_forecast_unit(c("model", "location", "forecast_date", "horizon", "target_type")) |>
  check_forecasts() |>
  score()
```

Documentation for `transform_forecasts()` has also been extended. This function allows the user to easily add transformations of forecasts, as suggested in the paper ["Scoring epidemiological forecasts on transformed scales"](https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1). In an epidemiological context, for example, it may make sense to apply the natural logarithm before scoring forecasts, in order to obtain scores that reflect how well models are able to predict exponential growth rates, rather than absolute values. Users can now do something like the following to score a transformed version of the data in addition to the original one:

```r
data <- example_quantile[true_value > 0, ]
data |>
  transform_forecasts(fun = log_shift, offset = 1) |>
  score() |>
  summarise_scores(by = c("model", "scale"))
```

Here we use the `log_shift()` function to apply a logarithmic transformation to the forecasts. This function was introduced in `scoringutils 1.1.2` as a helper function that acts just like `log()`, but has an additional argument, `offset`, that adds a number to every prediction and observed value before applying the log transformation.
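
As a minimal sketch (assuming the usual signature where the numeric vector to transform comes first), `log_shift(x, offset = 1)` is equivalent to `log(x + 1)`:

```r
log_shift(c(0, 1, 10), offset = 1)
# same result as
log(c(0, 1, 10) + 1)
```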

## Feature updates

- Made `check_forecasts()` and `score()` pipeable (see issue #290). This means that
users can now directly use the output of `check_forecasts()` as input for
`score()`. As `score()` otherwise runs `check_forecasts()` internally anyway,
this simply makes the step explicit and helps to write clearer code.
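
For example:

```r
example_quantile |>
  check_forecasts() |>
  score()
```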

# scoringutils 1.1.7

Release by @seabbs in #305. Reviewed by @nikosbosse and @sbfnk.
Expand All @@ -21,21 +67,21 @@ Release by @seabbs in #305. Reviewed by @nikosbosse and @sbfnk.
# scoringutils 1.1.6

## Feature updates

- Added a new argument, `across`, to `summarise_scores()`. This argument allows the user to summarise scores across different forecast units as an alternative to specifying `by`. See the documentation for `summarise_scores()` for more details and an example use case.
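
A sketch of the intended use (a hypothetical illustration, assuming `across` names the forecast-unit columns to aggregate over, as an alternative to listing the remaining columns in `by`):

```r
scores <- score(example_quantile)
# assumption: columns listed in `across` are aggregated over, while all other
# forecast-unit columns (e.g. model and target type) are kept separate
summarise_scores(scores, across = c("location", "target_end_date"))
```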

# scoringutils 1.1.5

## Feature updates

- Added a new function, `set_forecast_unit()`, that allows the user to set the forecast unit manually. The function removes all columns that are not relevant for uniquely identifying a single forecast. If not done manually, `scoringutils` attempts to determine the unit of a single forecast automatically by simply assuming that all column names are relevant to determine the forecast unit. This can lead to unexpected behaviour, so setting the forecast unit explicitly can help make the code easier to debug and easier to read (see issue #268). When used as part of a workflow, `set_forecast_unit()` can be directly piped into `check_forecasts()` to check everything is in order.
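
For example:

```r
example_quantile |>
  set_forecast_unit(
    c("model", "location", "forecast_date", "horizon", "target_type")
  ) |>
  check_forecasts()
```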

# scoringutils 1.1.4

## Package updates

- Added links to the scoringutils paper [Evaluating Forecasts with scoringutils in R](https://arxiv.org/abs/2205.07090) to the package.
- Updated citation formatting to comply with newer standards.

# scoringutils 1.1.3

@@ -54,7 +100,7 @@ check everything is in order.
## Feature updates

- Added a new function, `transform_forecasts()` to make it easy to transform forecasts before scoring them, as suggested in Bosse et al. (2023), https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1.
- Added a function, `log_shift()`, that implements the default transformation function. The function allows the user to add an offset before applying the logarithm.

# scoringutils 1.1.1

11 changes: 6 additions & 5 deletions R/check_forecasts.R
@@ -162,11 +162,12 @@ check_forecasts <- function(data) {
  if (length(n) > 1) {
    warnings <- c(
      warnings,
      paste0(
-        "Some forecasts have different numbers of rows (e.g. quantiles or samples). ", # nolint
-        "scoringutils found: ", toString(n),
-        ". This is not necessarily a problem, but make sure this is intended."
+        "Some forecasts have different numbers of rows ",
+        "(e.g. quantiles or samples). ",
+        "scoringutils found: ", toString(n),
+        ". This may be a problem (it can potentially distort scores, ",
+        "making it more difficult to compare them), ",
+        "so make sure this is intended."
      )
    )
  }
  data[, InternalNumCheck := NULL]
6 changes: 2 additions & 4 deletions R/metrics_point_forecasts.R
@@ -24,17 +24,15 @@
#' @keywords metric

ae_median_sample <- function(true_values, predictions) {
  median_predictions <- apply(
    as.matrix(predictions), MARGIN = 1, FUN = median # this is rowwise
  )

  ae_median <- abs(true_values - median_predictions)

  return(ae_median)
}
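
A small worked illustration of the row-wise median (made-up numbers; two forecasts with five samples each):

```r
true_values <- c(1, 2)
predictions <- matrix(1:10, nrow = 2) # row medians are 5 and 6
ae_median_sample(true_values, predictions)
#> [1] 4 4
```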


#' @title Squared Error of the Mean (Sample-based Version)
#'
#' @description
14 changes: 7 additions & 7 deletions R/pairwise-comparisons.R
@@ -62,7 +62,7 @@
#' facet_wrap(~target_type)

pairwise_comparison <- function(scores,
                                by = "model",
                                metric = "auto",
                                baseline = NULL,
                                ...) {
@@ -93,12 +93,12 @@ pairwise_comparison <- function(scores,
}

  # check that all values of the chosen metric are positive
-  if (any(sign(scores[[metric]]) < 0)) {
-    if (any(sign(scores) > 0)) {
-      msg <- paste("To compute pairwise comparisons, all values of", metric,
-                   "must have the same sign.")
-      stop(msg)
-    }
+  if (any(sign(scores[[metric]]) < 0) && any(sign(scores) > 0)) {
+    msg <- paste(
+      "To compute pairwise comparisons, all values of", metric,
+      "must have the same sign."
+    )
+    stop(msg)
  }

# identify unit of single observation.
13 changes: 5 additions & 8 deletions R/plot.R
@@ -100,8 +100,7 @@ plot_score_table <- function(scores,
  # users can then pass in a factor and keep the ordering of that column intact
  if (length(y) > 1) {
    df[, identifCol := do.call(paste, c(.SD, sep = "_")),
      .SDcols = y[y %in% names(df)]]
  } else {
    setnames(df, old = eval(y), new = "identifCol")
  }
@@ -757,10 +756,9 @@ plot_pairwise_comparison <- function(comparison_result,
)]

    high_col <- "palegreen3"
    comparison_result[, var_of_interest := as.character(var_of_interest)]
    comparison_result[, var_of_interest := ifelse(var_of_interest == "0",
      "< 0.001", var_of_interest)]
  }

plot <- ggplot(
@@ -776,8 +774,7 @@
      width = 0.97, height = 0.97
    ) +
    geom_text(aes(label = var_of_interest),
              na.rm = TRUE) +
    scale_fill_gradient2(
      low = "steelblue", mid = "grey95",
      high = high_col,
15 changes: 14 additions & 1 deletion R/score.R
@@ -82,6 +82,15 @@
#'   add_coverage(by = c("model", "target_type")) %>%
#'   summarise_scores(by = c("model", "target_type"))
#'
#' # set forecast unit manually (to avoid issues with scoringutils trying to
#' # determine the forecast unit automatically), check forecasts before scoring
#' example_quantile %>%
#'   set_forecast_unit(
#'     c("location", "target_end_date", "target_type", "horizon", "model")
#'   ) %>%
#'   check_forecasts() %>%
#'   score()
#'
#' # forecast formats with different metrics
#' \dontrun{
#' score(example_binary)
@@ -106,7 +115,11 @@ score <- function(data,
...) {

  # preparations ---------------------------------------------------------------
-  check_data <- check_forecasts(data)
+  if (is_scoringutils_check(data)) {
+    check_data <- data
+  } else {
+    check_data <- check_forecasts(data)
+  }

  data <- check_data$cleaned_data
  prediction_type <- check_data$prediction_type
31 changes: 31 additions & 0 deletions R/utils.R
@@ -300,3 +300,34 @@ get_protected_columns <- function(data) {

return(protected_columns)
}


#' @title Check whether object has been checked with check_forecasts()
#'
#' @description Helper function to determine whether an object has been
#' checked by, and has passed, [check_forecasts()].
#'
#' @param data An object of class `scoringutils_check` as produced by
#' [check_forecasts()].
#'
#' @importFrom methods is
#'
#' @return Logical, either TRUE or FALSE
#'
#' @keywords internal

is_scoringutils_check <- function(data) {

  result <- is(data, "scoringutils_check")

  if (result &&
      any(is.null(data$cleaned_data), is.null(data$prediction_type),
          is.null(data$forecast_unit), is.null(data$target_type))) {
    stop(
      "Input seems to be an output of `check_forecasts()`, ",
      "but at least one of the required list items ",
      "'cleaned_data', 'prediction_type', 'forecast_unit', or ",
      "'target_type' is missing. Try running check_forecasts() again."
    )
  }

  return(result)
}
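
A quick sketch of the intended behaviour, using the bundled `example_quantile` data:

```r
checked <- check_forecasts(example_quantile)
is_scoringutils_check(checked)          # TRUE
is_scoringutils_check(example_quantile) # FALSE
```
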
25 changes: 13 additions & 12 deletions R/utils_data_handling.R
@@ -143,9 +143,10 @@ range_long_to_quantile <- function(data,
# a point forecast. This should probably be dealt with in the future
data <- data[!(range == 0 & boundary == "upper"), ]

  data[, quantile := ifelse(
    boundary == "lower",
    round((100 - range) / 200, 10),
    round((1 - (100 - range) / 200), 10)
  )]

if (!keep_range_col) {
@@ -176,9 +177,10 @@ quantile_to_range_long <- function(data,
data <- data.table::as.data.table(data)

data[, boundary := ifelse(quantile <= 0.5, "lower", "upper")]
  data[, range := ifelse(
    boundary == "lower",
    round((1 - 2 * quantile) * 100, 10),
    round((2 * quantile - 1) * 100, 10)
  )]

# add median quantile
@@ -232,14 +234,13 @@ sample_to_range_long <- function(data,
upper_quantiles <- 1 - lower_quantiles
quantiles <- sort(unique(c(lower_quantiles, upper_quantiles)))

  data <- sample_to_quantile(
    data,
    quantiles = quantiles,
    type = type
  )

  data <- quantile_to_range_long(data, keep_quantile_col = keep_quantile_col)

return(data[])
}
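
For reference, the quantile-to-interval-range mapping used above can be verified by hand; a brief sketch:

```r
# the 0.25 and 0.75 quantiles bound the central 50% prediction interval
q <- c(0.25, 0.75)
boundary <- ifelse(q <= 0.5, "lower", "upper")
ifelse(boundary == "lower", (1 - 2 * q) * 100, (2 * q - 1) * 100)
#> [1] 50 50
```
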
21 changes: 17 additions & 4 deletions README.Rmd
@@ -30,9 +30,9 @@ library(knitr)

The `scoringutils` package provides a collection of metrics and proper scoring rules and aims to make it simple to score probabilistic forecasts against the true observed values.

You can find additional information and examples in the papers [Evaluating Forecasts with scoringutils in R](https://arxiv.org/abs/2205.07090) and [Scoring epidemiological forecasts on transformed scales](https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1), as well as the vignettes ([Getting started](https://epiforecasts.io/scoringutils/articles/getting-started.html), [Details on the metrics implemented](https://epiforecasts.io/scoringutils/articles/metric-details.html) and [Scoring forecasts directly](https://epiforecasts.io/scoringutils/articles/scoring-forecasts-directly.html)).

The `scoringutils` package offers convenient automated forecast evaluation through the function `score()`. The function operates on data.frames (it uses `data.table` internally for speed and efficiency) and can easily be integrated into a workflow based on `dplyr` or `data.table`. It also provides experienced users with a set of reliable lower-level scoring metrics operating on vectors/matrices that they can build upon in other applications. In addition, it implements a wide range of flexible plots designed to cover many use cases.

Where available, `scoringutils` depends on functionality from `scoringRules`, which provides a comprehensive collection of proper scoring rules for predictive probability distributions represented as sample or parametric distributions. For some forecast types, such as quantile forecasts, `scoringutils` also implements additional metrics for evaluating forecasts. On top of providing an interface to the proper scoring rules implemented in `scoringRules` and natively, `scoringutils` also offers utilities for summarising and visualising forecasts and scores, and for obtaining relative scores between models, which may be useful for non-overlapping forecasts and forecasts across scales.

@@ -85,10 +85,12 @@ example_quantile %>%

### Scoring forecasts

Forecasts can be easily and quickly scored using the `score()` function. `score()` automatically tries to determine the `forecast_unit`, i.e. the set of columns that uniquely defines a single forecast, by taking all column names of the data into account. However, it is recommended to set the forecast unit manually using `set_forecast_unit()`, as this may help to avoid errors, especially when `scoringutils` is used in automated pipelines. The function `set_forecast_unit()` will simply drop unneeded columns. To verify everything is in order, the function `check_forecasts()` should be used. The result of that check can then be passed directly into `score()`. `score()` returns unsummarised scores, which in most cases is not what the user wants. Here we make use of additional functions from `scoringutils` to add empirical coverage levels (`add_coverage()`) and scores relative to a baseline model (here chosen to be the EuroCOVIDhub-ensemble model). See the getting started vignette for more details. Finally, we summarise these scores by model and target type.

```{r score-example}
example_quantile %>%
  set_forecast_unit(c("location", "target_end_date", "target_type", "horizon", "model")) %>%
  check_forecasts() %>%
  score() %>%
  add_coverage(ranges = c(50, 90), by = c("model", "target_type")) %>%
  summarise_scores(
@@ -103,7 +105,18 @@
kable()
```

`scoringutils` contains additional functionality to transform forecasts, to summarise scores at different levels, to visualise them, and to explore the forecasts themselves. See the package vignettes and function documentation for more information.

You may want to score forecasts based on transformations of the original data in order to obtain a more complete evaluation (see [this paper](https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1) for more information). This can be done using the function `transform_forecasts()`. In the following example, we truncate values at 0 and use the function `log_shift()` to add 1 to all values before applying the natural logarithm.

```{r}
example_quantile %>%
  .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
  transform_forecasts(append = TRUE, fun = log_shift, offset = 1) %>%
  score() %>%
  summarise_scores(by = c("model", "target_type", "scale")) %>%
  head()
```

## Citation
