extend vignette
pweigmann committed Aug 28, 2024
1 parent 0be8f11 commit c2d98b1
Showing 10 changed files with 143 additions and 37 deletions.
2 changes: 1 addition & 1 deletion .buildlibrary
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
ValidationKey: '638656'
ValidationKey: '658779'
AutocreateReadme: yes
AcceptedWarnings:
- 'Warning: package ''.*'' was built under R version'
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -2,8 +2,8 @@ cff-version: 1.2.0
message: If you use this software, please cite it using the metadata from this file.
type: software
title: 'piamValidation: Validation Tools for PIK-PIAM'
version: 0.3.2
date-released: '2024-08-23'
version: 0.3.3
date-released: '2024-08-28'
abstract: The piamValidation package provides validation tools for the Potsdam Integrated
Assessment Modelling environment.
authors:
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Type: Package
Package: piamValidation
Title: Validation Tools for PIK-PIAM
Version: 0.3.2
Date: 2024-08-23
Version: 0.3.3
Date: 2024-08-28
Authors@R:
c(person("Pascal", "Weigmann",, "[email protected]", role = c("aut", "cre")),
person("Oliver", "Richters",, role = "aut"))
2 changes: 1 addition & 1 deletion R/validateScenarios.R
@@ -4,7 +4,7 @@
#' format, in case of historic comparison, also path to reference data
#' @param config select config from inst/config or give a full path to a config
#' file on your computer
#' @param outputFile give name of output file in case results should be exported
#' @param outputFile give name of output file in case results should be exported;
#' include file extension
#'
#' @importFrom dplyr filter select mutate group_by %>% bind_rows
2 changes: 1 addition & 1 deletion R/validationHeatmap.R
@@ -116,7 +116,7 @@ validationHeatmap <- function(df,
coord_equal() +
theme(legend.position = "none")

# tweak for ELEVATE:
# tweak for ELEVATE: to make facet labels and title readable
if (x_facet == "scenario") {
p <- p +
theme(strip.text.x = element_text(angle = 30, vjust = 0.5, hjust=1)) +
6 changes: 3 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# Validation Tools for PIK-PIAM

R package **piamValidation**, version **0.3.2**
R package **piamValidation**, version **0.3.3**

[![CRAN status](https://www.r-pkg.org/badges/version/piamValidation)](https://cran.r-project.org/package=piamValidation) [![R build status](https://github.com/pik-piam/piamValidation/workflows/check/badge.svg)](https://github.com/pik-piam/piamValidation/actions) [![codecov](https://codecov.io/gh/pik-piam/piamValidation/branch/master/graph/badge.svg)](https://app.codecov.io/gh/pik-piam/piamValidation) [![r-universe](https://pik-piam.r-universe.dev/badges/piamValidation)](https://pik-piam.r-universe.dev/builds)

@@ -46,7 +46,7 @@ In case of questions / problems please contact Pascal Weigmann <pascal.weigmann@

To cite package **piamValidation** in publications use:

Weigmann P, Richters O (2024). _piamValidation: Validation Tools for PIK-PIAM_. R package version 0.3.2, <https://github.com/pik-piam/piamValidation>.
Weigmann P, Richters O (2024). _piamValidation: Validation Tools for PIK-PIAM_. R package version 0.3.3, <https://github.com/pik-piam/piamValidation>.

A BibTeX entry for LaTeX users is

@@ -55,7 +55,7 @@ A BibTeX entry for LaTeX users is
title = {piamValidation: Validation Tools for PIK-PIAM},
author = {Pascal Weigmann and Oliver Richters},
year = {2024},
note = {R package version 0.3.2},
note = {R package version 0.3.3},
url = {https://github.com/pik-piam/piamValidation},
}
```
2 changes: 2 additions & 0 deletions inst/markdown/validationReport_default.Rmd
@@ -74,6 +74,8 @@ d <- filter(df, metric == m, ref_scenario == "historical")
if (nrow(d) > 0) {
vars <- unique(d$variable)
# tagList only works for interactive plots
plot_list <- htmltools::tagList()
for (i in 1:length(vars)) {
plot_list[[i]] <- validationHeatmap(d, vars[i], met = m, historic)
2 changes: 1 addition & 1 deletion man/validateScenarios.Rd


Binary file added vignettes/thresholds.png
156 changes: 130 additions & 26 deletions vignettes/validateScenarios.Rmd
@@ -21,8 +21,10 @@ library(piamValidation)

# Overview

The function `validateScenarios()` performs validation checks on IAM scenario data based on thresholds provided in a tailored
config file. These checks either analyse the agreement with historical reference data or expectations on the scenario data.
The function `validateScenarios()` performs validation checks on IAM scenario
data based on thresholds provided in a tailored config file. These checks either
analyse the agreement with historical reference data or expectations on the
scenario data.

# Installation

@@ -32,7 +34,8 @@ The package is available via the R package repository of PIK.
install.packages("piamValidation", repos = "https://rse.pik-potsdam.de/r/packages")
```

For more detailed information please refer to the [Readme](https://github.com/pik-piam/piamValidation/tree/main?tab=readme-ov-file#installation).
For more detailed information please refer to the
[Readme](https://github.com/pik-piam/piamValidation/tree/main?tab=readme-ov-file#installation).

# Usage

@@ -44,35 +47,50 @@ function. More precisely, any data file which can be read by

*Model, Scenario, Region, Variable, Unit, \<years\>*

## Reference Data

Reference data should follow the same format guidelines as scenario data with
the exception that the ``scenario`` column needs to read ``historical``. It is
passed to the validation function as part of the ``dataPath`` argument, e.g.:

``validateScenarios(dataPath = c("<path_to_scenario_data>", "<path_to_reference_data>"),
config = ...)``
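
As a minimal sketch of the format described above, the following builds a tiny
reference data set in base R and writes it to a CSV file. The model name
`EDGAR8` is taken from the examples in this vignette; all numbers are
placeholders for illustration only, and whether a plain CSV is the right file
type for your workflow is an assumption.

```r
# Illustrative reference data in wide format (one column per year).
# The values are placeholders, not real emissions data.
refData <- data.frame(
  Model    = "EDGAR8",
  Scenario = "historical",   # reference data needs "historical" here
  Region   = c("World", "USA"),
  Variable = "Emi|CO2|Energy",
  Unit     = "Mt CO2/yr",
  `2015`   = c(32000, 5000),
  `2020`   = c(31000, 4500),
  check.names = FALSE        # keep the year columns named "2015", "2020"
)

# Write to a temporary file; this path could then be passed as the second
# element of the dataPath argument.
refPath <- file.path(tempdir(), "reference.csv")
write.csv(refData, refPath, row.names = FALSE)
```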


## Configuration File

The config file is the place where the validation checks are defined. Filling
the file comes with a few rules - depending on the type of check which is being
The config file is where the validation checks are defined. It is flexible
enough to cover many different types of checks, but writing it also comes with
a few rules: depending on the type of check which should be
performed, different columns can or need to be filled.

![Rules for writing the config file.](./config_rules.jpg){width=100%}
![Rules for writing the config file for 6 different use cases.](./config_rules.jpg){width=100%}

**General Rules**

- historical reference data needs to read "historical" in the column "scenario"
- when using reference data, "historical" is required in the column "scenario"
- one set of thresholds per row
- empty rows and rows without "variable" are ignored and can be used to structure the config
- later rows overwrite earlier rows if thresholds overlap
- only one reference period/model/scenario per data-slice (e.g. don't compare the same data against ref_model1 and ref_model2)
- define multiple variables with "\*" (one sub-level) or "\*\*" (all sub-levels)
- ``relative`` thresholds can be given as percentage (``20%``) or decimal (``0.2``).
- if period column is empty (= all):
* historic: 2005 - 2020
* scenario: < 2100
* historical: 2005 - 2020
* all other cases: < 2100
* give range via yyyy-yyyy or comma-separated years
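
To illustrate how the threshold and period entries from the rules above could
be interpreted, here is a small sketch in base R. The helper names
`parseThreshold` and `parsePeriods` are hypothetical and not part of
piamValidation; the package may parse these fields differently. The default
ranges for an empty period column are not reproduced here; the caller simply
passes the years that should count as "all".

```r
# Hypothetical helper: convert a threshold entry such as "20%" or "0.2"
# into a numeric fraction. Empty entries become NA.
parseThreshold <- function(x) {
  x <- trimws(as.character(x))
  if (is.na(x) || x == "") return(NA_real_)
  if (grepl("%$", x)) return(as.numeric(sub("%$", "", x)) / 100)
  as.numeric(x)
}

# Hypothetical helper: expand a period entry into a vector of years.
# "2020-2030" is read as a range, "2020, 2050" as a list of years,
# and an empty entry returns all years supplied by the caller.
parsePeriods <- function(x, availableYears) {
  x <- trimws(as.character(x))
  if (is.na(x) || x == "") return(availableYears)
  if (grepl("^[0-9]{4}-[0-9]{4}$", x)) {
    limits <- as.numeric(strsplit(x, "-", fixed = TRUE)[[1]])
    inRange <- availableYears >= limits[1] & availableYears <= limits[2]
    return(availableYears[inRange])
  }
  as.numeric(trimws(strsplit(x, ",", fixed = TRUE)[[1]]))
}

years <- seq(2005, 2100, by = 5)
parseThreshold("20%")             # 0.2
parseThreshold("-0.25")           # -0.25
parsePeriods("2020-2030", years)  # 2020 2025 2030
parsePeriods("2020, 2050", years) # 2020 2050
```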

It is recommended to choose a historical reference source explicitly in the
"ref_model" column for historical comparisons. Otherwise, all available
historical sources will be averaged and the tooltip will show
historical sources will be averaged and the tooltip of a heatmap will show
``ref_model = "multiple"``.

<insert description of each column here?>

### Units

It is recommended to include the unit of each variable in the config file to
avoid inconsistency between data sources. The tools performs checks, whether the
avoid inconsistency between data sources. The tool checks whether the
units in the config match those in scenario and reference data and returns a
warning in case they don't. This check is performed using
``piamInterfaces::areUnitsIdentical()`` to avoid false positives.
@@ -84,48 +102,131 @@ If the ``unit`` column is left empty, no consistency check will be performed.
This is the recommended approach when selecting multiple variables in one go via
``**``, which don't all share the same unit.
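
The difference between ``*`` (one sub-level) and ``**`` (all sub-levels) can be
sketched as a translation into regular expressions. This is only one possible
reading of the rule, assuming that sub-levels are separated by ``|``; the helper
`wildcardToRegex` is hypothetical and not the package's implementation. Other
regex metacharacters in variable names (such as ``+`` or ``(``) are not escaped
in this sketch.

```r
# Hypothetical helper: turn a config variable pattern into a regex.
# "**" matches any number of sub-levels, "*" stays within one level.
wildcardToRegex <- function(pattern) {
  p <- gsub("**", "@@ALL@@", pattern, fixed = TRUE)  # protect "**" first
  p <- gsub("*", "@@ONE@@", p, fixed = TRUE)         # then single "*"
  p <- gsub("|", "[|]", p, fixed = TRUE)             # literal level separator
  p <- gsub("@@ALL@@", ".*", p, fixed = TRUE)        # may cross separators
  p <- gsub("@@ONE@@", "[^|]*", p, fixed = TRUE)     # may not cross separators
  paste0("^", p, "$")
}

vars <- c("Share|Electricity", "Share|Electricity|Wind", "Emi|CO2")
vars[grepl(wildcardToRegex("Share|*"), vars)]
# "Share|Electricity"
vars[grepl(wildcardToRegex("Share|**"), vars)]
# "Share|Electricity" "Share|Electricity|Wind"
```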

## Use Cases

### Use Case 1: relative comparison to reference data

You want to compare your scenario data to an external reference source, which
provides historical (or projected) data for the variable you are interested in.
The thresholds are defined as a ``relative`` deviation above or below the
reference value: $relDeviation = \frac{(scenValue - refValue)}{refValue}$.

Example:

### Use Case 1: relative comparison to historical data
| metric | critical | variable | model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
|----------|----------|--------------|-------|----------|--------|--------|---------|---------|---------|---------|-----------|--------------|------------|
| relative | yes | Emi\|CO2\|Energy | | | | | -0.25 | -0.2 | 0.2 | 0.25 | EDGAR8 | historical | |
| relative | yes | Emi\|CO2\|Energy | | | World | | -0.2 | -0.1 | 0.1 | 0.2 | EDGAR8 | historical | |
| metric | critical | variable | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
|----------|----------|--------------|-------|-------|----------|--------|--------|---------|---------|---------|---------|-----------|--------------|------------|
| relative | yes | Emi\|CO2\|Energy | Mt CO2/yr | | | | | -0.25 | -0.2 | 0.2 | 0.25 | EDGAR8 | historical | |
| relative | yes | Emi\|CO2\|Energy | Mt CO2/yr | | | World | | -0.2 | -0.1 | 0.1 | 0.2 | EDGAR8 | historical | |
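
The relative check can be sketched in a few lines of base R. The function names
and the exact traffic-light logic (red outside the red bounds, yellow between
yellow and red bounds, green otherwise) are assumptions made for illustration,
and the scenario and reference values are placeholders. The thresholds are
those of the first example row above.

```r
# Relative deviation as defined above
relDeviation <- function(scenValue, refValue) {
  (scenValue - refValue) / refValue
}

# Assumed traffic-light logic, vectorized over 'value'
trafficLight <- function(value, minRed, minYel, maxYel, maxRed) {
  ifelse(value < minRed | value > maxRed, "red",
         ifelse(value < minYel | value > maxYel, "yellow", "green"))
}

refValue  <- 30000                   # placeholder reference value
scenValue <- c(33000, 37000, 40000)  # placeholder scenario values

dev <- relDeviation(scenValue, refValue)
trafficLight(dev, minRed = -0.25, minYel = -0.2, maxYel = 0.2, maxRed = 0.25)
# "green" "yellow" "red"
```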


### Use Case 2: difference to historical data
### Use Case 2: difference to reference data

wip
You want to compare your scenario data to an external reference source, which
provides historical (or projected) data for the variable you are interested in.

The thresholds are defined as a ``difference`` (above or below) to the
reference value: $difference = scenValue - refValue$.

Example:

| metric | critical | variable | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
|----------|----------|--------------|-------|-------|----------|--------|--------|---------|---------|---------|---------|-----------|--------------|------------|
| difference | yes | Emi\|CO2\|Energy | Mt CO2/yr | | | | | -100 | -50 | 50 | 100 | EDGAR8 | historical | |
| difference | yes | Emi\|CO2\|Energy | Mt CO2/yr | | | World | | -500 | -200 | 200 | 500 | EDGAR8 | historical | |

### Use Case 3: relative comparison to other model/scenario/period

wip
You want to compare your scenario data to itself, either by comparing periods,
scenarios or models to one another. You select one or multiple
periods/scenarios/models in the respective "period/scenario/model" column and
*exactly* one in the "ref_period/scenario/model" column.

The thresholds are defined as a ``relative`` deviation above or below the
reference value: $relDeviation = \frac{(scenValue - refValue)}{refValue}$.

Example:

| metric | critical | variable | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
|----------|----------|--------------|-------|-------|----------|--------|------|---------|---------|---------|---------|-----------|--------------|------------|
| relative | yes | Emi\|CO2\|Energy | Mt CO2/yr | REMIND | | | | -20% | -10% | 10% | 20% | MESSAGE | | |
| relative | yes | Emi\|CO2\|Energy | Mt CO2/yr | | NDC | | | | | -25% | -10% | | CurPol | |
| relative | yes | Emi\|CO2\|Energy | Mt CO2/yr | | CurPol | | 2030 | 10% | 20% | 60% | 80% | | | 2020 |
| relative | yes | Emi\|CO2\|Energy | Mt CO2/yr | | NDC | | 2030 | -20% | -10% | 20% | 40% | | | 2020 |

Currently, comparing different regions or variables to one another is not supported.

### Use Case 4: difference to other model/scenario/period

wip
You want to compare your scenario data to itself, either by comparing periods,
scenarios or models to one another. You select one or multiple
periods/scenarios/models in the respective "period/scenario/model" column and
*exactly* one in the "ref_period/scenario/model" column.

The thresholds are defined as a ``difference`` (above or below) to the
reference value: $difference = scenValue - refValue$.

Example:

| metric | critical | variable | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
|----------|----------|--------------|-------|-------|----------|--------|------|---------|---------|---------|---------|-----------|--------------|------------|
| difference | yes | Emi\|CO2\|Energy | Mt CO2/yr | REMIND | | World | 2020 | -1500 | -500 | 500 | 1500 | MESSAGE | | |
| difference | yes | Emi\|CO2\|Energy | Mt CO2/yr | | NDC | | | | | -25% | -10% | | CurPol | |
| difference | yes | Emi\|CO2\|Energy | Mt CO2/yr | | CurPol | | 2030 | 10% | 20% | 60% | 80% | | | 2020 |
| difference | yes | Emi\|CO2\|Energy | Mt CO2/yr | | NDC | | 2030 | -20% | -10% | 20% | 40% | | | 2020 |

### Use Case 5: direct comparison to absolute thresholds

wip
You want to compare your scenario data to explicit values. This could be the
case if you want to do sanity checks on variables that should never leave a
certain range or you have expert guesses on ``absolute`` upper or lower thresholds.

The tool checks whether $minRed/Yel < scenValue < maxYel/Red$.

Example:

| metric | critical | variable | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
|----------|----------|--------------|-------|-------|----------|--------|------|---------|---------|---------|---------|-----------|--------------|------------|
| absolute | yes | Share\|\*\* | \% | | | | | 0 | | | 100 | | | |
| absolute | yes | Carbon Management\|Storage | Mt CO2/yr | | | | | | | | 10 000 | | | |
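
Since absolute checks often define only some of the four thresholds, a sketch
of such a check has to deal with empty cells. The following base R example
treats missing thresholds as unbounded. The function name `checkAbsolute` is
hypothetical, and how the package itself handles empty thresholds or values
lying exactly on a bound is not shown here; both are assumptions of this
sketch.

```r
# Hypothetical helper: absolute threshold check with optional bounds.
# Missing (NA) lower bounds become -Inf, missing upper bounds become Inf.
# Values exactly on a bound are treated as passing (an assumption).
checkAbsolute <- function(value, minRed = NA, minYel = NA,
                          maxYel = NA, maxRed = NA) {
  lower <- function(x) ifelse(is.na(x), -Inf, x)
  upper <- function(x) ifelse(is.na(x), Inf, x)
  ifelse(value < lower(minRed) | value > upper(maxRed), "red",
         ifelse(value < lower(minYel) | value > upper(maxYel),
                "yellow", "green"))
}

# Shares in percent, using the bounds of the first example row
checkAbsolute(c(-5, 42, 101), minRed = 0, maxRed = 100)
# "red" "green" "red"
```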

### Use Case 6: direct comparison to yearly growthrate

wip
You want to check growth rates of variables in your scenario data. As 5-year
steps are expected, the average yearly growth rate over the last 5 years is
calculated via:

$\left(\frac{value}{value5yearsAgo}\right)^\frac{1}{5} - 1$.

Example:

| metric | critical | variable | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
|----------|----------|--------------|-------|-------|----------|--------|------|---------|---------|---------|---------|-----------|--------------|------------|
| growthrate | yes | Cap\|Electricity\|Wind | GW | | | USA | | | | | 50 | | | |
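
The growth rate formula above can be tried out on a short series in 5-year
steps. This is a sketch in base R with placeholder capacity values; the
function name `avgYearlyGrowth` is hypothetical.

```r
# Average yearly growth rate over a 5-year step, as in the formula above
avgYearlyGrowth <- function(value, value5yearsAgo) {
  (value / value5yearsAgo)^(1 / 5) - 1
}

# Placeholder capacity series in GW, one value per 5-year period
periods  <- c(2020, 2025, 2030)
capacity <- c(100, 200, 300)

# Pair each value with the value of the preceding period
growth <- avgYearlyGrowth(capacity[-1], capacity[-length(capacity)])
data.frame(period = periods[-1], growth = round(growth, 3))
```

For example, a doubling of capacity within 5 years corresponds to an average
growth of roughly 14.9% per year.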

## Scenario Validation

The function ``validateScenarios()`` performs all necessary steps of the
validation process. It takes the config file and goes through each row,
assembling the required data and checking the thresholds.
validation process. It takes the config file and iterates through each row,
assembling the required scenario and reference data and checking the thresholds.

Optionally, you can save the resulting data.frame to a .csv file.
The output is a data.frame that combines scenario and reference data with
the thresholds applied to each data point and the results of the validation
checks.
These results can be found in the columns ``check_value`` and ``check``. The
former contains the value that is directly compared to the thresholds (e.g. the
calculated growth rate when doing a growth rate check), and the latter holds
the result as a traffic-light color.

![Traffic-light threshold check results.](./thresholds.png){width=80%}

Optionally, you can save the resulting data.frame to a .csv file by providing
an ``outputFile``.

```{r run validation, eval = FALSE}
df <- validateScenarios(c(scenPath, histPath), config, outputFile = NULL)
```
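
Once the result is available, the ``check`` column can be used to summarize or
filter the outcome. The sketch below works on a small mock data.frame so that
it runs without any input files; only the column names ``check_value`` and
``check`` are taken from the description above, while the remaining columns,
the values, and the color labels ("green", "yellow", "red") are assumptions.

```r
# Mock validation result; all values are placeholders
df <- data.frame(
  variable    = "Emi|CO2|Energy",
  region      = c("World", "USA", "EUR"),
  period      = 2020,
  check_value = c(0.05, 0.22, -0.4),
  check       = c("green", "yellow", "red")
)

# Count data points per traffic-light color
table(df$check)

# Keep only the data points that failed the check
failed <- df[df$check == "red", ]
failed
```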

## Create Validation Report
## Creating a Validation Report

To perform the validation and create an output document in one go, the function
``validationReport()`` can be used. It renders an .html file which features heat
@@ -137,6 +238,9 @@ used or created according to individual needs in ``inst/markdown``.
The report is saved in a folder called ``output`` in the current working
directory.

-> Be careful when using this function on big data sets and configs with many
variables as it might create very large html files.

```{r create report, eval = FALSE}
validationReport(c(scenPath, histPath), config, report = "default")
```
