extend vignette

pik-piam · Aug 28, 2024 · c2d98b1
1 parent 0be8f11
commit c2d98b1

Showing 10 changed files with 143 additions and 37 deletions.
diff --git a/.buildlibrary b/.buildlibrary
@@ -1,4 +1,4 @@
-ValidationKey: '638656'
+ValidationKey: '658779'
 AutocreateReadme: yes
 AcceptedWarnings:
 - 'Warning: package ''.*'' was built under R version'

diff --git a/CITATION.cff b/CITATION.cff
@@ -2,8 +2,8 @@ cff-version: 1.2.0
 message: If you use this software, please cite it using the metadata from this file.
 type: software
 title: 'piamValidation: Validation Tools for PIK-PIAM'
-version: 0.3.2
-date-released: '2024-08-23'
+version: 0.3.3
+date-released: '2024-08-28'
 abstract: The piamValidation package provides validation tools for the Potsdam Integrated
   Assessment Modelling environment.
 authors:

diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,8 +1,8 @@
 Type: Package
 Package: piamValidation
 Title: Validation Tools for PIK-PIAM
-Version: 0.3.2
-Date: 2024-08-23
+Version: 0.3.3
+Date: 2024-08-28
 Authors@R:
     c(person("Pascal", "Weigmann",, "[email protected]", role = c("aut", "cre")),
       person("Oliver", "Richters",, role = "aut"))

diff --git a/R/validateScenarios.R b/R/validateScenarios.R
@@ -4,7 +4,7 @@
 #'        format, in case of historic comparison, also path to reference data
 #' @param config select config from inst/config or give a full path to a config
 #'        file on your computer
-#' @param outputFile give name of output file in case results should be exported
+#' @param outputFile give name of output file in case results should be exported;
 #'        include file extension
 #'
 #' @importFrom dplyr filter select mutate group_by %>% bind_rows

diff --git a/R/validationHeatmap.R b/R/validationHeatmap.R
@@ -116,7 +116,7 @@ validationHeatmap <- function(df,
       coord_equal() +
       theme(legend.position = "none")
 
-    # tweak for ELEVATE:
+    # tweak for ELEVATE: to make facet labels and title readable
     if (x_facet == "scenario") {
       p <- p +
         theme(strip.text.x = element_text(angle = 30, vjust = 0.5, hjust=1)) +

diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Validation Tools for PIK-PIAM
 
-R package **piamValidation**, version **0.3.2**
+R package **piamValidation**, version **0.3.3**
 
 [![CRAN status](https://www.r-pkg.org/badges/version/piamValidation)](https://cran.r-project.org/package=piamValidation)  [![R build status](https://github.com/pik-piam/piamValidation/workflows/check/badge.svg)](https://github.com/pik-piam/piamValidation/actions) [![codecov](https://codecov.io/gh/pik-piam/piamValidation/branch/master/graph/badge.svg)](https://app.codecov.io/gh/pik-piam/piamValidation) [![r-universe](https://pik-piam.r-universe.dev/badges/piamValidation)](https://pik-piam.r-universe.dev/builds)
 
@@ -46,7 +46,7 @@ In case of questions / problems please contact Pascal Weigmann <pascal.weigmann@
 
 To cite package **piamValidation** in publications use:
 
-Weigmann P, Richters O (2024). _piamValidation: Validation Tools for PIK-PIAM_. R package version 0.3.2, <https://github.com/pik-piam/piamValidation>.
+Weigmann P, Richters O (2024). _piamValidation: Validation Tools for PIK-PIAM_. R package version 0.3.3, <https://github.com/pik-piam/piamValidation>.
 
 A BibTeX entry for LaTeX users is
 
@@ -55,7 +55,7 @@ A BibTeX entry for LaTeX users is
   title = {piamValidation: Validation Tools for PIK-PIAM},
   author = {Pascal Weigmann and Oliver Richters},
   year = {2024},
-  note = {R package version 0.3.2},
+  note = {R package version 0.3.3},
   url = {https://github.com/pik-piam/piamValidation},
 }
 ```
diff --git a/inst/markdown/validationReport_default.Rmd b/inst/markdown/validationReport_default.Rmd
@@ -74,6 +74,8 @@ d <- filter(df, metric == m, ref_scenario == "historical")
 
 if (nrow(d) > 0) {
   vars <- unique(d$variable)
+  
+  # tagList only works for interactive plots
   plot_list <- htmltools::tagList()
     for (i in 1:length(vars)) {
       plot_list[[i]] <- validationHeatmap(d, vars[i], met = m, historic)

diff --git a/man/validateScenarios.Rd b/man/validateScenarios.Rd
diff --git a/vignettes/thresholds.png b/vignettes/thresholds.png
diff --git a/vignettes/validateScenarios.Rmd b/vignettes/validateScenarios.Rmd
@@ -21,8 +21,10 @@ library(piamValidation)
 
 # Overview
 
-The function `validateScenarios()` performs validation checks on IAM scenario data based on thresholds provided in a tailored
-config file. These checks either analyse the agreement with historical reference data or expectations on the scenario data.
+The function `validateScenarios()` performs validation checks on IAM scenario 
+data based on thresholds provided in a tailored config file. These checks either 
+analyse the agreement with historical reference data or expectations on the 
+scenario data.
 
 # Installation
 
@@ -32,7 +34,8 @@ The package is available via the R package repository of PIK.
 install.packages("piamValidation", repos = "https://rse.pik-potsdam.de/r/packages")
 ```
 
-For more detailed information please refer to the [Readme](https://github.com/pik-piam/piamValidation/tree/main?tab=readme-ov-file#installation).
+For more detailed information please refer to the 
+[Readme](https://github.com/pik-piam/piamValidation/tree/main?tab=readme-ov-file#installation).
 
 # Usage
 
@@ -44,35 +47,50 @@ function. More precisely, any data file which can be read by
 
 *Model, Scenario, Region, Variable, Unit, \<years\>*
 
+## Reference Data
+
+Reference data should follow the same format guidelines as scenario data with 
+the exception that the ``scenario`` column needs to read ``historical``. It is 
+passed to the validation function as part of the ``dataPath`` argument, e.g.:
+
+``validateScenarios(dataPath = c("<path_to_scenario_data>", "<path_to_reference_data>"),
+config = ...)``
+
 
 ## Configuration File
 
-The config file is the place where the validation checks are defined. Filling
-the file comes with a few rules - depending on the type of check which is being 
+The config file is the place where the validation checks are defined. It offers
+a lot of flexibility for many different types of checks, but writing the file also 
+comes with a few rules: depending on the type of check to be
 performed, different columns can or need to be filled.
 
-![Rules for writing the config file.](./config_rules.jpg){width=100%}
+![Rules for writing the config file for 6 different use cases.](./config_rules.jpg){width=100%}
 
 **General Rules**
 
-- historical reference data needs to read "historical" in the column "scenario"
+- when using reference data, "historical" is required in the column "scenario"
 - one set of thresholds per row
+- empty rows and rows without "variable" are ignored and can be used to structure the config
+- later rows overwrite earlier rows if thresholds overlap
 - only one reference period/model/scenario per data-slice (e.g. don't compare the same data against ref_model1 and ref_model2)
 - define multiple variables with "\*" (one sub-level) or "\*\*" (all sub-levels)
+- ``relative`` thresholds can be given as percentage (``20%``) or decimal (``0.2``).
 - if period column is empty (= all):
-  * historic: 2005 - 2020
-  * scenario: < 2100
+  * historical: 2005 - 2020
+  * all other cases: < 2100
   * give range via yyyy-yyyy or comma-separated years
 
 It is recommended to choose a historical reference source explicitly in the 
 "ref_model" column for historical comparisons. Otherwise, all available 
-historical sources will be averaged and the tooltip will show
+historical sources will be averaged and the tooltip of a heatmap will show
 ``ref_model = "multiple"``.
 
+<insert description of each column here?>
+
 ### Units
 
 It is recommended to include the unit of each variable in the config file to
-avoid inconsistency between data sources. The tools performs checks, whether the
+avoid inconsistency between data sources. The tool checks whether the
 units in the config match those in scenario and reference data and returns a 
 warning in case they don't. This check is performed using 
 ``piamInterfaces::areUnitsIdentical()`` to avoid false positives.
@@ -84,48 +102,131 @@ If the ``unit`` column is left empty, no consistency check will be performed.
 This is the recommended approach when selecting multiple variables in one go via
 ``**``, which don't all share the same unit.
 
+## Use Cases
+
+### Use Case 1: relative comparison to reference data
+
+You want to compare your scenario data to an external reference source, which 
+provides historical (or projected) data for the variable you are interested in.
+The thresholds are defined as a ``relative`` deviation above or below the 
+reference value: $relDeviation = \frac{(scenValue - refValue)}{refValue}$.
+
+Example:
 
-### Use Case 1: relative comparison to historical data
-| metric   | critical | variable     | model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
-|----------|----------|--------------|-------|----------|--------|--------|---------|---------|---------|---------|-----------|--------------|------------|
-| relative | yes      | Emi\|CO2\|Energy |       |          |        |        | -0.25   | -0.2    | 0.2     | 0.25    | EDGAR8    | historical   |            |
-| relative | yes      | Emi\|CO2\|Energy |       |          | World  |        | -0.2    | -0.1    | 0.1     | 0.2     | EDGAR8    | historical   |            |
+| metric   | critical | variable     | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
+|----------|----------|--------------|-------|-------|----------|--------|--------|---------|---------|---------|---------|-----------|--------------|------------|
+| relative | yes      | Emi\|CO2\|Energy | Mt CO2/yr    |   |          |        |        | -0.25   | -0.2    | 0.2     | 0.25    | EDGAR8    | historical   |            |
+| relative | yes      | Emi\|CO2\|Energy | Mt CO2/yr    |   |          | World  |        | -0.2    | -0.1    | 0.1     | 0.2     | EDGAR8    | historical   |            |
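+
+The relative deviation can be reproduced by hand. The following is a minimal
+sketch with made-up numbers (``scenValue`` and ``refValue`` are illustrative
+and not taken from EDGAR8), compared against the thresholds of the "World" 
+row above:
+
+```r
+scenValue <- 36000  # hypothetical scenario value in Mt CO2/yr
+refValue  <- 33000  # hypothetical reference value in Mt CO2/yr
+
+# relative deviation as defined above
+relDeviation <- (scenValue - refValue) / refValue
+
+# thresholds of the "World" row: yellow at +-0.1, red at +-0.2
+withinYellow <- relDeviation >= -0.1 && relDeviation <= 0.1
+withinRed    <- relDeviation >= -0.2 && relDeviation <= 0.2
+```
+
+With these numbers the deviation is about 9%, which lies inside both the
+yellow and the red thresholds.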
 
 
-### Use Case 2: difference to historical data
+### Use Case 2: difference to reference data
 
-wip
+You want to compare your scenario data to an external reference source, which 
+provides historical (or projected) data for the variable you are interested in.
+
+The thresholds are defined as a ``difference`` (above or below) to the 
+reference value: $difference = scenValue - refValue$.
+
+Example:
+
+| metric   | critical | variable     | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
+|----------|----------|--------------|-------|-------|----------|--------|--------|---------|---------|---------|---------|-----------|--------------|------------|
+| difference | yes      | Emi\|CO2\|Energy | Mt CO2/yr    |   |          |        |        | -100   | -50    | 50     | 100    | EDGAR8    | historical   |            |
+| difference | yes      | Emi\|CO2\|Energy | Mt CO2/yr    |   |          | World  |        | -500    | -200    | 200     | 500     | EDGAR8    | historical   |            |
 
 ### Use Case 3: relative comparison to other model/scenario/period
 
-wip
+You want to compare your scenario data to itself, either by comparing periods,
+scenarios or models to one another. You select one or multiple 
+periods/scenarios/models in the respective "period/scenario/model" column and
+*exactly* one in the "ref_period/scenario/model" column.
+
+The thresholds are defined as a ``relative`` deviation above or below the 
+reference value: $relDeviation = \frac{(scenValue - refValue)}{refValue}$.
+
+Example:
+
+| metric   | critical | variable     | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
+|----------|----------|--------------|-------|-------|----------|--------|------|---------|---------|---------|---------|-----------|--------------|------------|
+| relative | yes      | Emi\|CO2\|Energy | Mt CO2/yr | REMIND |     |   |       |  -20%   |  -10%   |  10%    |   20%   | MESSAGE    |    |            |
+| relative | yes      | Emi\|CO2\|Energy | Mt CO2/yr |   | NDC |    |   |       |         |  -25%   |  -10%   |         |  CurPol |            |
+| relative | yes      | Emi\|CO2\|Energy | Mt CO2/yr |   | CurPol  |   | 2030   |   10%   |   20%   |   60%   |   80%   |  | |  2020          |
+| relative | yes      | Emi\|CO2\|Energy | Mt CO2/yr |   | NDC     |   | 2030   | -20%    |  -10%   |   20%   |   40%   |  | |  2020          |
+
+Comparing different regions or variables to one another is currently not supported.
 
 ### Use Case 4: difference to other model/scenario/period
 
-wip
+You want to compare your scenario data to itself, either by comparing periods,
+scenarios or models to one another. You select one or multiple 
+periods/scenarios/models in the respective "period/scenario/model" column and
+*exactly* one in the "ref_period/scenario/model" column.
+
+The thresholds are defined as a ``difference`` (above or below) to the 
+reference value: $difference = scenValue - refValue$.
+
+Example:
+
+| metric   | critical | variable     | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
+|----------|----------|--------------|-------|-------|----------|--------|------|---------|---------|---------|---------|-----------|--------------|------------|
+| difference | yes      | Emi\|CO2\|Energy | Mt CO2/yr | REMIND |     | World | 2020  |  -1500   |  -500   |  500  |   1500  | MESSAGE    |    |            |
+| difference | yes      | Emi\|CO2\|Energy | Mt CO2/yr |   | NDC |    |   |       |         |  -25%   |  -10%   |         |  CurPol |           |
+| difference | yes      | Emi\|CO2\|Energy | Mt CO2/yr |   | CurPol  |   | 2030   |   10%   |   20%   |   60%   |   80%   |  | |  2020          |
+| difference | yes      | Emi\|CO2\|Energy | Mt CO2/yr |   | NDC     |   | 2030   | -20%    |  -10%   |   20%   |   40%   |  | |  2020          |
 
 ### Use Case 5: direct comparison to absolute thresholds
 
-wip
+You want to compare your scenario data to explicit values. This could be the
+case if you want to do sanity checks on variables that should never leave a 
+certain range or you have expert guesses on ``absolute`` upper or lower thresholds.
+
+The tool checks whether $minRed/Yel < scenValue < maxYel/Red$.
+
+Example:
+
+| metric   | critical | variable     | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
+|----------|----------|--------------|-------|-------|----------|--------|------|---------|---------|---------|---------|-----------|--------------|------------|
+| absolute | yes      | Share\|\*\* | \% |  |     |   |       |  0   |     |    |   100  |   |    |    |
+| absolute | yes      | Carbon Management\|Storage | Mt CO2/yr |  |     |   |       |     |     |    |   10 000  |   |    |    |
 
 ### Use Case 6: direct comparison to yearly growthrate
 
-wip
+You want to check growth rates of variables in your scenario data. As 5-year
+steps are expected, the average yearly growth rate over the last 5 years is 
+calculated via:
+
+$\left(\frac{value}{value5yearsAgo}\right)^\frac{1}{5} - 1$.
+
+Example:
 
+| metric   | critical | variable     | unit |model | scenario | region | period | min_red | min_yel | max_yel | max_red | ref_model | ref_scenario | ref_period |
+|----------|----------|--------------|-------|-------|----------|--------|------|---------|---------|---------|---------|-----------|--------------|------------|
+| growthrate | yes      | Cap\|Electricity\|Wind | GW  |  |     | USA |       |     |     |    |  50  |   |    |    |
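+
+As a worked illustration of the growth rate formula above, here is a minimal
+sketch with made-up capacity values (the numbers are hypothetical and this is
+not the package's internal code):
+
+```r
+value          <- 150  # hypothetical capacity in GW in the checked period
+value5yearsAgo <- 100  # hypothetical capacity in GW five years earlier
+
+# average yearly growth rate over the 5-year step
+growthRate <- (value / value5yearsAgo)^(1 / 5) - 1
+round(growthRate, 3)  # about 0.084, i.e. roughly 8.4% per year
+```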
 
 ## Scenario Validation
 
 The function ``validateScenarios()`` performs all necessary steps of the 
-validation process. It takes the config file and goes through each row, 
-assembling the required data and checking the thresholds.
+validation process. It takes the config file and iterates through each row, 
+assembling the required scenario and reference data and checking the thresholds.
 
-Optionally, you can save the resulting data.frame to a .csv file.
+The output is a data.frame, which combines scenario and reference data with
+the threshold that is applied to each respective data point and the result of 
+the validation checks.
+These results can be found in the columns ``check_value`` and ``check``, with 
+the former containing the value that is directly compared to the thresholds 
+(e.g. the calculated growth rate when doing a growth rate check) and the latter
+being the result in the form of a traffic-light color.
+
+![Traffic-light threshold check results.](./thresholds.png){width=80%}
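+
+The traffic-light logic can be sketched as follows. This is a hypothetical
+helper, not a function of ``piamValidation``. It assumes that values beyond a
+red threshold are "red", values beyond a yellow threshold are "yellow" and
+everything else is "green", and it ignores the ``critical`` column:
+
+```r
+# Hypothetical helper: classify one check_value against the four thresholds.
+# NA means that the respective threshold is not set.
+trafficLight <- function(checkValue, minRed = NA, minYel = NA,
+                         maxYel = NA, maxRed = NA) {
+  below <- function(limit) !is.na(limit) && checkValue < limit
+  above <- function(limit) !is.na(limit) && checkValue > limit
+  if (below(minRed) || above(maxRed)) return("red")
+  if (below(minYel) || above(maxYel)) return("yellow")
+  "green"
+}
+
+# a relative deviation of 15% with yellow at +-0.1 and red at +-0.2
+trafficLight(0.15, minRed = -0.2, minYel = -0.1, maxYel = 0.1, maxRed = 0.2)
+```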
+
+Optionally, you can save the resulting data.frame to a .csv file by providing
+an ``outputFile``.
 
 ```{r run validation, eval = FALSE}
 df <- validateScenarios(c(scenPath, histPath), config, outputFile = NULL)
 ```
 
-## Create Validation Report
+## Creating a Validation Report
 
 To perform the validation and create an output document in one go, the function
 ``validationReport()`` can be used. It renders an .html file which features heat 
@@ -137,6 +238,9 @@ used or created according to individual needs in ``inst/markdown``.
 The report is saved in a folder called ``output`` in the current working 
 directory.
 
+-> Be careful when using this function on big data sets and configs with many
+variables as it might create very large html files.
+
 ```{r create report, eval = FALSE}
 validationReport(c(scenPath, histPath), config, report = "default")
 ```