Quantiles: Clarify to use type 7 (#303)
* Use type 7 for quantiles #296
m-mohr authored Dec 1, 2021
1 parent 3faf89d commit 9ee49d5
Showing 3 changed files with 24 additions and 2 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -29,7 +29,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `aggregate_temporal_period`: Clarified which dimension labels are present in the returned data cube. [#274](https://github.com/Open-EO/openeo-processes/issues/274)
- `ard_surface_reflectance`: The process has been categorized as "optical" instead of "sar".
- `save_result`: Clarify how the process works in the different contexts it is used in (e.g. synchronous processing, secondary web service). [#288](https://github.com/Open-EO/openeo-processes/issues/288)
- `quantiles`: Clarified behavior. [#278](https://github.com/Open-EO/openeo-processes/issues/278)
- `quantiles`:
    - The default algorithm for sample quantiles has been clarified (type 7). [#296](https://github.com/Open-EO/openeo-processes/issues/296)
    - Improved documentation in general. [#278](https://github.com/Open-EO/openeo-processes/issues/278)

## [1.1.0] - 2021-06-29

14 changes: 14 additions & 0 deletions meta/implementation.md
@@ -141,3 +141,17 @@ To make `date_shift` easier to implement, we have found some libraries that foll
- JavaScript: [Moment.js](https://momentjs.com/)
- Python: [dateutil](https://dateutil.readthedocs.io/en/stable/index.html)
- R: [lubridate](https://lubridate.tidyverse.org/) ([Cheatsheet](https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf))

## Quantile algorithms

The `quantiles` process could implement a number of different algorithms; the literature usually distinguishes [9 types](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample).
Right now it's not possible to choose between them, but such an option may be added in the future.
To improve the interoperability of openEO processes, version 1.2.0 added details about the algorithm that must be implemented.
A survey has shown that most libraries implement type 7, and as such it was chosen as the default.

We have found some libraries that can be used for an implementation:
- Java: [Apache Commons Math Percentile](http://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/descriptive/rank/Percentile.html), choose the [estimation type `R_7`](http://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html#R_7)
- JavaScript: [d3](https://github.com/d3/d3-array/blob/v2.8.0/README.md#quantile), has only type 7 implemented.
- Julia: [Statistics.quantile](https://docs.julialang.org/en/v1/stdlib/Statistics/#Statistics.quantile!), type 7 is the default.
- Python: [numpy](https://numpy.org/doc/stable/reference/generated/numpy.quantile.html), [pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html), [xarray](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.quantile.html) - type 7 (called 'linear' for the interpolation parameter) is the default for all of them.
- R: [quantile](https://stat.ethz.ch/R-manual/R-patched/library/stats/html/quantile.html) - type 7 is the default.
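
For implementers who cannot rely on one of these libraries, the following is a minimal sketch of the type 7 estimator in plain Python. It is not part of the commit: the function name `quantile_type7` is hypothetical, and the sketch assumes a non-empty list of finite numbers without no-data values.

```python
import math


def quantile_type7(values, p):
    """Type 7 sample quantile: linear interpolation between order statistics.

    Hypothetical helper for illustration only; no-data handling is left out.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in the range [0, 1]")
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    # Fractional 0-based position of the quantile in the sorted sample.
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # clamp so that p == 1 stays in range
    # Interpolate linearly between the two neighbouring order statistics.
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


sample = [2, 4, 4, 4, 5, 5, 7, 9]
quartiles = [quantile_type7(sample, p) for p in (0.25, 0.5, 0.75)]
print(quartiles)  # [4.0, 4.5, 5.5]
```

Comparing the output of such a sketch against one of the libraries listed above is a quick way to confirm that a back-end really uses type 7.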
8 changes: 7 additions & 1 deletion quantiles.json
@@ -1,7 +1,7 @@
{
"id": "quantiles",
"summary": "Quantiles",
"description": "Calculates quantiles, which are cut points dividing the range of a sample distribution into either\n\n* intervals corresponding to the given `probabilities` or\n* equal-sized intervals (q-quantiles based on the parameter `q`).\n\nEither the parameter `probabilities` or `q` must be specified, otherwise the `QuantilesParameterMissing` exception is thrown. If both parameters are set the `QuantilesParameterConflict` exception is thrown.",
"description": "Calculates quantiles, which are cut points dividing the range of a sample distribution into either\n\n* intervals corresponding to the given `probabilities` or\n* equal-sized intervals (q-quantiles based on the parameter `q`).\n\nEither the parameter `probabilities` or `q` must be specified, otherwise the `QuantilesParameterMissing` exception is thrown. If both parameters are set the `QuantilesParameterConflict` exception is thrown.\n\nSample quantiles can be computed with several different algorithms. Hyndman and Fan (1996) distinguish nine different types, which are commonly implemented in statistical software packages. This process implements type 7, which is widely implemented and often the default type (e.g. in Excel, Julia, Python, R and S).",
"categories": [
"math > statistics"
],
@@ -173,6 +173,12 @@
"rel": "about",
"href": "https://en.wikipedia.org/wiki/Quantile",
"title": "Quantiles explained by Wikipedia"
},
{
"rel": "about",
"href": "https://www.amherst.edu/media/view/129116/original/Sample+Quantiles.pdf",
"type": "application/pdf",
"title": "Hyndman and Fan (1996): Sample Quantiles in Statistical Packages"
}
]
}
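
The parameter rules in the `description` above can be sketched as a small Python helper. This is an illustration rather than part of the commit: the function `resolve_probabilities` is hypothetical, the exception classes only borrow the names used in the process description, and the sketch assumes that q-quantiles are the cut points at `k / q` for `k = 1 .. q - 1`.

```python
class QuantilesParameterMissing(Exception):
    """Neither `probabilities` nor `q` was given."""


class QuantilesParameterConflict(Exception):
    """Both `probabilities` and `q` were given."""


def resolve_probabilities(probabilities=None, q=None):
    """Return the list of probabilities to evaluate (hypothetical helper)."""
    if probabilities is None and q is None:
        raise QuantilesParameterMissing("Specify either `probabilities` or `q`.")
    if probabilities is not None and q is not None:
        raise QuantilesParameterConflict("Specify only one of `probabilities` and `q`.")
    if q is not None:
        # Assumption: q equal-sized intervals have q - 1 interior cut points.
        return [k / q for k in range(1, q)]
    return list(probabilities)


print(resolve_probabilities(q=4))  # [0.25, 0.5, 0.75]
```

Each resulting probability would then be passed to a type 7 quantile function such as the ones listed in `meta/implementation.md`.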
