diff --git a/CHANGELOG.md b/CHANGELOG.md
index aa9978c8..a0e2aa1c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,7 +29,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `aggregate_temporal_period`: Clarified which dimension labels are present in the returned data cube. [#274](https://github.com/Open-EO/openeo-processes/issues/274)
 - `ard_surface_reflectance`: The process has been categorized as "optical" instead of "sar".
 - `save_result`: Clarify how the process works in the different contexts it is used in (e.g. synchronous processing, secondary web service). [#288](https://github.com/Open-EO/openeo-processes/issues/288)
-- `quantiles`: Clarified behavior. [#278](https://github.com/Open-EO/openeo-processes/issues/278)
+- `quantiles`:
+    - The default algorithm for sample quantiles has been clarified (type 7). [#296](https://github.com/Open-EO/openeo-processes/issues/296)
+    - Improved documentation in general. [#278](https://github.com/Open-EO/openeo-processes/issues/278)
 
 ## [1.1.0] - 2021-06-29
 
diff --git a/meta/implementation.md b/meta/implementation.md
index af8ac782..f69ec2be 100644
--- a/meta/implementation.md
+++ b/meta/implementation.md
@@ -141,3 +141,17 @@ To make `date_shift` easier to implement, we have found some libraries that foll
 - JavaScript: [Moment.js](https://momentjs.com/)
 - Python: [dateutil](https://dateutil.readthedocs.io/en/stable/index.html)
 - R: [lubridate](https://lubridate.tidyverse.org/) ([Cheatsheet](https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf))
+
+## Quantile algorithms
+
+The `quantiles` process could implement a number of different algorithms; the literature usually distinguishes [9 types](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample).
+Right now it's not possible to choose between them, but this option may be added in the future.
+To improve interoperability, openEO processes version 1.2.0 added details about the algorithm that must be implemented.
+A survey has shown that most libraries implement type 7, and as such it was chosen to be the default.
+
+We have found some libraries that can be used for an implementation:
+- Java: [Apache Commons Math Percentile](http://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/descriptive/rank/Percentile.html), choose the [estimation type `R_7`](http://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html#R_7)
+- JavaScript: [d3](https://github.com/d3/d3-array/blob/v2.8.0/README.md#quantile), has only type 7 implemented.
+- Julia: [Statistics.quantile](https://docs.julialang.org/en/v1/stdlib/Statistics/#Statistics.quantile!), type 7 is the default.
+- Python: [numpy](https://numpy.org/doc/stable/reference/generated/numpy.quantile.html), [pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html), [xarray](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.quantile.html) - type 7 (called 'linear' for the interpolation parameter) is the default for all of them.
+- R: [quantile](https://stat.ethz.ch/R-manual/R-patched/library/stats/html/quantile.html) - type 7 is the default.
diff --git a/quantiles.json b/quantiles.json
index 8df00cbd..81f60c2b 100644
--- a/quantiles.json
+++ b/quantiles.json
@@ -1,7 +1,7 @@
 {
     "id": "quantiles",
     "summary": "Quantiles",
-    "description": "Calculates quantiles, which are cut points dividing the range of a sample distribution into either\n\n* intervals corresponding to the given `probabilities` or\n* equal-sized intervals (q-quantiles based on the parameter `q`).\n\nEither the parameter `probabilities` or `q` must be specified, otherwise the `QuantilesParameterMissing` exception is thrown. If both parameters are set the `QuantilesParameterConflict` exception is thrown.",
+    "description": "Calculates quantiles, which are cut points dividing the range of a sample distribution into either\n\n* intervals corresponding to the given `probabilities` or\n* equal-sized intervals (q-quantiles based on the parameter `q`).\n\nEither the parameter `probabilities` or `q` must be specified, otherwise the `QuantilesParameterMissing` exception is thrown. If both parameters are set the `QuantilesParameterConflict` exception is thrown.\n\nSample quantiles can be computed with several different algorithms. Hyndman and Fan (1996) identified nine different types, which are commonly implemented in statistical software packages. This process implements type 7, which is widely implemented and often the default type (e.g. in Excel, Julia, Python, R and S).",
     "categories": [
         "math > statistics"
     ],
@@ -173,6 +173,12 @@
             "rel": "about",
             "href": "https://en.wikipedia.org/wiki/Quantile",
             "title": "Quantiles explained by Wikipedia"
+        },
+        {
+            "rel": "about",
+            "href": "https://www.amherst.edu/media/view/129116/original/Sample+Quantiles.pdf",
+            "type": "application/pdf",
+            "title": "Hyndman and Fan (1996): Sample Quantiles in Statistical Packages"
         }
     ]
 }
\ No newline at end of file
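
Reviewer note: the type 7 definition that this change settles on can be illustrated with a minimal sketch. It assumes the usual Hyndman and Fan formulation, where a probability `p` maps to the fractional position `h = (n - 1) * p` in the sorted sample and the result is linearly interpolated between the two neighbouring values. The function names `quantile_type7` and `q_to_probabilities` are hypothetical and not part of the openEO specification; no-data handling and the `QuantilesParameterMissing` / `QuantilesParameterConflict` checks are left out.

```python
import math
from typing import List, Sequence


def quantile_type7(data: Sequence[float], probabilities: Sequence[float]) -> List[float]:
    """Sample quantiles with Hyndman and Fan type 7 (linear interpolation).

    Hypothetical helper for illustration only; not an openEO API.
    """
    if not data:
        raise ValueError("data must not be empty")
    xs = sorted(data)
    n = len(xs)
    result = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
        h = (n - 1) * p            # fractional 0-based position in the sorted sample
        lo = math.floor(h)         # index of the lower neighbour
        hi = min(lo + 1, n - 1)    # index of the upper neighbour, clamped for p == 1
        result.append(xs[lo] + (h - lo) * (xs[hi] - xs[lo]))
    return result


def q_to_probabilities(q: int) -> List[float]:
    """Cut points for equal-sized intervals, mirroring the `q` parameter."""
    if q < 2:
        raise ValueError("q must be at least 2")
    return [i / q for i in range(1, q)]
```

For example, `quantile_type7([1, 2, 3, 4], q_to_probabilities(4))` yields `[1.75, 2.5, 3.25]`, which agrees with the default ('linear') behaviour of `numpy.quantile` on the same input.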