diff --git a/CHANGELOG.md b/CHANGELOG.md index fcb4bc3a..ef5fd7f3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,10 +6,50 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## Unreleased / Draft +## [1.2.0] - 2021-12-13 + +### Added + +- New processes in proposal state + - `fit_curve` + - `predict_curve` +- `ard_normalized_radar_backscatter` and `sar_backscatter`: Added `options` parameter +- `array_find`: Added parameter `reverse`. [#269](https://github.com/Open-EO/openeo-processes/issues/269) +- `load_result`: + - Added ability to load by (signed) URL (supported since openEO API v1.1.0). + - Added parameters `spatial_extent`, `temporal_extent` and `bands`. [#220](https://github.com/Open-EO/openeo-processes/issues/220) +- `run_udf`: Exception `InvalidRuntime` added. [#273](https://github.com/Open-EO/openeo-processes/issues/273) +- A new category "math > statistics" has been added [#277](https://github.com/Open-EO/openeo-processes/issues/277) + +### Changed + +- `array_labels`: Allow normal arrays to be passed for which the process returns the indices. [#243](https://github.com/Open-EO/openeo-processes/issues/243) +- `debug`: + - Renamed to `inspect`. + - The log level `error` does not need to stop execution. + - Added proposals for logging several data types to the implementation guide. + +### Removed + +- Removed the explicit schema for `raster-cube` in the `data` parameters and return values of `run_udf` and `run_udf_externally`. It's still possible to pass raster-cubes via the "any" data type, but it's discouraged due to scalability issues. [#285](https://github.com/Open-EO/openeo-processes/issues/285) + +### Fixed + +- `aggregate_temporal_period`: Clarified which dimension labels are present in the returned data cube. [#274](https://github.com/Open-EO/openeo-processes/issues/274) +- `ard_surface_reflectance`: The process has been categorized as "optical" instead of "sar". +- `array_modify`: Clarified behavior. +- `save_result`: Clarify how the process works in the different contexts it is used in (e.g. synchronous processing, secondary web service). [#288](https://github.com/Open-EO/openeo-processes/issues/288) +- `quantiles`: + - The default algorithm for sample quantiles has been clarified (type 7). [#296](https://github.com/Open-EO/openeo-processes/issues/296) + - Improved documentation in general. [#278](https://github.com/Open-EO/openeo-processes/issues/278) + ## [1.1.0] - 2021-06-29 ### Added + - New processes in proposal state + - `ard_normalized_radar_backscatter` + - `ard_surface_reflectance` - `array_append` - `array_concat` - `array_create` @@ -17,6 +57,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - `array_find_label` - `array_interpolate_linear` [#173](https://github.com/Open-EO/openeo-processes/issues/173) - `array_modify` + - `atmospheric_correction` + - `cloud_detection` - `date_shift` - `is_infinite` - `nan` @@ -30,7 +72,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Moved the experimental process `run_udf_externally` to the proposals. - Moved the rarely used and implemented processes `cummax`, `cummin`, `cumproduct`, `cumsum`, `debug`, `filter_labels`, `load_result`, `load_uploaded_files`, `resample_cube_temporal` to the proposals. - Exception messages have been aligned to always use ` instead of '. Tooling could render it with CommonMark. -- `load_collection` and `mask_polygon`: Also support multi polygons instead of just polygons. 
[#237](https://github.com/Open-EO/openeo-processes/issues/237) +- `load_collection` and `mask_polygon`: Also support multi polygons instead of just polygons. [#237](https://github.com/Open-EO/openeo-processes/issues/237) - `run_udf` and `run_udf_externally`: Specify specific (extensible) protocols for UDF URIs. - `resample_cube_spatial` and `resample_spatial`: Aligned with GDAL and added `rms` and `sum` options to methods. Also added better descriptions. - `resample_cube_temporal`: Process has been simplified and only offers the nearest neighbor method now. The `process` parameter has been removed, the `dimension` parameter was made less restrictive, the parameter `valid_within` was added. [#194](https://github.com/Open-EO/openeo-processes/issues/194) @@ -221,7 +263,8 @@ First version which is separated from the openEO API. Complete rework of all pro Older versions of the processes were released as part of the openEO API, see the corresponding changelog for more information. -[Unreleased]: +[Unreleased]: +[1.2.0]: [1.1.0]: [1.0.0]: [1.0.0-rc.1]: diff --git a/README.md b/README.md index 28c6ef37..99ea0a02 100644 --- a/README.md +++ b/README.md @@ -8,12 +8,13 @@ openEO develops interoperable processes for big Earth observation cloud processi The [master branch](https://github.com/Open-EO/openeo-processes/tree/master) is the 'stable' version of the openEO processes specification. An exception is the [`proposals`](proposals/) folder, which provides experimental new processes currently under discussion. They may still change, but everyone is encouraged to implement them and give feedback. -The latest release is version **1.1.0**. The [draft branch](https://github.com/Open-EO/openeo-processes/tree/draft) is where active development takes place. PRs should be made against the draft branch. +The latest release is version **1.2.0**. The [draft branch](https://github.com/Open-EO/openeo-processes/tree/draft) is where active development takes place. PRs should be made against the draft branch. | Version / Branch | Status | openEO API versions | | ------------------------------------------------------------ | ------------------------- | ------------------- | | [unreleased / draft](https://processes.openeo.org/draft) | in development | 1.x.x | -| [**1.1.0** / master](https://processes.openeo.org/1.1.0/) | **latest stable version** | 1.x.x | +| [**1.2.0** / master](https://processes.openeo.org/1.2.0/) | **latest stable version** | 1.x.x | +| [1.1.0](https://processes.openeo.org/1.1.0/) | legacy version | 1.x.x | | [1.0.0](https://processes.openeo.org/1.0.0/) | legacy version | 1.x.x | | [1.0.0 RC1](https://processes.openeo.org/1.0.0-rc.1/) | legacy version | 1.x.x | | [0.4.2](https://processes.openeo.org/0.4.2/) | legacy version | 0.4.x | @@ -33,11 +34,7 @@ This repository contains a set of files formally describing the openEO Processes * [implementation.md](meta/implementation.md) in the `meta` folder provide some additional implementation details for back-ends. For back-end implementors, it's highly recommended to read them. * [subtype-schemas.json](meta/subtype-schemas.json) in the `meta` folder defines common data types (`subtype`s) for JSON Schema used in openEO processes. * The [`examples`](examples/) folder contains some useful examples that the processes link to. All of these are non-binding additions. -* The [`tests`](tests/) folder can be used to test the process specification for validity and consistent "style". It also allows rendering the processes in a web browser. 
- - If you switch to the `tests` folder in CLI and after installing NodeJS and run `npm install`, you can run a couple of commands: - * `npm test`: Check the processes for validity and lint them. Processes need to pass tests to be added to this repository. - * `npm run render`: Opens a browser with all processes rendered through the docgen. +* The [`tests`](tests/) folder can be used to test the process specification for validity and consistent "style". It also allows rendering the processes in a web browser. Check the [tests documentation](tests/README.md) for details. ## Process diff --git a/aggregate_temporal_period.json b/aggregate_temporal_period.json index b0d4c110..832e72aa 100644 --- a/aggregate_temporal_period.json +++ b/aggregate_temporal_period.json @@ -10,7 +10,7 @@ "parameters": [ { "name": "data", - "description": "A data cube.", + "description": "The source data cube.", "schema": { "type": "object", "subtype": "raster-cube" @@ -73,7 +73,7 @@ }, { "name": "dimension", - "description": "The name of the temporal dimension for aggregation. All data along the dimension is passed through the specified reducer. If the dimension is not set or set to `null`, the data cube is expected to only have one temporal dimension. Fails with a `TooManyDimensions` exception if it has more dimensions. Fails with a `DimensionNotAvailable` exception if the specified dimension does not exist.", + "description": "The name of the temporal dimension for aggregation. All data along the dimension is passed through the specified reducer. If the dimension is not set or set to `null`, the source data cube is expected to only have one temporal dimension. Fails with a `TooManyDimensions` exception if it has more dimensions. Fails with a `DimensionNotAvailable` exception if the specified dimension does not exist.", "schema": { "type": [ "string", @@ -94,7 +94,7 @@ } ], "returns": { - "description": "A new data cube with the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged, except for the resolution and dimension labels of the given temporal dimension. The specified temporal dimension has the following dimension labels (`YYYY` = four-digit year, `MM` = two-digit month, `DD` two-digit day of month):\n\n* `hour`: `YYYY-MM-DD-00` - `YYYY-MM-DD-23`\n* `day`: `YYYY-001` - `YYYY-365`\n* `week`: `YYYY-01` - `YYYY-52`\n* `dekad`: `YYYY-00` - `YYYY-36`\n* `month`: `YYYY-01` - `YYYY-12`\n* `season`: `YYYY-djf` (December - February), `YYYY-mam` (March - May), `YYYY-jja` (June - August), `YYYY-son` (September - November).\n* `tropical-season`: `YYYY-ndjfma` (November - April), `YYYY-mjjaso` (May - October).\n* `year`: `YYYY`\n* `decade`: `YYY0`\n* `decade-ad`: `YYY1`", + "description": "A new data cube with the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged, except for the resolution and dimension labels of the given temporal dimension. 
The specified temporal dimension has the following dimension labels (`YYYY` = four-digit year, `MM` = two-digit month, `DD` two-digit day of month):\n\n* `hour`: `YYYY-MM-DD-00` - `YYYY-MM-DD-23`\n* `day`: `YYYY-001` - `YYYY-365`\n* `week`: `YYYY-01` - `YYYY-52`\n* `dekad`: `YYYY-00` - `YYYY-36`\n* `month`: `YYYY-01` - `YYYY-12`\n* `season`: `YYYY-djf` (December - February), `YYYY-mam` (March - May), `YYYY-jja` (June - August), `YYYY-son` (September - November).\n* `tropical-season`: `YYYY-ndjfma` (November - April), `YYYY-mjjaso` (May - October).\n* `year`: `YYYY`\n* `decade`: `YYY0`\n* `decade-ad`: `YYY1`\n\nThe dimension labels in the new data cube are complete for the whole extent of the source data cube. For example, if `period` is set to `day` and the source data cube has two dimension labels at the beginning of the year (`2020-01-01`) and the end of a year (`2020-12-31`), the process returns a data cube with 365 dimension labels (`2020-001`, `2020-002`, ..., `2020-365`). In contrast, if `period` is set to `day` and the source data cube has just one dimension label `2020-01-05`, the process returns a data cube with just a single dimension label (`2020-005`).", "schema": { "type": "object", "subtype": "raster-cube" diff --git a/array_apply.json b/array_apply.json index bea8a744..15da28dc 100644 --- a/array_apply.json +++ b/array_apply.json @@ -96,13 +96,13 @@ { "rel": "example", "type": "application/json", - "href": "https://processes.openeo.org/1.1.0/examples/array_find_nodata.json", + "href": "https://processes.openeo.org/1.2.0/examples/array_find_nodata.json", "title": "Find no-data values in arrays" }, { "rel": "example", "type": "application/json", - "href": "https://processes.openeo.org/1.1.0/examples/array_contains_nodata.json", + "href": "https://processes.openeo.org/1.2.0/examples/array_contains_nodata.json", "title": "Check for no-data values in arrays" } ] diff --git a/array_contains.json b/array_contains.json index cabfcf23..745b62b3 100644 --- a/array_contains.json +++ b/array_contains.json @@ -133,7 +133,7 @@ { "rel": "example", "type": "application/json", - "href": "https://processes.openeo.org/1.1.0/examples/array_contains_nodata.json", + "href": "https://processes.openeo.org/1.2.0/examples/array_contains_nodata.json", "title": "Check for no-data values in arrays" } ], diff --git a/array_find.json b/array_find.json index d60a450d..c95f2628 100644 --- a/array_find.json +++ b/array_find.json @@ -1,7 +1,7 @@ { "id": "array_find", "summary": "Get the index for a value in an array", - "description": "Checks whether the array specified for `data` contains the value specified in `value` and returns the zero-based index for the first match. If there's no match, `null` is returned.\n\n**Remarks:**\n\n* To get a boolean value returned use ``array_contains()``.\n* All definitions for the process ``eq()`` regarding the comparison of values apply here as well. A `null` return value from ``eq()`` is handled exactly as `false` (no match).\n* Data types MUST be checked strictly. For example, a string with the content *1* is not equal to the number *1*.\n* An integer *1* is equal to a floating-point number *1.0* as `integer` is a sub-type of `number`. Still, this process may return unexpectedly `false` when comparing floating-point numbers due to floating-point inaccuracy in machine-based computation.\n* Temporal strings are treated as normal strings and MUST NOT be interpreted.\n* If the specified value is an array, object or null, the process always returns `null`. 
See the examples for one to find `null` values.", + "description": "Returns the zero-based index of the first (or last) occurrence of the value specified by `value` in the array specified by `data` or `null` if there is no match. Use the parameter `reverse` to switch from the first to the last match.\n\n**Remarks:**\n\n* Use ``array_contains()`` to check if an array contains a value regardless of the position.\n* Use ``array_find_label()`` to find the index for a label.\n* All definitions for the process ``eq()`` regarding the comparison of values apply here as well. A `null` return value from ``eq()`` is handled exactly as `false` (no match).\n* Data types MUST be checked strictly. For example, a string with the content *1* is not equal to the number *1*.\n* An integer *1* is equal to a floating-point number *1.0* as `integer` is a sub-type of `number`. Still, this process may return unexpectedly `false` when comparing floating-point numbers due to floating-point inaccuracy in machine-based computation.\n* Temporal strings are treated as normal strings and MUST NOT be interpreted.\n* If the specified value is an array, object or null, the process always returns `null`. See the examples for one to find `null` values.", "categories": [ "arrays", "reducer" @@ -23,6 +23,15 @@ "schema": { "description": "Any data type is allowed." } + }, + { + "name": "reverse", + "description": "By default, this process finds the index of the first match. To return the index of the last match instead, set this flag to `true`.", + "schema": { + "type": "boolean" + }, + "default": false, + "optional": true } ], "returns": { @@ -43,12 +52,28 @@ "data": [ 1, 2, + 3, + 2, 3 ], "value": 2 }, "returns": 1 }, + { + "arguments": { + "data": [ + 1, + 2, + 3, + 2, + 3 + ], + "value": 2, + "reverse": true + }, + "returns": 3 + }, { "arguments": { "data": [ @@ -139,7 +164,7 @@ { "rel": "example", "type": "application/json", - "href": "https://processes.openeo.org/1.1.0/examples/array_find_nodata.json", + "href": "https://processes.openeo.org/1.2.0/examples/array_find_nodata.json", "title": "Find no-data values in arrays" } ] diff --git a/array_labels.json b/array_labels.json index 52dcad0e..e71ae1e9 100644 --- a/array_labels.json +++ b/array_labels.json @@ -1,25 +1,21 @@ { "id": "array_labels", "summary": "Get the labels for an array", - "description": "Gives all labels for a labeled array. The labels have the same order as in the array.", + "description": "Gives all labels for a labeled array or gives all indices for an array without labels. If the array is not labeled, an array with the zero-based indices is returned. The labels or indices have the same order as in the array.", "categories": [ "arrays" ], "parameters": [ { "name": "data", - "description": "An array with labels.", + "description": "An array.", "schema": { - "type": "array", - "subtype": "labeled-array", - "items": { - "description": "Any data type." - } + "type": "array" } } ], "returns": { - "description": "The labels as an array.", + "description": "The labels or indices as array.", "schema": { "type": "array", "items": { diff --git a/count.json b/count.json index ed1044bd..c2de1451 100644 --- a/count.json +++ b/count.json @@ -4,6 +4,7 @@ "description": "Gives the number of elements in an array that matches the specified condition.\n\n**Remarks:**\n\n* Counts the number of valid elements by default (`condition` is set to `null`). 
A valid element is every element for which ``is_valid()`` returns `true`.\n* To count all elements in a list set the `condition` parameter to boolean `true`.", "categories": [ "arrays", + "math > statistics", "reducer" ], "parameters": [ diff --git a/extrema.json b/extrema.json index e2503f04..6f6075de 100644 --- a/extrema.json +++ b/extrema.json @@ -3,7 +3,7 @@ "summary": "Minimum and maximum values", "description": "Two element array containing the minimum and the maximum values of `data`.\n\nThis process is basically an alias for calling both ``min()`` and ``max()``, but may be implemented more performant by back-ends as it only needs to iterate over the data once instead of twice.", "categories": [ - "math" + "math > statistics" ], "parameters": [ { diff --git a/filter_temporal.json b/filter_temporal.json index a94366a3..bd7ea0b3 100644 --- a/filter_temporal.json +++ b/filter_temporal.json @@ -1,6 +1,6 @@ { "id": "filter_temporal", - "summary": "Temporal filter for a temporal intervals", + "summary": "Temporal filter based on temporal intervals", "description": "Limits the data cube to the specified interval of dates and/or times.\n\nMore precisely, the filter checks whether each of the temporal dimension labels is greater than or equal to the lower boundary (start date/time) and less than the value of the upper boundary (end date/time). This corresponds to a left-closed interval, which contains the lower boundary but not the upper boundary.", "categories": [ "cubes", diff --git a/is_nodata.json b/is_nodata.json index 0b8b38d3..f975bb7d 100644 --- a/is_nodata.json +++ b/is_nodata.json @@ -1,7 +1,7 @@ { "id": "is_nodata", - "summary": "Value is not a no-data value", - "description": "Checks whether the specified data is a missing data, i.e. equals to `null` or any of the no-data values specified in the metadata. The special numerical value `NaN` (not a number) as defined by the [IEEE Standard 754](https://ieeexplore.ieee.org/document/4610935) is not considered no-data and must return `false`.", + "summary": "Value is a no-data value", + "description": "Checks whether the specified data is missing data, i.e. equals to `null` or any of the no-data values specified in the metadata. The special numerical value `NaN` (not a number) as defined by the [IEEE Standard 754](https://ieeexplore.ieee.org/document/4610935) is not considered no-data and must return `false`.", "categories": [ "comparison" ], diff --git a/load_collection.json b/load_collection.json index 6a0a080b..dfdb72ca 100644 --- a/load_collection.json +++ b/load_collection.json @@ -1,7 +1,7 @@ { "id": "load_collection", "summary": "Load a collection", - "description": "Loads a collection from the current back-end by its id and returns it as a processable data cube. The data that is added to the data cube can be restricted with the additional `spatial_extent`, `temporal_extent`, `bands` and `properties`.\n\n**Remarks:**\n\n* The bands (and all dimensions that specify nominal dimension labels) are expected to be ordered as specified in the metadata if the `bands` parameter is set to `null`.\n* If no additional parameter is specified this would imply that the whole data set is expected to be loaded. Due to the large size of many data sets, this is not recommended and may be optimized by back-ends to only load the data that is actually required after evaluating subsequent processes such as filters. 
This means that the pixel values should be processed only after the data has been limited to the required extent and as a consequence also to a manageable size.", + "description": "Loads a collection from the current back-end by its id and returns it as a processable data cube. The data that is added to the data cube can be restricted with the parameters `spatial_extent`, `temporal_extent`, `bands` and `properties`.\n\n**Remarks:**\n\n* The bands (and all dimensions that specify nominal dimension labels) are expected to be ordered as specified in the metadata if the `bands` parameter is set to `null`.\n* If no additional parameter is specified this would imply that the whole data set is expected to be loaded. Due to the large size of many data sets, this is not recommended and may be optimized by back-ends to only load the data that is actually required after evaluating subsequent processes such as filters. This means that the pixel values should be processed only after the data has been limited to the required extent and as a consequence also to a manageable size.", "categories": [ "cubes", "import" ], @@ -177,7 +177,7 @@ }, { "name": "properties", - "description": "Limits the data by metadata properties to include only data in the data cube which all given conditions return `true` for (AND operation).\n\nSpecify key-value-pairs with the key being the name of the metadata property, which can be retrieved with the openEO Data Discovery for Collections. The value must a condition (user-defined process) to be evaluated against the collection metadata, see the example.", + "description": "Limits the data by metadata properties to include only data in the data cube which all given conditions return `true` for (AND operation).\n\nSpecify key-value-pairs with the key being the name of the metadata property, which can be retrieved with the openEO Data Discovery for Collections. The value must be a condition (user-defined process) to be evaluated against the collection metadata, see the example.", "schema": [ { "type": "object", diff --git a/max.json b/max.json index 5a5b7f71..d8903df0 100644 --- a/max.json +++ b/max.json @@ -4,6 +4,7 @@ "description": "Computes the largest value of an array of numbers, which is equal to the last element of a sorted (i.e., ordered) version of the array.\n\nAn array without non-`null` elements resolves always with `null`.", "categories": [ "math", + "math > statistics", "reducer" ], "parameters": [ diff --git a/mean.json b/mean.json index 31a5ea42..cb45faa4 100644 --- a/mean.json +++ b/mean.json @@ -3,7 +3,7 @@ "summary": "Arithmetic mean (average)", "description": "The arithmetic mean of an array of numbers is the quantity commonly called the average. 
It is defined as the sum of all elements divided by the number of elements.\n\nAn array without non-`null` elements resolves always with `null`.", "categories": [ - "math", + "math > statistics", "reducer" ], "parameters": [ diff --git a/median.json b/median.json index 3bc87d5a..e54deb84 100644 --- a/median.json +++ b/median.json @@ -3,7 +3,7 @@ "summary": "Statistical median", "description": "The statistical median of an array of numbers is the value separating the higher half from the lower half of the data.\n\nAn array without non-`null` elements resolves always with `null`.\n\n**Remarks:**\n\n* For symmetric arrays, the result is equal to the ``mean()``.\n* The median can also be calculated by computing the ``quantiles()`` with a probability of *0.5*.", "categories": [ - "math", + "math > statistics", "reducer" ], "parameters": [ diff --git a/meta/implementation.md b/meta/implementation.md index af8ac782..864e02d1 100644 --- a/meta/implementation.md +++ b/meta/implementation.md @@ -141,3 +141,88 @@ To make `date_shift` easier to implement, we have found some libraries that foll - JavaScript: [Moment.js](https://momentjs.com/) - Python: [dateutil](https://dateutil.readthedocs.io/en/stable/index.html) - R: [lubridate](https://lubridate.tidyverse.org/) ([Cheatsheet](https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf)) + +## `inspect` process + +The `inspect` process (previously known as `debug`) allows users to debug their workflows. +Back-ends should not execute the processes for log levels that do not match the minimum log level that can be specified through the API (>= v1.2.0) for each data processing request. + +### Data Types + +The process is only useful for users if a common behavior for data types passed into the `data` parameter has been agreed on across implementations. + +The following chapters include some proposals for common data (sub) types, but it is incomplete and will be extended in the future. +Also, for some data types a JSON encoding is missing; we'll add more details once agreed upon: + + +#### Scalars +For the data types boolean, numbers, strings and null it is recommended to log them as given. +#### Arrays + +It is recommended to summarize arrays as follows: +```js +{ + "data": [3,1,6,4,8], // Return a reasonable excerpt of the data, e.g. the first 5 or 10 elements + "length": 10, // Return the length of the array, this is important to determine whether the data above is complete or an excerpt + "min": 0, // optional: Return additional statistics if possible, ideally use the corresponding openEO process names as keys + "max": 10 +} +``` + +#### Data Cubes + +It is recommended to return them summarized in a structure compliant to the [STAC data cube extension](https://github.com/stac-extensions/datacube). +If reasonable, it gives a valuable benefit for users to provide all dimension labels (e.g. individual timestamps for the temporal dimension) instead of value ranges. +The top-level object and/or each dimension can be enhanced with additional statistics if possible; ideally use the corresponding openEO process names as keys. 
+ +```js +{ + "cube:dimensions": { + "x": { + "type": "spatial", + "axis": "x", + "extent": [8.253, 12.975], + "reference_system": 4326 + }, + "y": { + "type": "spatial", + "axis": "y", + "extent": [51.877,55.988], + "reference_system": 4326 + }, + "t": { + "type": "temporal", + "values": [ + "2015-06-21T12:56:55Z", + "2015-06-23T09:12:14Z", + "2015-06-25T23:44:44Z", + "2015-06-27T21:11:34Z", + "2015-06-30T17:33:12Z" + ], + "step": null + }, + "bands": { + "type": "bands", + "values": ["NDVI"] + } + }, + // optional: Return additional statistics for the data cube if possible, ideally use the corresponding openEO process names as keys + "min": -1, + "max": 1 +} +``` + +## Quantile algorithms + +The `quantiles` process could implement a number of different algorithms; literature usually distinguishes [9 types](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample). +Right now it's not possible to choose from them, but it may be added in the future. +To improve the interoperability of openEO processes, version 1.2.0 added details about the algorithm that must be implemented. +A survey has shown that most libraries implement type 7, and as such it was chosen as the default. + +We have found some libraries that can be used for an implementation: +- Java: [Apache Commons Math Percentile](http://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/descriptive/rank/Percentile.html), choose the [estimation type `R_7`](http://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html#R_7) +- JavaScript: [d3](https://github.com/d3/d3-array/blob/v2.8.0/README.md#quantile), has only type 7 implemented. +- Julia: [Statistics.quantile](https://docs.julialang.org/en/v1/stdlib/Statistics/#Statistics.quantile!), type 7 is the default. +- Python: [numpy](https://numpy.org/doc/stable/reference/generated/numpy.quantile.html), [pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html), [xarray](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.quantile.html) - type 7 (called 'linear' for the interpolation parameter) is the default for all of them. +- R: [quantile](https://stat.ethz.ch/R-manual/R-patched/library/stats/html/quantile.html) - type 7 is the default.
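+
+As a non-normative illustration (not part of the specification above), the type 7 estimator boils down to linear interpolation between the two nearest order statistics. A minimal sketch in JavaScript, assuming a non-empty array of finite numbers and a probability `p` between 0 and 1:
+
+```js
+// Type 7 sample quantile (linear interpolation between order statistics).
+// Assumes `data` is a non-empty array of finite numbers and `p` is in [0, 1].
+function quantileType7(data, p) {
+  const x = [...data].sort((a, b) => a - b); // sort a copy in ascending order
+  const h = (x.length - 1) * p;              // fractional position in the sorted array
+  const lo = Math.floor(h);
+  const hi = Math.ceil(h);
+  return x[lo] + (h - lo) * (x[hi] - x[lo]); // interpolate between the two neighbors
+}
+
+quantileType7([2, 4, 4, 4, 5, 5, 7, 9], 0.25); // 4
+quantileType7([1, 2, 3, 4], 0.5);              // 2.5, matching R's default `quantile`
+```
+
+Back-ends that use one of the libraries listed above get this behavior out of the box; the sketch is mainly meant as a reference for tests or for runtimes without a statistics library.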
diff --git a/meta/subtype-schemas.json b/meta/subtype-schemas.json index fbbdb867..429252ef 100644 --- a/meta/subtype-schemas.json +++ b/meta/subtype-schemas.json @@ -1,6 +1,6 @@ { "$schema": "http://json-schema.org/draft-07/schema#", - "$id": "http://processes.openeo.org/1.1.0/meta/subtype-schemas.json", + "$id": "http://processes.openeo.org/1.2.0/meta/subtype-schemas.json", "title": "Subtype Schemas", "description": "This file defines the schemas for subtypes we define for openEO processes.", "definitions": { @@ -74,6 +74,8 @@ "chunk-size": { "type": "object", "subtype": "chunk-size", + "title": "Chunk Size", + "description": "The chunk size per given dimension. This object maps the dimension names given as keys to chunks given either as a physical measure or in pixels. If not given or `null`, no chunking is applied.", "required": [ "dimension", "value" ], @@ -107,7 +109,7 @@ "type": "string", "subtype": "collection-id", "title": "Collection ID", - "description": "A collection id from the list of supported collections.", + "description": "A collection identifier from the list of supported collections.", "pattern": "^[\\w\\-\\.~/]+$" }, "date": { @@ -127,6 +129,7 @@ "duration": { "type": "string", "subtype": "duration", + "title": "Duration", "description": "[ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations), e.g. `P1D` for one day.", "pattern": "^(-?)P(?=\\d|T\\d)(?:(\\d+)Y)?(?:(\\d+)M)?(?:(\\d+)([DW]))?(?:T(?:(\\d+)H)?(?:(\\d+)M)?(?:(\\d+(?:\\.\\d+)?)S)?)?$" }, @@ -171,7 +174,7 @@ "type": "string", "subtype": "input-format", "title": "Input File Format", - "description": "An input format supported by the back-end." + "description": "A file format that the back-end supports to import data from." }, "input-format-options": { "type": "object", @@ -190,7 +193,7 @@ "type": "array", "subtype": "kernel", "title": "Image Kernel", - "description": "Image kernel, a two-dimensional array of numbers.", + "description": "A two-dimensional array of numbers to be used as kernel for the image operation.", "items": { "type": "array", "items": { @@ -236,7 +239,7 @@ "type": "string", "subtype": "output-format", "title": "Output File Format", - "description": "An output format supported by the back-end." + "description": "A file format that the back-end supports to save and export data to." }, "output-format-options": { "type": "object", @@ -389,13 +392,13 @@ "type": "string", "subtype": "udf-runtime", "title": "UDF runtime", - "description": "The name of a UDF runtime." + "description": "The identifier of a UDF runtime you want to run the given UDF source code with." }, "udf-runtime-version": { "type": "string", "subtype": "udf-runtime-version", "title": "UDF Runtime version", - "description": "The version of a UDF runtime." + "description": "The version of the UDF runtime you want to run the given UDF source code with." }, "uri": { "type": "string", diff --git a/min.json b/min.json index ce161c95..26c60882 100644 --- a/min.json +++ b/min.json @@ -4,6 +4,7 @@ "description": "Computes the smallest value of an array of numbers, which is equal to the first element of a sorted (i.e., ordered) version of the array.\n\nAn array without non-`null` elements resolves always with `null`.", "categories": [ "math", + "math > statistics", "reducer" ], "parameters": [ diff --git a/proposals/ard_normalized_radar_backscatter.json b/proposals/ard_normalized_radar_backscatter.json index 46f5f48a..e643845f 100644 --- a/proposals/ard_normalized_radar_backscatter.json +++ b/proposals/ard_normalized_radar_backscatter.json @@ -58,6 +58,16 @@ "schema": { "type": "boolean" } + }, + { + "description": "Proprietary options for the backscatter computations. 
Specifying proprietary options will reduce portability.", + "name": "options", + "optional": true, + "default": {}, + "schema": { + "type": "object", + "additionalProperties": false + } } ], "returns": { @@ -110,6 +120,9 @@ }, "noise_removal": { "from_parameter": "noise_removal" + }, + "options": { + "from_parameter": "options" } }, "result": true diff --git a/proposals/ard_surface_reflectance.json b/proposals/ard_surface_reflectance.json index 3fee8b54..38aa758b 100644 --- a/proposals/ard_surface_reflectance.json +++ b/proposals/ard_surface_reflectance.json @@ -4,7 +4,7 @@ "description": "Computes CARD4L compliant surface (bottom of atmosphere/top of canopy) reflectance values from optical input.", "categories": [ "cubes", - "sar", + "optical", "ard" ], "experimental": true, diff --git a/proposals/array_find_label.json b/proposals/array_find_label.json index 7f501871..98371cb5 100644 --- a/proposals/array_find_label.json +++ b/proposals/array_find_label.json @@ -1,7 +1,7 @@ { "id": "array_find_label", "summary": "Get the index for a label in a labeled array", - "description": "Checks whether the labeled array specified for `data` has the label specified in `label` and returns the zero-based index for it. If there's no match as either the label doesn't exist or the array is not labeled, `null` is returned.", + "description": "Checks whether the labeled array specified for `data` has the label specified in `label` and returns the zero-based index for it. If there's no match as either the label doesn't exist or the array is not labeled, `null` is returned.\n\nUse ``array_find()`` to find the index for a given value in the array.", "categories": [ "arrays", "reducer" diff --git a/proposals/array_interpolate_linear.json b/proposals/array_interpolate_linear.json index 3a106c77..f5fe90ec 100644 --- a/proposals/array_interpolate_linear.json +++ b/proposals/array_interpolate_linear.json @@ -52,6 +52,22 @@ -1, -8 ] + }, + { + "arguments": { + "data": [ + null, + 1, + null, + null + ] + }, + "returns": [ + null, + 1, + null, + null + ] } ], "links": [ @@ -61,4 +77,4 @@ "title": "Linear interpolation explained by Wikipedia" } ] -} \ No newline at end of file +} diff --git a/proposals/array_modify.json b/proposals/array_modify.json index 497016db..6eaa6852 100644 --- a/proposals/array_modify.json +++ b/proposals/array_modify.json @@ -1,7 +1,7 @@ { "id": "array_modify", - "summary": "Change the content of an array (insert, remove, update)", - "description": "Allows to insert into, remove from or update an array.\n\nAll labels get discarded and the array indices are always a sequence of numbers with the step size of 1 and starting at 0.", + "summary": "Change the content of an array (remove, insert, update)", + "description": "Modify an array by removing, inserting or updating elements. 
Updating can be seen as removing elements followed by inserting new elements (not necessarily the same number).\n\nAll labels get discarded and the array indices are always a sequence of numbers with the step size of 1 and starting at 0.", "categories": [ "arrays" ], @@ -9,7 +9,7 @@ "parameters": [ { "name": "data", - "description": "An array.", + "description": "The array to modify.", "schema": { "type": "array", "items": { @@ -19,7 +19,7 @@ }, { "name": "values", - "description": "The values to fill the array with.", + "description": "The values to insert into the `data` array.", "schema": { "type": "array", "items": { @@ -29,7 +29,7 @@ }, { "name": "index", - "description": "The index of the element to insert the value(s) before. If the index is greater than the number of elements, the process throws an `ArrayElementNotAvailable` exception.\n\nTo insert after the last element, there are two options:\n\n1. Use the simpler processes ``array_append()`` to append a single value or ``array_concat`` to append multiple values.\n2. Specify the number of elements in the array. You can retrieve the number of elements with the process ``count()``, having the parameter `condition` set to `true`.", + "description": "The index in the `data` array of the element to insert the value(s) before. If the index is greater than the number of elements in the `data` array, the process throws an `ArrayElementNotAvailable` exception.\n\nTo insert after the last element, there are two options:\n\n1. Use the simpler processes ``array_append()`` to append a single value or ``array_concat()`` to append multiple values.\n2. Specify the number of elements in the array. You can retrieve the number of elements with the process ``count()``, having the parameter `condition` set to `true`.", "schema": { "type": "integer", "minimum": 0 @@ -37,7 +37,7 @@ }, { "name": "length", - "description": "The number of elements to replace. This parameter has no effect in case the given `index` does not exist in the array given.", + "description": "The number of elements in the `data` array to remove (or replace) starting from the given index. If the array contains fewer elements, the process simply removes all elements up to the end.", "optional": true, "default": 1, "schema": { @@ -57,7 +57,7 @@ }, "exceptions": { "ArrayElementNotAvailable": { - "message": "The array has no element with the specified index." + "message": "The array can't be modified as the given index is larger than the number of elements in the array." 
} }, "examples": [ @@ -124,26 +124,6 @@ "c" ] }, - { - "description": "Add a value at a specific non-existing position after the array, fill missing elements with `null`.", - "arguments": { - "data": [ - "a", - "b" - ], - "values": [ - "e" - ], - "index": 4 - }, - "returns": [ - "a", - "b", - null, - null, - "e" - ] - }, { "description": "Remove a single value from the array.", "arguments": { @@ -181,6 +161,22 @@ "b", "c" ] + }, + { + "description": "Remove multiple values from the end of the array and ignore that the given length is exceeding the size of the array.", + "arguments": { + "data": [ + "a", + "b", + "c" + ], + "values": [], + "index": 1, + "length": 10 + }, + "returns": [ + "a" + ] } ] -} \ No newline at end of file +} diff --git a/proposals/fit_curve.json b/proposals/fit_curve.json new file mode 100644 index 00000000..3b5df7e1 --- /dev/null +++ b/proposals/fit_curve.json @@ -0,0 +1,94 @@ +{ + "id": "fit_curve", + "summary": "Curve fitting", + "description": "Use non-linear least squares to fit a model function `y = f(x, parameters)` to data.\n\nThe process throws an `InvalidValues` exception if invalid values are encountered. Invalid values are values that are not finite numbers (see also ``is_valid()``).", + "categories": [ + "cubes", + "math" + ], + "experimental": true, + "parameters": [ + { + "name": "data", + "description": "A data cube.", + "schema": { + "type": "object", + "subtype": "raster-cube" + } + }, + { + "name": "parameters", + "description": "Defines the number of parameters for the model function and provides an initial guess for them. At least one parameter is required.", + "schema": [ + { + "type": "array", + "minItems": 1, + "items": { + "type": "number" + } + }, + { + "title": "Data Cube with optimal values from a previous result of this process.", + "type": "object", + "subtype": "raster-cube" + } + ] + }, + { + "name": "function", + "description": "The model function. It must take the parameters to fit as array through the first argument and the independent variable `x` as the second argument.\n\nIt is recommended to store the model function as a user-defined process on the back-end to be able to re-use the model function with the computed optimal values for the parameters afterwards.", + "schema": { + "type": "object", + "subtype": "process-graph", + "parameters": [ + { + "name": "x", + "description": "The value for the independent variable `x`.", + "schema": { + "type": "number" + } + }, + { + "name": "parameters", + "description": "The parameters for the model function, contains at least one parameter.", + "schema": { + "type": "array", + "minItems": 1, + "items": { + "type": "number" + } + } + } + ], + "returns": { + "description": "The computed `y` value for the given independent variable `x` and the parameters.", + "schema": { + "type": "number" + } + } + } + }, + { + "name": "dimension", + "description": "The name of the dimension for curve fitting. Must be a dimension with labels that have an order (i.e. numerical labels or a temporal dimension). Fails with a `DimensionNotAvailable` exception if the specified dimension does not exist.", + "schema": { + "type": "string" + } + } + ], + "returns": { + "description": "A data cube with the optimal values for the parameters.", + "schema": { + "type": "object", + "subtype": "raster-cube" + } + }, + "exceptions": { + "InvalidValues": { + "message": "At least one of the values is not a finite number." + }, + "DimensionNotAvailable": { + "message": "A dimension with the specified name does not exist." 
+ } + } +} \ No newline at end of file diff --git a/proposals/debug.json b/proposals/inspect.json similarity index 57% rename from proposals/debug.json rename to proposals/inspect.json index cf902efd..91d6deb6 100644 --- a/proposals/debug.json +++ b/proposals/inspect.json @@ -1,7 +1,7 @@ { - "id": "debug", - "summary": "Publish debugging information", - "description": "Sends debugging information about the data to the log output. Passes the data through.", + "id": "inspect", + "summary": "Add information to the logs", + "description": "This process can be used to add runtime information to the logs, e.g. for debugging purposes. This process should be used with caution and it is recommended to remove the process in production workflows. For example, logging each pixel or array individually in a process such as ``apply()`` or ``reduce_dimension()`` could lead to a (too) large number of log entries. Several data structures (e.g. data cubes) are too large to log and will only return summaries of their contents.\n\nThe data provided in the parameter `data` is returned without changes.", "categories": [ "development" ], @@ -9,23 +9,23 @@ "parameters": [ { "name": "data", - "description": "Data to publish.", + "description": "Data to log.", "schema": { "description": "Any data type is allowed." } }, { "name": "code", - "description": "An identifier to help identify the log entry in a bunch of other log entries.", + "description": "A label to help identify one or more log entries originating from this process in the list of all log entries. It can help to group or filter log entries and is usually not unique.", "schema": { "type": "string" }, - "default": "", + "default": "User", "optional": true }, { "name": "level", - "description": "The severity level of this message, defaults to `info`. Note that the level `error` forces the computation to be stopped!", + "description": "The severity level of this message, defaults to `info`.", "schema": { "type": "string", "enum": [ diff --git a/proposals/load_result.json b/proposals/load_result.json index ebc81718..d8b70c6a 100644 --- a/proposals/load_result.json +++ b/proposals/load_result.json @@ -1,7 +1,7 @@ { "id": "load_result", "summary": "Load batch job results", - "description": "Loads batch job results by job id from the server-side user workspace. The job must have been stored by the authenticated user on the back-end currently connected to.", + "description": "Loads batch job results and returns them as a processable data cube. A batch job result can be loaded by ID or URL:\n\n* **ID**: The identifier for a finished batch job. The job must have been submitted by the authenticated user on the back-end currently connected to.\n* **URL**: The URL to the STAC metadata for a batch job result. This is usually a signed URL that is provided by some back-ends since openEO API version 1.1.0 through the `canonical` link relation in the batch job result metadata.\n\nIf supported by the underlying metadata and file format, the data that is added to the data cube can be restricted with the parameters `spatial_extent`, `temporal_extent` and `bands`.\n\n**Remarks:**\n\n* The bands (and all dimensions that specify nominal dimension labels) are expected to be ordered as specified in the metadata if the `bands` parameter is set to `null`.\n* If no additional parameter is specified this would imply that the whole data set is expected to be loaded. 
Due to the large size of many data sets, this is not recommended and may be optimized by back-ends to only load the data that is actually required after evaluating subsequent processes such as filters. This means that the pixel values should be processed only after the data has been limited to the required extent and as a consequence also to a manageable size.", "categories": [ "cubes", "import" @@ -11,11 +11,184 @@ { "name": "id", "description": "The id of a batch job with results.", - "schema": { - "type": "string", - "subtype": "job-id", - "pattern": "^[\\w\\-\\.~]+$" - } + "schema": [ + { + "title": "ID", + "type": "string", + "subtype": "job-id", + "pattern": "^[\\w\\-\\.~]+$" + }, + { + "title": "URL", + "type": "string", + "format": "uri", + "subtype": "uri", + "pattern": "^https?://" + } + ] + }, + { + "name": "spatial_extent", + "description": "Limits the data to load from the batch job result to the specified bounding box or polygons.\n\nThe process puts a pixel into the data cube if the point at the pixel center intersects with the bounding box or any of the polygons (as defined in the Simple Features standard by the OGC).\n\nThe GeoJSON can be one of the following feature types:\n\n* A `Polygon` or `MultiPolygon` geometry,\n* a `Feature` with a `Polygon` or `MultiPolygon` geometry,\n* a `FeatureCollection` containing at least one `Feature` with `Polygon` or `MultiPolygon` geometries, or\n* a `GeometryCollection` containing `Polygon` or `MultiPolygon` geometries. To maximize interoperability, `GeometryCollection` should be avoided in favour of one of the alternatives above.\n\nSet this parameter to `null` to set no limit for the spatial extent. Be careful with this when loading large datasets! It is recommended to use this parameter instead of using ``filter_bbox()`` or ``filter_spatial()`` directly after loading unbounded data.", + "schema": [ + { + "title": "Bounding Box", + "type": "object", + "subtype": "bounding-box", + "required": [ + "west", + "south", + "east", + "north" + ], + "properties": { + "west": { + "description": "West (lower left corner, coordinate axis 1).", + "type": "number" + }, + "south": { + "description": "South (lower left corner, coordinate axis 2).", + "type": "number" + }, + "east": { + "description": "East (upper right corner, coordinate axis 1).", + "type": "number" + }, + "north": { + "description": "North (upper right corner, coordinate axis 2).", + "type": "number" + }, + "base": { + "description": "Base (optional, lower left corner, coordinate axis 3).", + "type": [ + "number", + "null" + ], + "default": null + }, + "height": { + "description": "Height (optional, upper right corner, coordinate axis 3).", + "type": [ + "number", + "null" + ], + "default": null + }, + "crs": { + "description": "Coordinate reference system of the extent, specified as as [EPSG code](http://www.epsg-registry.org/), [WKT2 (ISO 19162) string](http://docs.opengeospatial.org/is/18-010r7/18-010r7.html) or [PROJ definition (deprecated)](https://proj.org/usage/quickstart.html). 
Defaults to `4326` (EPSG code 4326) unless the client explicitly requests a different coordinate reference system.", + "anyOf": [ + { + "title": "EPSG Code", + "type": "integer", + "subtype": "epsg-code", + "minimum": 1000, + "examples": [ + 3857 + ] + }, + { + "title": "WKT2", + "type": "string", + "subtype": "wkt2-definition" + }, + { + "title": "PROJ definition", + "type": "string", + "subtype": "proj-definition", + "deprecated": true + } + ], + "default": 4326 + } + } + }, + { + "title": "GeoJSON", + "description": "Limits the data cube to the bounding box of the given geometry. All pixels inside the bounding box that do not intersect with any of the polygons will be set to no data (`null`).", + "type": "object", + "subtype": "geojson" + }, + { + "title": "No filter", + "description": "Don't filter spatially. All data is included in the data cube.", + "type": "null" + } + ], + "default": null, + "optional": true + }, + { + "name": "temporal_extent", + "description": "Limits the data to load from the batch job result to the specified left-closed temporal interval. Applies to all temporal dimensions. The interval has to be specified as an array with exactly two elements:\n\n1. The first element is the start of the temporal interval. The specified instance in time is **included** in the interval.\n2. The second element is the end of the temporal interval. The specified instance in time is **excluded** from the interval.\n\nThe specified temporal strings follow [RFC 3339](https://www.rfc-editor.org/rfc/rfc3339.html). Also supports open intervals by setting one of the boundaries to `null`, but never both.\n\nSet this parameter to `null` to set no limit for the temporal extent. Be careful with this when loading large datasets! It is recommended to use this parameter instead of using ``filter_temporal()`` directly after loading unbounded data.", + "schema": [ + { + "type": "array", + "subtype": "temporal-interval", + "minItems": 2, + "maxItems": 2, + "items": { + "anyOf": [ + { + "type": "string", + "format": "date-time", + "subtype": "date-time" + }, + { + "type": "string", + "format": "date", + "subtype": "date" + }, + { + "type": "string", + "subtype": "year", + "minLength": 4, + "maxLength": 4, + "pattern": "^\\d{4}$" + }, + { + "type": "null" + } + ] + }, + "examples": [ + [ + "2015-01-01T00:00:00Z", + "2016-01-01T00:00:00Z" + ], + [ + "2015-01-01", + "2016-01-01" + ] + ] + }, + { + "title": "No filter", + "description": "Don't filter temporally. All data is included in the data cube.", + "type": "null" + } + ], + "default": null, + "optional": true + }, + { + "name": "bands", + "description": "Only adds the specified bands into the data cube so that bands that don't match the list of band names are not available. Applies to all dimensions of type `bands`.\n\nEither the unique band name (metadata field `name` in bands) or one of the common band names (metadata field `common_name` in bands) can be specified. If the unique band name and the common name conflict, the unique band name has a higher priority.\n\nThe order of the specified array defines the order of the bands in the data cube. If multiple bands match a common name, all matched bands are included in the original order.\n\nIt is recommended to use this parameter instead of using ``filter_bands()`` directly after loading unbounded data.", + "schema": [ + { + "type": "array", + "items": { + "type": "string", + "subtype": "band-name" + } + }, + { + "title": "No filter", + "description": "Don't filter bands. 
All bands are included in the data cube.", + "type": "null" + } + ], + "default": null, + "optional": true } ], "returns": { diff --git a/proposals/predict_curve.json b/proposals/predict_curve.json new file mode 100644 index 00000000..52adcc5e --- /dev/null +++ b/proposals/predict_curve.json @@ -0,0 +1,112 @@ +{ + "id": "predict_curve", + "summary": "Predict values", + "description": "Predict values using a model function and pre-computed parameters. The process is primarily intended to compute values for new labels, but it can also fill gaps where existing labels contain no-data (`null`) values.", + "categories": [ + "cubes", + "math" + ], + "experimental": true, + "parameters": [ + { + "name": "data", + "description": "A data cube to predict values for.", + "schema": { + "type": "object", + "subtype": "raster-cube" + } + }, + { + "name": "parameters", + "description": "A data cube with optimal values from a result of e.g. ``fit_curve()``.", + "schema": { + "type": "object", + "subtype": "raster-cube" + } + }, + { + "name": "function", + "description": "The model function. It must take the parameters to fit as array through the first argument and the independent variable `x` as the second argument.\n\nIt is recommended to store the model function as a user-defined process on the back-end.", + "schema": { + "type": "object", + "subtype": "process-graph", + "parameters": [ + { + "name": "x", + "description": "The value for the independent variable `x`.", + "schema": { + "type": "number" + } + }, + { + "name": "parameters", + "description": "The parameters for the model function, contains at least one parameter.", + "schema": { + "type": "array", + "minItems": 1, + "items": { + "type": "number" + } + } + } + ], + "returns": { + "description": "The computed value `y` value for the given independent variable `x` and the parameters.", + "schema": { + "type": "number" + } + } + } + }, + { + "name": "dimension", + "description": "The name of the dimension for predictions. Fails with a `DimensionNotAvailable` exception if the specified dimension does not exist.", + "schema": { + "type": "string" + } + }, + { + "name": "labels", + "description": "The labels to predict values for. If no labels are given, predicts values only for no-data (`null`) values in the data cube.", + "optional": true, + "default": null, + "schema": [ + { + "type": "null" + }, + { + "type": "array", + "items": { + "anyOf": [ + { + "type": "number" + }, + { + "type": "string", + "format": "date", + "subtype": "date" + }, + { + "type": "string", + "format": "date-time", + "subtype": "date-time" + } + ] + } + } + ] + } + ], + "returns": { + "description": "A data cube with the predicted values.", + "schema": { + "type": "object", + "subtype": "raster-cube" + } + }, + "exceptions": { + "DimensionNotAvailable": { + "message": "A dimension with the specified name does not exist." + } + } +} \ No newline at end of file diff --git a/proposals/run_udf_externally.json b/proposals/run_udf_externally.json index 521f7bef..9672eb71 100644 --- a/proposals/run_udf_externally.json +++ b/proposals/run_udf_externally.json @@ -1,7 +1,7 @@ { "id": "run_udf_externally", "summary": "Run an externally hosted UDF container", - "description": "Runs a compatible UDF container that is either externally hosted by a service provider or running on a local machine of the user. 
The UDF container must follow the [openEO UDF specification](https://openeo.org/documentation/1.0/udfs.html).\n\nThe referenced UDF service can be executed in several processes such as ``aggregate_spatial()``, ``apply()``, ``apply_dimension()`` and ``reduce_dimension()``. In this case, an array is passed instead of a raster data cube. The user must ensure that the data is properly passed as an array so that the UDF can make sense of it.", + "description": "Runs a compatible UDF container that is either externally hosted by a service provider or running on a local machine of the user. The UDF container must follow the [openEO UDF specification](https://openeo.org/documentation/1.0/udfs.html).\n\nThe referenced UDF service can be executed in several processes such as ``aggregate_spatial()``, ``apply()``, ``apply_dimension()`` and ``reduce_dimension()``. In this case, an array is passed instead of a raster data cube. The user must ensure that the data is given in a way that the UDF code can make sense of it.", "categories": [ "cubes", "import", @@ -11,13 +11,8 @@ "parameters": [ { "name": "data", - "description": "The data to be passed to the UDF as an array or raster data cube.", + "description": "The data to be passed to the UDF.", "schema": [ - { - "title": "Raster data cube", - "type": "object", - "subtype": "raster-cube" - }, { "title": "Array", "type": "array", @@ -39,7 +34,7 @@ "type": "string", "format": "uri", "subtype": "uri", - "pattern": "^(http|https)://" + "pattern": "^https?://" } }, { @@ -53,18 +48,11 @@ } ], "returns": { - "description": "The data processed by the UDF service.\n\n* Returns a raster data cube if a raster data cube is passed for `data`. Details on the dimensions and dimension properties (name, type, labels, reference system and resolution) depend on the UDF.\n* If an array is passed for `data`, the returned value can be of any data type, but is exactly what the UDF returns.", - "schema": [ - { - "title": "Raster data cube", - "type": "object", - "subtype": "raster-cube" - }, - { - "title": "Any", - "description": "Any data type." - } - ] + "description": "The data processed by the UDF. The returned value can in principle be of any data type, but it depends on what is returned by the UDF code. Please see the implemented UDF interface for details.", + "schema": { + "title": "Any", + "description": "Any data type." + } }, "links": [ { diff --git a/proposals/sar_backscatter.json b/proposals/sar_backscatter.json index 07945438..77fdf73e 100644 --- a/proposals/sar_backscatter.json +++ b/proposals/sar_backscatter.json @@ -97,6 +97,16 @@ "schema": { "type": "boolean" } + }, + { + "description": "Proprietary options for the backscatter computations. Specifying proprietary options will reduce portability.", + "name": "options", + "optional": true, + "default": {}, + "schema": { + "type": "object", + "additionalProperties": false + } } ], "returns": { diff --git a/quantiles.json b/quantiles.json index 35079b86..81f60c2b 100644 --- a/quantiles.json +++ b/quantiles.json @@ -1,9 +1,9 @@ { "id": "quantiles", "summary": "Quantiles", - "description": "Calculates quantiles, which are cut points dividing the range of a probability distribution into either\n\n* intervals corresponding to the given `probabilities` or\n* (nearly) equal-sized intervals (q-quantiles based on the parameter `q`).\n\nEither the parameter `probabilities` or `q` must be specified, otherwise the `QuantilesParameterMissing` exception is thrown. 
If both parameters are set the `QuantilesParameterConflict` exception is thrown.", + "description": "Calculates quantiles, which are cut points dividing the range of a sample distribution into either\n\n* intervals corresponding to the given `probabilities` or\n* equal-sized intervals (q-quantiles based on the parameter `q`).\n\nEither the parameter `probabilities` or `q` must be specified, otherwise the `QuantilesParameterMissing` exception is thrown. If both parameters are set the `QuantilesParameterConflict` exception is thrown.\n\nSample quantiles can be computed with several different algorithms. Hyndman and Fan (1996) have concluded on nine different types, which are commonly implemented in statistical software packages. This process is implementing type 7, which is implemented widely and often also the default type (e.g. in Excel, Julia, Python, R and S).", "categories": [ - "math" + "math > statistics" ], "parameters": [ { @@ -21,7 +21,7 @@ }, { "name": "probabilities", - "description": "A list of probabilities to calculate quantiles for. The probabilities must be between 0 and 1.", + "description": "A list of probabilities to calculate quantiles for. The probabilities must be between 0 and 1 (inclusive).", "schema": { "type": "array", "items": { @@ -34,7 +34,7 @@ }, { "name": "q", - "description": "Intervals to calculate quantiles for. Calculates q-quantiles with (nearly) equal-sized intervals.", + "description": "Number of intervals to calculate quantiles for. Calculates q-quantiles with equal-sized intervals.", "schema": { "type": "integer", "minimum": 2 @@ -173,6 +173,12 @@ "rel": "about", "href": "https://en.wikipedia.org/wiki/Quantile", "title": "Quantiles explained by Wikipedia" + }, + { + "rel": "about", + "href": "https://www.amherst.edu/media/view/129116/original/Sample+Quantiles.pdf", + "type": "application/pdf", + "title": "Hyndman and Fan (1996): Sample Quantiles in Statistical Packages" } ] } \ No newline at end of file diff --git a/rename_labels.json b/rename_labels.json index 6ed32f6f..e1ec9138 100644 --- a/rename_labels.json +++ b/rename_labels.json @@ -97,7 +97,7 @@ { "rel": "example", "type": "application/json", - "href": "https://processes.openeo.org/1.1.0/examples/rename-enumerated-labels.json", + "href": "https://processes.openeo.org/1.2.0/examples/rename-enumerated-labels.json", "title": "Rename enumerated labels" } ] diff --git a/run_udf.json b/run_udf.json index 3586f06c..f65f850c 100644 --- a/run_udf.json +++ b/run_udf.json @@ -1,7 +1,7 @@ { "id": "run_udf", "summary": "Run a UDF", - "description": "Runs a UDF in one of the supported runtime environments.\n\nThe process can either:\n\n1. load and run a UDF stored in a file on the server-side workspace of the authenticated user. The path to the UDF file must be relative to the root directory of the user's workspace.\n2. fetch and run a remotely stored and published UDF by absolute URI.\n3. run the source code specified inline as string.\n\nThe loaded UDF can be executed in several processes such as ``aggregate_spatial()``, ``apply()``, ``apply_dimension()`` and ``reduce_dimension()``. In this case, an array is passed instead of a raster data cube. The user must ensure that the data is properly passed as an array so that the UDF can make sense of it.", + "description": "Runs a UDF in one of the supported runtime environments.\n\nThe process can either:\n\n1. load and run a UDF stored in a file on the server-side workspace of the authenticated user. 
The path to the UDF file must be relative to the root directory of the user's workspace.\n2. fetch and run a remotely stored and published UDF by absolute URI.\n3. run the source code specified inline as string.\n\nThe loaded UDF can be executed in several processes such as ``aggregate_spatial()``, ``apply()``, ``apply_dimension()`` and ``reduce_dimension()``. The user must ensure that the data is given in a way that the UDF code can make sense of it.", "categories": [ "cubes", "import", @@ -10,13 +10,8 @@ "parameters": [ { "name": "data", - "description": "The data to be passed to the UDF as an array or raster data cube.", + "description": "The data to be passed to the UDF.", "schema": [ - { - "title": "Raster data cube", - "type": "object", - "subtype": "raster-cube" - }, { "title": "Array", "type": "array", @@ -40,7 +35,7 @@ "type": "string", "format": "uri", "subtype": "uri", - "pattern": "^(http|https)://" + "pattern": "^https?://" }, { "description": "Path to a UDF uploaded to the server.", @@ -91,22 +86,18 @@ } ], "exceptions": { + "InvalidRuntime": { + "message": "The specified UDF runtime is not supported." + }, "InvalidVersion": { "message": "The specified UDF runtime version is not supported." } }, "returns": { - "description": "The data processed by the UDF.\n\n* Returns a raster data cube, if a raster data cube is passed for `data`. Details on the dimensions and dimension properties (name, type, labels, reference system and resolution) depend on the UDF.\n* If an array is passed for `data`, the returned value can be of any data type, but is exactly what the UDF returns.", - "schema": [ - { - "title": "Raster data cube", - "type": "object", - "subtype": "raster-cube" - }, - { - "title": "Any", - "description": "Any data type." - } - ] + "description": "The data processed by the UDF. The returned value can be of any data type and is exactly what the UDF code returns.", + "schema": { + "title": "Any", + "description": "Any data type." + } } } \ No newline at end of file diff --git a/save_result.json b/save_result.json index 72ecfaae..6f58db4a 100644 --- a/save_result.json +++ b/save_result.json @@ -1,7 +1,7 @@ { "id": "save_result", - "summary": "Save processed data to storage", - "description": "Saves processed data to the server-side user workspace of the authenticated user. This process aims to be compatible with GDAL/OGR formats and options. STAC-compatible metadata should be stored with the processed data.\n\nCalling this process may be rejected by back-ends in the context of secondary web services.", + "summary": "Save processed data", + "description": "Makes the processed data available in the given file format to the corresponding medium that is relevant for the context this process is applied in:\n\n* For **batch jobs** the data is stored on the back-end. STAC-compatible metadata is usually made available with the processed data.\n* For **synchronous processing** the data is sent to the client as a direct response to the request.\n* **Secondary web services** are provided with the processed data so that they can make use of it (e.g., visualize it). Web services may require the data in a certain format.
Please refer to the documentation of the individual service types for details.", "categories": [ "cubes", "export" @@ -9,7 +9,7 @@ "parameters": [ { "name": "data", - "description": "The data to save.", + "description": "The data to deliver in the given file format.", "schema": [ { "type": "object", @@ -23,7 +23,7 @@ }, { "name": "format", - "description": "The file format to save to. It must be one of the values that the server reports as supported output file formats, which usually correspond to the short GDAL/OGR codes. If the format is not suitable for storing the underlying data structure, a `FormatUnsuitable` exception will be thrown. This parameter is *case insensitive*.", + "description": "The file format to use. It must be one of the values that the server reports as supported output file formats, which usually correspond to the short GDAL/OGR codes. If the format is not suitable for storing the underlying data structure, a `FormatUnsuitable` exception will be thrown. This parameter is *case insensitive*.", "schema": { "type": "string", "subtype": "output-format" @@ -41,7 +41,7 @@ } ], "returns": { - "description": "`false` if saving failed, `true` otherwise.", + "description": "Returns `false` if the process failed to make the data available, `true` otherwise.", "schema": { "type": "boolean" } diff --git a/sd.json b/sd.json index 89be4f70..156e4b38 100644 --- a/sd.json +++ b/sd.json @@ -3,7 +3,7 @@ "summary": "Standard deviation", "description": "Computes the sample standard deviation, which quantifies the amount of variation of an array of numbers. It is defined to be the square root of the corresponding variance (see ``variance()``).\n\nA low standard deviation indicates that the values tend to be close to the expected value, while a high standard deviation indicates that the values are spread out over a wider range.\n\nAn array without non-`null` elements resolves always with `null`.", "categories": [ - "math", + "math > statistics", "reducer" ], "parameters": [ diff --git a/tests/.words b/tests/.words index b9fe6130..95a83c72 100644 --- a/tests/.words +++ b/tests/.words @@ -37,3 +37,4 @@ gdalwarp Lanczos sinc interpolants +Hyndman \ No newline at end of file diff --git a/tests/README.md b/tests/README.md index e2868634..31c79785 100644 --- a/tests/README.md +++ b/tests/README.md @@ -4,5 +4,32 @@ To run the tests follow these steps: 1. Install [node and npm](https://nodejs.org) - should run with any recent version 2. Run `npm install` in this folder to install the dependencies -3. Run the tests with `npm test`. -4. To show the files nicely formatted in a web browser, run `npm run render`. It starts a server and opens the corresponding page in a web browser. \ No newline at end of file +3. Run the tests with `npm test`. This will also lint the files and verify that they follow best practices. +4. To show the files nicely formatted in a web browser, run `npm run render`. It starts a server and opens the corresponding page in a web browser. + +## Development processes + +All new processes must be added to the `proposals` folder. Each process must be declared to be `experimental`. +Processes must comply with best practices, which ensure a certain degree of consistency. +`npm test` will validate and lint the processes and also ensure the best practices are applied. + +The linting checks that the files are named correctly, that the content is correctly formatted and indented (JSON and embedded CommonMark).
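For orientation only, a minimal proposal file might start out like the sketch below. The process id, summary, and description are invented for illustration and are not part of the specification; files are expected to be named after the process `id` (here it would be `proposals/example_process.json`).

```json
{
    "id": "example_process",
    "summary": "One-line summary of the example process",
    "description": "A longer description of the behavior of the process. It should be detailed enough to pass the length checks applied by the test suite.",
    "categories": [
        "math"
    ],
    "experimental": true,
    "parameters": [],
    "returns": {
        "description": "A description of the returned value, again long enough to satisfy the linting rules.",
        "schema": {
            "type": "number"
        }
    }
}
```

Running `npm test` afterwards reports any remaining formatting, naming, or best-practice violations.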
+The best practices ensure, for example, that the fields are neither too short nor too long. + +A spell check also checks the texts. It may report names and rarely used technical words as errors. +If you are sure that these are correct, you can add them to the `.words` file to exclude them from being reported as errors. +The file must contain one word per line. + +New processes should be added via GitHub Pull Requests. + +## Subtype schemas + +Sometimes it is useful to define a new "data type" on top of the JSON types (number, string, array, object, ...). +For example, a client could make a select box with all collections available by adding a subtype `collection-id` to the JSON type `string`. +If you think a new subtype should be added, you need to add it to the `meta/subtype-schemas.json` file. +It must be a valid JSON Schema. The tests mentioned above will also verify to a certain degree that the subtypes are defined correctly. + +## Examples + +To get out of proposal state, at least two examples must be provided. +The examples are located in the `examples` folder and will also be validated to some extent in the tests. \ No newline at end of file diff --git a/tests/docs.html b/tests/docs.html index d7ba767c..782d115b 100644 --- a/tests/docs.html +++ b/tests/docs.html @@ -114,7 +114,7 @@ document: 'processes.json', categorize: true, apiVersion: '1.1.0', - title: 'openEO processes (1.1.0)', + title: 'openEO processes (1.2.0)', notice: '**Note:** This is the list of all processes specified by the openEO project. Back-ends implement a varying set of processes. Thus, the processes you can use at a specific back-end may derive from the specification, may include non-standardized processes and may not implement all processes listed here. Please check each back-end individually for the processes they support. The client libraries usually have a function called `listProcesses` or `list_processes` for that.'
} }) diff --git a/tests/processes.test.js b/tests/processes.test.js index 089328f9..1d0d004c 100644 --- a/tests/processes.test.js +++ b/tests/processes.test.js @@ -1,7 +1,7 @@ const glob = require('glob'); const fs = require('fs'); const path = require('path'); -const { normalizeString, checkDescription, checkSpelling, checkJsonSchema, getAjv, prepareSchema } = require('./testHelpers'); +const { normalizeString, checkDescription, checkSpelling, checkJsonSchema, getAjv, prepareSchema, isObject } = require('./testHelpers'); const anyOfRequired = [ "quantiles", @@ -21,6 +21,7 @@ var loader = (file, proposal = false) => { // Prepare for tests processes.push([file, p, fileContent.toString(), proposal]); + processIds.push(p.id); } catch(err) { processes.push([file, {}, "", proposal]); console.error(err); @@ -29,6 +30,7 @@ var loader = (file, proposal = false) => { }; var processes = []; +var processIds = []; const files = glob.sync("../*.json", {realpath: true}); files.forEach(file => loader(file)); @@ -36,6 +38,11 @@ files.forEach(file => loader(file)); const proposals = glob.sync("../proposals/*.json", {realpath: true}); proposals.forEach(file => loader(file, true)); +test("Check for duplicate process ids", () => { + const duplicates = processIds.filter((id, index) => processIds.indexOf(id) !== index); + expect(duplicates).toEqual([]); +}); + describe.each(processes)("%s", (file, p, fileContent, proposal) => { test("File / JSON", () => { @@ -66,7 +73,7 @@ describe.each(processes)("%s", (file, p, fileContent, proposal) => { // description expect(typeof p.description).toBe('string'); // lint: Description should be longer than a summary - expect(p.description.length).toBeGreaterThan(55); + expect(p.description.length).toBeGreaterThan(60); checkDescription(p.description, p); }); @@ -98,7 +105,7 @@ describe.each(processes)("%s", (file, p, fileContent, proposal) => { } test("Return Value", () => { - expect(typeof p.returns).toBe('object'); + expect(isObject(p.returns)).toBeTruthy(); expect(p.returns).not.toBeNull(); // return value description @@ -108,14 +115,14 @@ describe.each(processes)("%s", (file, p, fileContent, proposal) => { checkDescription(p.returns.description, p); // return value schema - expect(typeof p.returns.schema).toBe('object'); expect(p.returns.schema).not.toBeNull(); + expect(typeof p.returns.schema).toBe('object'); // lint: Description should not be empty checkJsonSchema(jsv, p.returns.schema); }); test("Exceptions", () => { - expect(typeof p.exceptions === 'undefined' || (typeof p.exceptions === 'object' && p.exceptions !== 'null')).toBeTruthy(); + expect(typeof p.exceptions === 'undefined' || isObject(p.exceptions)).toBeTruthy(); }); var exceptions = o2a(p.exceptions); @@ -153,7 +160,7 @@ describe.each(processes)("%s", (file, p, fileContent, proposal) => { } var paramKeys = Object.keys(parametersObj); - expect(typeof example).toBe('object'); + expect(isObject(example)).toBeTruthy(); expect(example).not.toBeNull(); // example title @@ -194,8 +201,7 @@ describe.each(processes)("%s", (file, p, fileContent, proposal) => { if (Array.isArray(p.links)) { test.each(p.links)("Links > %#", (link) => { - expect(typeof link).toBe('object'); - expect(link).not.toBeNull(); + expect(isObject(link)).toBeTruthy(); // link href expect(typeof link.href).toBe('string'); @@ -250,8 +256,8 @@ function checkParam(param, p, checkCbParams = true) { checkFlags(param); // Parameter schema - expect(typeof param.schema).toBe('object'); expect(param.schema).not.toBeNull(); + expect(typeof 
param.schema).toBe('object'); checkJsonSchema(jsv, param.schema); if (!checkCbParams) { diff --git a/tests/subtype-schemas.test.js b/tests/subtype-schemas.test.js deleted file mode 100644 index 49633fda..00000000 --- a/tests/subtype-schemas.test.js +++ /dev/null @@ -1,22 +0,0 @@ -const fs = require('fs'); -const $RefParser = require("@apidevtools/json-schema-ref-parser"); -const { checkJsonSchema, normalizeString, getAjv } = require('./testHelpers'); - -test("subtype-schemas.json", async () => { - let fileContent = fs.readFileSync('../meta/subtype-schemas.json'); - - let schema = JSON.parse(fileContent); - expect(schema).not.toBe(null); - expect(typeof schema).toBe('object'); - - // lint: Check whether the file is correctly JSON formatted - expect(normalizeString(JSON.stringify(schema, null, 4))).toEqual(normalizeString(fileContent.toString())); - - // Is JSON Schema valid? - checkJsonSchema(await getAjv(), schema); - - // is everything dereferencable? - let subtypes = await $RefParser.dereference(schema, { dereference: { circular: "ignore" } }); - expect(subtypes).not.toBe(null); - expect(typeof subtypes).toBe('object'); -}); \ No newline at end of file diff --git a/tests/subtypes-file.test.js b/tests/subtypes-file.test.js new file mode 100644 index 00000000..e70f7e8f --- /dev/null +++ b/tests/subtypes-file.test.js @@ -0,0 +1,29 @@ +const fs = require('fs'); +const $RefParser = require("@apidevtools/json-schema-ref-parser"); +const { checkJsonSchema, getAjv, isObject, normalizeString } = require('./testHelpers'); + +test("File subtype-schemas.json", async () => { + let schema; + let fileContent; + try { + fileContent = fs.readFileSync('../meta/subtype-schemas.json'); + schema = JSON.parse(fileContent); + } catch(err) { + console.error("The file for subtypes is invalid and can't be read:"); + console.error(err); + expect(err).toBeUndefined(); + } + + expect(isObject(schema)).toBeTruthy(); + expect(isObject(schema.definitions)).toBeTruthy(); + + // lint: Check whether the file is correctly JSON formatted + expect(normalizeString(JSON.stringify(schema, null, 4))).toEqual(normalizeString(fileContent.toString())); + + // Is JSON Schema valid? + checkJsonSchema(await getAjv(), schema); + + // is everything dereferencable? + let subtypes = await $RefParser.dereference(schema, { dereference: { circular: "ignore" } }); + expect(isObject(subtypes)).toBeTruthy(); +}); \ No newline at end of file diff --git a/tests/subtypes-schemas.test.js b/tests/subtypes-schemas.test.js new file mode 100644 index 00000000..ff1b72bd --- /dev/null +++ b/tests/subtypes-schemas.test.js @@ -0,0 +1,54 @@ +const $RefParser = require("@apidevtools/json-schema-ref-parser"); +const { checkDescription, checkSpelling, isObject } = require('./testHelpers'); + +// I'd like to run the tests for each subtype individually instead of in a loop, +// but jest doesn't support that, so you need to figure out yourself what is broken. 
+// The console.log in afterAll ensures we have a hint of which schema was checked last + +// Load and dereference schemas +let subtypes = {}; +let lastTest = null; +let testsCompleted = 0; +beforeAll(async () => { + subtypes = await $RefParser.dereference('../meta/subtype-schemas.json', { dereference: { circular: "ignore" } }); + return subtypes; +}); + +afterAll(async () => { + if (testsCompleted != Object.keys(subtypes.definitions).length) { + console.log('The schema the test has likely failed for: ' + lastTest); + } +}); + +test("Schemas in subtype-schemas.json", () => { + // Each schema must contain at least a type, subtype, title and description + for(let name in subtypes.definitions) { + let schema = subtypes.definitions[name]; + lastTest = name; + + // Schema is object + expect(isObject(schema)).toBeTruthy(); + + // Type is an array with at least one element or a string + expect((Array.isArray(schema.type) && schema.type.length > 0) || typeof schema.type === 'string').toBeTruthy(); + + // Subtype is a string + expect(typeof schema.subtype === 'string').toBeTruthy(); + + // Check title + expect(typeof schema.title === 'string').toBeTruthy(); + // lint: Summary should be short + expect(schema.title.length).toBeLessThan(60); + // lint: Summary should not end with a dot + expect(/[^\.]$/.test(schema.title)).toBeTruthy(); + checkSpelling(schema.title, schema); + + // Check description + expect(typeof schema.description).toBe('string'); + // lint: Description should be longer than a summary + expect(schema.description.length).toBeGreaterThan(60); + checkDescription(schema.description, schema); + + testsCompleted++; + } +}); \ No newline at end of file diff --git a/tests/testHelpers.js b/tests/testHelpers.js index 385d0449..3f998088 100644 --- a/tests/testHelpers.js +++ b/tests/testHelpers.js @@ -116,6 +116,10 @@ async function getAjv() { return jsv; } +function isObject(obj) { + return (typeof obj === 'object' && obj === Object(obj) && !Array.isArray(obj)); +} + function normalizeString(str) { return str.replace(/\r\n|\r|\n/g, "\n").trim(); } @@ -214,5 +218,6 @@ module.exports = { checkSpelling, checkJsonSchema, checkSchemaRecursive, - prepareSchema + prepareSchema, + isObject }; \ No newline at end of file diff --git a/variance.json b/variance.json index 8d9acbeb..78f76feb 100644 --- a/variance.json +++ b/variance.json @@ -3,7 +3,7 @@ "summary": "Variance", "description": "Computes the sample variance of an array of numbers by calculating the square of the standard deviation (see ``sd()``). It is defined to be the expectation of the squared deviation of a random variable from its expected value. Basically, it measures how far the numbers in the array are spread out from their average value.\n\nAn array without non-`null` elements resolves always with `null`.", "categories": [ - "math", + "math > statistics", "reducer" ], "parameters": [