Add generic aggregate process #526

Open
wants to merge 1 commit into
base: draft
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- `aggregate`
- `export_collection`
- `export_workspace`
- `stac_modify`
135 changes: 135 additions & 0 deletions proposals/aggregate.json
@@ -0,0 +1,135 @@
{
"id": "aggregate",
"summary": "Aggregation based on general intervals",
"description": "Computes an aggregation based on an array of intervals.\n\nThe computed values will be projected to the labels. If no labels are specified, the lower value of the interval will be used as label for the corresponding values. In case of a conflict (i.e. the user-specified values for the lower values of the intervals are not distinct), the user-defined labels must be specified in the parameter `labels` as otherwise a `DistinctDimensionLabelsRequired` exception would be thrown. The number of user-defined labels and the number of intervals need to be equal.",
Member

> The computed values will be projected to the labels

What does "projecting" to a label mean?

Member Author

"mapped" is probably better than "projected". Maybe it can also just be removed.

(I should also check whether this wording also exists in aggregate_temporal...)
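
To make the labeling rule from the description concrete, here is a minimal Python sketch of how the output labels could be resolved. It is not taken from any openEO back-end: the function name `resolve_labels` and the Python exception class are hypothetical stand-ins, and raising `ValueError` for a mismatched label count is an assumption, since the proposal does not name an exception for that case.

```python
class DistinctDimensionLabelsRequired(Exception):
    """Hypothetical stand-in for the exception named in the process description."""


def resolve_labels(intervals, labels=None):
    """Return one output label per interval (illustrative sketch only)."""
    if labels:
        # Assumption: the proposal only says the counts must be equal,
        # it does not say which error is raised otherwise.
        if len(labels) != len(intervals):
            raise ValueError("number of labels and number of intervals must be equal")
        if len(set(labels)) != len(labels):
            raise DistinctDimensionLabelsRequired("labels must be distinct")
        return list(labels)

    # Default: the lower value of each interval becomes its label.
    defaults = [lower for lower, _upper in intervals]
    if len(set(defaults)) != len(defaults):
        raise DistinctDimensionLabelsRequired(
            "lower values are not distinct; specify `labels` explicitly"
        )
    return defaults
```

With `[[0, 10], [10, 20]]` the defaults `[0, 10]` are distinct and need no `labels`, whereas `[[0, 10], [0, 20]]` share the lower value `0` and only work when distinct labels are passed.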

"categories": [
"cubes",
"aggregate"
],
"experimental": true,
"parameters": [
{
"name": "data",
"description": "A data cube.",
"schema": {
"type": "object",
"subtype": "datacube"
}
},
{
"name": "intervals",
"description": "Left-closed intervals, which are allowed to overlap. Each interval in the array has exactly two elements:\n\n1. The first element is the lower value of the interval. The specified value is **included** in the interval.\n2. The second element is the upper value of the temporal interval. The specified value is **excluded** from the interval.\n\nThe second element must always be greater than the first element. Otherwise, an `ExtentEmpty` exception is thrown.",
Member

When working with a spatial dimension (e.g. "x" or "y"): how is a user supposed to know what CRS to use to define the intervals? I don't think this is explicitly available or defined on a datacube.

Member

> The second element is the upper value of the temporal interval.

I guess that "temporal" is not intentional there.

Member Author

Isn't it? The datacube extension defines the CRS and then that defines the unit and extents etc.

Suggested change
- "description": "Left-closed intervals, which are allowed to overlap. Each interval in the array has exactly two elements:\n\n1. The first element is the lower value of the interval. The specified value is **included** in the interval.\n2. The second element is the upper value of the temporal interval. The specified value is **excluded** from the interval.\n\nThe second element must always be greater than the first element. Otherwise, an `ExtentEmpty` exception is thrown.",
+ "description": "Left-closed intervals, which are allowed to overlap. Each interval in the array has exactly two elements:\n\n1. The first element is the lower value of the interval. The specified value is **included** in the interval.\n2. The second element is the upper value of the interval. The specified value is **excluded** from the interval.\n\nThe second element must always be greater than the first element. Otherwise, an `ExtentEmpty` exception is thrown.",

Member

> The datacube extension defines the CRS and then that defines the unit and extents etc.

Yes, but nothing that is relevant to figuring out the actual labels (reference_system, step, values and what not) is required, neither in the datacube extension nor in the openEO API.

Also, this might be backend-dependent, so that would undermine the reproducibility of the process graph.
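
Independent of the CRS question, the left-closed semantics of the `intervals` parameter can be illustrated with a short Python sketch. The helper names `validate_intervals` and `values_in_interval` are hypothetical, and the Python `ExtentEmpty` class only stands in for the exception named in the description; the sketch assumes the dimension labels are plain numbers.

```python
class ExtentEmpty(Exception):
    """Hypothetical stand-in for the exception named in the process description."""


def validate_intervals(intervals):
    """Raise ExtentEmpty unless every upper value is greater than its lower value."""
    for lower, upper in intervals:
        if not upper > lower:
            raise ExtentEmpty(f"interval [{lower}, {upper}) is empty")


def values_in_interval(dim_labels, values, interval):
    """Select the values whose dimension label lies in [lower, upper)."""
    lower, upper = interval
    # Lower bound is included, upper bound is excluded.
    return [v for label, v in zip(dim_labels, values) if lower <= label < upper]
```

Because intervals may overlap, a label such as `5` can contribute to both `[0, 10)` and `[5, 15)`, while a label equal to an upper bound is never part of that interval.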

"schema": {
"type": "array",
"minItems": 1,
"items": {
"type": "array",
"uniqueItems": true,
"minItems": 2,
"maxItems": 2,
"items": {
"type": "number"
Member

So intervals are only based on numbers? That implies that this process only works along a spatial dimension in practice?
Other dimensions don't have numeric labels: band dimensions have strings and temporal dimensions have date/datetime (subtype of string)

Member Author

The order of string values is not really well-defined. As such, yes, only numerical. Dimensions can have numerical values though; we have a couple of processes that just index the labels if no further information is given (e.g. apply_dimension).

}
}
}
},
{
"name": "reducer",
"description": "A reducer to be applied for the values contained in each interval. A reducer is a single process such as ``mean()`` or a set of processes, which computes a single value for a list of values, see the category 'reducer' for such processes. Intervals may not contain any values, which for most reducers leads to no-data (`null`) values by default.",
"schema": {
"type": "object",
"subtype": "process-graph",
"parameters": [
{
"name": "data",
"description": "A labeled array with elements of any type. If there's no data for the interval, the array is empty.",
"schema": {
"type": "array",
"subtype": "labeled-array",
"items": {
"description": "Any data type."
}
}
},
{
"name": "context",
"description": "Additional data passed by the user.",
"schema": {
"description": "Any data type."
},
"optional": true,
"default": null
}
],
"returns": {
"description": "The value to be set in the new data cube.",
"schema": {
"description": "Any data type."
}
}
}
},
{
"name": "dimension",
"description": "The name of the dimension for aggregation. All data along the dimension is passed through the specified reducer. Fails with a `DimensionNotAvailable` exception if the specified dimension does not exist.",
"schema": {
"type": [
"string",
"null"
]
}
},
{
"name": "labels",
"description": "Distinct labels for the intervals. Only required if the lower values of the intervals are not distinct and thus the default labels would not be unique. The number of labels and the number of intervals must be equal.",
"schema": {
"type": "array",
"uniqueItems": true,
"items": {
"type": "number"
}
},
"default": [],
"optional": true
},
{
"name": "context",
"description": "Additional data to be passed to the reducer.",
"schema": {
"description": "Any data type."
},
"optional": true,
"default": null
}
],
"returns": {
"description": "A new data cube with the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged, except for the resolution and dimension labels of the given temporal dimension.",
Member

> of the given temporal dimension

Is it intentional to have "temporal" there, or should this be generic for all types of dimensions?

Member Author

Suggested change
- "description": "A new data cube with the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged, except for the resolution and dimension labels of the given temporal dimension.",
+ "description": "A new data cube with the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged, except for the resolution and dimension labels of the given dimension.",

No, just copy-pasted things poorly from aggregate_temporal ;-)

"schema": {
"type": "object",
"subtype": "datacube",
"dimensions": [
{
"type": "temporal"
}
]
}
},
"exceptions": {
"DimensionNotAvailable": {
"message": "A dimension with the specified name does not exist."
},
"DistinctDimensionLabelsRequired": {
"message": "The dimension labels have duplicate values. Distinct labels must be specified."
},
"ExtentEmpty": {
"message": "At least one of the intervals is empty. The second element must always be greater than the first element."
}
},
"links": [
{
"href": "https://openeo.org/documentation/1.0/datacubes.html#aggregate",
"rel": "about",
"title": "Aggregation explained in the openEO documentation"
}
]
}
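
As a rough end-to-end illustration of the proposed process, the following Python sketch aggregates a single numeric dimension. It is only a reading of the description above, not a reference implementation: `aggregate_1d` and `mean_or_none` are hypothetical names, the data cube is reduced to parallel lists of labels and values, and validation of intervals and labels is left out for brevity.

```python
from statistics import mean


def mean_or_none(values):
    """Reducer sketch: mean of the values, or None (no-data) for an empty list."""
    return mean(values) if values else None


def aggregate_1d(dim_labels, values, intervals, reducer, labels=None):
    """Reduce the values falling into each left-closed interval.

    Returns a dict mapping each output label to the reduced value.
    """
    # Default labels are the lower values of the intervals.
    out_labels = labels if labels else [lower for lower, _upper in intervals]
    result = {}
    for out_label, (lower, upper) in zip(out_labels, intervals):
        members = [v for label, v in zip(dim_labels, values) if lower <= label < upper]
        result[out_label] = reducer(members)
    return result
```

Calling `aggregate_1d([0, 5, 10, 15], [1.0, 3.0, 5.0, 7.0], [[0, 10], [10, 20], [20, 30]], mean_or_none)` yields one entry per interval, with `None` for the interval `[20, 30)` that contains no values, which mirrors the no-data behaviour described for the `reducer` parameter.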