Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic aggregate process #526

Open
wants to merge 1 commit into
base: draft
Choose a base branch
from
Open

Add generic aggregate process #526

wants to merge 1 commit into from

Conversation

m-mohr
Copy link
Member

@m-mohr m-mohr commented Dec 11, 2024

Based on aggregate_temporal.

Origin of the discussion: https://gitlab.ogc.org/ogc/T20-GDC/-/issues/32#note_34824 (likely access restricted)

Rendered version:

aggregate

Copy link
Member

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some quick notes

{
"id": "aggregate",
"summary": "Aggregation based on general intervals",
"description": "Computes a aggregation based on an array of intervals.\n\nThe computed values will be projected to the labels. If no labels are specified, the lower value of the interval will be used as label for the corresponding values. In case of a conflict (i.e. the user-specified values for the lower values of the intervals are not distinct), the user-defined labels must be specified in the parameter `labels` as otherwise a `DistinctDimensionLabelsRequired` exception would be thrown. The number of user-defined labels and the number of intervals need to be equal.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The computed values will be projected to the labels

what does "projecting" to a label mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"mapped" is probably better than "projected". Maybe it can also just be removed.

(I should also check whether this wording also exists in aggregate_temporal...)

},
{
"name": "intervals",
"description": "Left-closed intervals, which are allowed to overlap. Each interval in the array has exactly two elements:\n\n1. The first element is the lower value of the interval. The specified value is **included** in the interval.\n2. The second element is the upper value of the temporal interval. The specified value is **excluded** from the interval.\n\nThe second element must always be greater than the first element. Otherwise, an `ExtentEmpty` exception is thrown.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when working with a spatial dimension (e.g. "x" or "y"): how is a users supposed to know what CRS to use to define the intervals? I don't think this is explicitly available or defined on a datacube.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second element is the upper value of the temporal interval.

I guess that "temporal" is not intentional there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it? The datacube extension defines the CRS and then that defines the unit and extents etc.

Suggested change
"description": "Left-closed intervals, which are allowed to overlap. Each interval in the array has exactly two elements:\n\n1. The first element is the lower value of the interval. The specified value is **included** in the interval.\n2. The second element is the upper value of the temporal interval. The specified value is **excluded** from the interval.\n\nThe second element must always be greater than the first element. Otherwise, an `ExtentEmpty` exception is thrown.",
"description": "Left-closed intervals, which are allowed to overlap. Each interval in the array has exactly two elements:\n\n1. The first element is the lower value of the interval. The specified value is **included** in the interval.\n2. The second element is the upper value of the interval. The specified value is **excluded** from the interval.\n\nThe second element must always be greater than the first element. Otherwise, an `ExtentEmpty` exception is thrown.",

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datacube extension defines the CRS and then that defines the unit and extents etc.

Yes, but nothing that is relevant to figuring out the actual labels (reference_system, step, values and what not) is required (in datacube extension, nor openEO API).

Also, this might be backend-dependend, so that would undermine the reproducibility of the process graph

"minItems": 2,
"maxItems": 2,
"items": {
"type": "number"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So intervals are only based on numbers? That implies that this process only works along a spatial dimension in practice?
Other dimensions don't have numeric labels: band dimensions have strings and temporal dimensions have date/datetime (subtype of string)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of string values are not really well-defined. As such, yes only numerical. Dimensions can have numerical values though, we have a couple of processes that just index the labels if no further information is given (e.g. apply_dimension).

}
],
"returns": {
"description": "A new data cube with the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged, except for the resolution and dimension labels of the given temporal dimension.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of the given temporal dimension

is it intentional to have "temporal" there, or should this be generic for all types of dimensions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"description": "A new data cube with the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged, except for the resolution and dimension labels of the given temporal dimension.",
"description": "A new data cube with the same dimensions. The dimension properties (name, type, labels, reference system and resolution) remain unchanged, except for the resolution and dimension labels of the given dimension.",

No, just copy-pasted things poorly from aggregate_temporal ;-)

@soxofaan
Copy link
Member

As far as currently understand, this process would only work on spatial dimensions.
But then I wonder how practical this is: You can only aggregate at one dimension at a time. So you would first aggregate along "x" dimension, and then along "y". Which at best is a bit weird. And at worst (depending on the used reducer), it is mathematically wrong or error prone.

@soxofaan
Copy link
Member

https://gitlab.ogc.org/ogc/T20-GDC/-/issues/32#note_34824 (likely access restricted)

Indeed I don't have access

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants