Resolve from_parameter correctly #24
Resolving the node max1_3 and setting its data to reference node 2_1 is completely wrong behavior. You are expecting an array of values for a certain dimension (t), but you get the whole data cube. That's as if I request 5€ from you and you just give me access to your whole bank account so I can grab the 5€ from it myself. So you should only pass the corresponding values, not everything. The implementation of reduce_dimension needs to get the corresponding values from the data cube and pass them to the sub process graph so that they can be made available as parameter data. So node max1_3 must still contain `'data': {'from_parameter': 'data'}` after parsing.
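To illustrate the behaviour described above, here is a minimal sketch of a back-end side reduce_dimension, assuming the cube is a plain numpy array and `callback` stands in for the evaluated sub process graph (all names here are illustrative, not part of the parser's actual API):

```python
import numpy as np

def reduce_dimension(cube: np.ndarray, dimension_axis: int, callback) -> np.ndarray:
    # Hand the callback only the 1-D array of values along the reduced
    # dimension (the 'data' parameter), never the whole cube.
    return np.apply_along_axis(callback, dimension_axis, cube)

# The 'max' sub process graph then reduces each 1-D slice it receives,
# e.g. collapsing the time axis (axis 0) of a (t, y, x) cube:
cube = np.random.rand(4, 3, 5)
result = reduce_dimension(cube, dimension_axis=0, callback=np.max)
assert result.shape == (3, 5)
```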
Yeah, I get your point from a formal side. This implementation was a bit biased towards how you would implement such processes in Python, but it shouldn't be, of course. I will change this behaviour in a new branch.
Actually, I like how the parser translates the graph, even if it formally has the wrong behaviour. With the current implementation it was fairly easy for me to deal with the graph in my opendatacube-based back-end. But the merge_cubes parsing has to be addressed somehow.
It should be solved correctly; otherwise you need to find workarounds for every bit that is different, and there are some, so you'll run into issues eventually. Also, you can still do it similarly, but that should be driven by process implementations, not by the parser. Only in the process implementation can you act according to the definitions.

I'm not saying you shouldn't pass by reference in general, I'm just saying that the parser shouldn't mandate any behaviour based on (wrong) assumptions; that will eventually fail. As said above, it should be driven by process implementations. If it's better to pass a whole data cube in a reduce process, but not in merge_cubes, that decision should be made by the process itself. We do the same in GEE and it works quite well (except that merge_cubes has a poor implementation), but that's not a PG parser issue, it's a process implementation issue. Also, I see the Python and JS parsers as reference implementations for openEO processes. If we want people to adopt them, they should behave according to the spec, not according to any assumptions.
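As a sketch of what "driven by process implementations" could look like: the parser leaves the reference untouched, and each process implementation resolves it at execution time against the parameter values it chose to bind (all names here are hypothetical, not the parser's actual API):

```python
def resolve_argument(arg, parameters, node_results):
    # Resolve one argument at execution time. 'parameters' holds whatever
    # the enclosing process chose to bind: 1-D slices for reduce_dimension,
    # single overlapping values for merge_cubes, or a whole cube if the
    # process definition says so. The parser never decides this.
    if isinstance(arg, dict) and "from_parameter" in arg:
        return parameters[arg["from_parameter"]]
    if isinstance(arg, dict) and "from_node" in arg:
        return node_results[arg["from_node"]]
    return arg
```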
…ency structure of nodes by introducing "callback", "process" and "data" edges; implemented new topological sorting algorithm
@m-mohr @clausmichele @lforesta: According to our discussion a few months ago, I finally implemented the correct (at least I hope so) behaviour. For the example above this would yield:
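Presumably something along these lines, with the `from_parameter` reference kept intact (a reconstruction based on the node names used in this thread, not the verbatim output):

```python
# Hypothetical corrected representation of the callback node: the
# 'data' argument still references the parameter, not node '2_1'.
max1_3 = {
    "process_id": "max",
    "arguments": {"data": {"from_parameter": "data"}},
    "result": True,
}
```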
Please test if everything works for you (see #29) and close this issue if that's the case, thanks!
The parser currently resolves all `from_parameter` tags with either the default value of the parent process or the input node ID of the parent process. This makes it easy for back-ends to determine the input and output parameters of a process node without traversing the graph. However, this caused some discrepancies for certain processes, mentioned in issue #23 and Open-EO/openeo-processes#184.

To summarise, the parser traverses the whole graph and sets up the corresponding node dependencies. These dependencies are set either with `'from_node'` (referencing a node on the same level) or `'from_parameter'` (referencing data traced through a callback). An example would be the following process graph:
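A representative sketch of such a graph (the collection ID is illustrative and other load_collection arguments are omitted for brevity):

```python
process_graph = {
    "1": {
        "process_id": "load_collection",
        "arguments": {"id": "S2"},  # collection ID is illustrative
    },
    "2": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "1"},
            "dimension": "t",
            "reducer": {
                "process_graph": {
                    "max": {
                        "process_id": "max",
                        # References the data handed in by reduce_dimension:
                        "arguments": {"data": {"from_parameter": "data"}},
                        "result": True,
                    }
                }
            },
        },
        "result": True,
    },
}
```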
This is translated into the following representation:
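A sketch of that representation, assuming the parser's internal node naming (`2_1` for the data node, `max1_3` for the callback node, as quoted below):

```python
parsed_nodes = {
    "2_1": {  # the former load_collection node
        "process_id": "load_collection",
        "arguments": {"id": "S2"},
    },
    "max1_3": {  # the callback node of reduce_dimension
        "process_id": "max",
        # 'from_parameter': 'data' has been rewritten to a node reference,
        # which is the behaviour questioned in this issue:
        "arguments": {"data": {"from_node": "2_1"}},
        "result": True,
    },
}
```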
As you can see, `'from_node'` is replaced with the node ID of the parent process. The same is done for `'from_parameter'`: `'data': {'from_parameter': 'data'}` is replaced with `{'data': {'from_node': '2_1'}}`. It was implemented this way because the parent process (in this case `reduce_dimension`) simply passes its input data on to the callback anyway, which makes it easier to directly evaluate the callback process `max`.

However, for processes such as `merge_cubes` the relationship becomes more complicated and the back-end needs to decide what to do with the input data. What is your opinion on how the parser should handle this in general? I think the feature above is nice to have, but on the other hand it causes some inconsistencies for binary processes.
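To make the `merge_cubes` case concrete: its overlap resolver callback takes two parameters, `x` and `y`, which stand for single overlapping values from the two cubes, so rewriting them to whole-cube node references changes their meaning. A hedged sketch (node names illustrative):

```python
merge = {
    "process_id": "merge_cubes",
    "arguments": {
        "cube1": {"from_node": "a"},
        "cube2": {"from_node": "b"},
        "overlap_resolver": {
            "process_graph": {
                "mean": {
                    "process_id": "mean",
                    # 'x' and 'y' are single overlapping values; replacing
                    # them with {'from_node': 'a'} / {'from_node': 'b'}
                    # would hand the callback the whole cubes instead.
                    "arguments": {"data": [
                        {"from_parameter": "x"},
                        {"from_parameter": "y"},
                    ]},
                    "result": True,
                }
            }
        },
    },
    "result": True,
}
```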