support 'any' process in filter_labels #239

Open
jdries opened this issue Nov 20, 2023 · 7 comments

@jdries
Contributor

jdries commented Nov 20, 2023

In filter_labels, chaining together many (100+) 'or' processes makes us hit Python's maximum recursion depth:

  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_collections.py", line 82, in __setitem__
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1314, in __call__
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1277, in _build_args
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1264, in _get_args
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_collections.py", line 523, in convert
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_collections.py", line 82, in __setitem__
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1314, in __call__
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1277, in _build_args
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1263, in _get_args
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_collections.py", line 490, in can_convert
  File "/usr/lib64/python3.8/abc.py", line 98, in __instancecheck__
    return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded in comparison

The best solution is to use the 'any' process instead, which should then receive an array of 'date_between' comparisons.

We will need to extend our new type checking mechanism in OpenEOProcessScriptBuilder to allow detecting this case.
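To illustrate why a single 'any' avoids the problem, here is a minimal sketch that compares the nesting depth of a chain of 'or' nodes with that of one 'any' node. It assumes the condition reaches py4j as nested Python containers, which the repeating convert/__setitem__ frames in the traceback suggest. The helper names (`or_chain`, `any_node`, `depth`) are made up for this example and are not part of any openEO code.

```python
def or_chain(conditions):
    """Left-nested chain: or(or(or(c0, c1), c2), ...)."""
    node = conditions[0]
    for cond in conditions[1:]:
        node = {"process_id": "or", "arguments": {"x": node, "y": cond}}
    return node


def any_node(conditions):
    """One flat 'any' node that receives all conditions as an array."""
    return {"process_id": "any", "arguments": {"data": list(conditions)}}


def depth(obj):
    """Container nesting depth, computed iteratively so that the
    measurement itself cannot overflow the Python stack."""
    max_depth = 0
    stack = [(obj, 1)]
    while stack:
        current, level = stack.pop()
        if isinstance(current, dict):
            children = list(current.values())
        elif isinstance(current, list):
            children = current
        else:
            continue  # scalars do not add a nesting level
        max_depth = max(max_depth, level)
        stack.extend((child, level + 1) for child in children)
    return max_depth


# 150 placeholder 'date_between' comparisons (the dates are irrelevant here).
conditions = [
    {
        "process_id": "date_between",
        "arguments": {
            "x": {"from_parameter": "value"},
            "min": "2021-01-02",
            "max": "2021-01-03",
        },
    }
    for _ in range(150)
]

print("or chain depth:", depth(or_chain(conditions)))
print("any node depth:", depth(any_node(conditions)))
```

The depth of the 'or' chain grows linearly with the number of conditions, while the 'any' node keeps a constant depth no matter how many comparisons are passed in.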

@jdries
Contributor Author

jdries commented Nov 26, 2023

@EmileSonneveld I committed a first version of this, to get something to the user that needs this.
Would be good if you could:

  • follow up on deployment
  • write an example to test, and provide to user
  • review the commit and perhaps improve the test and the code itself, since it's a very quick version. The type checking mechanism in particular probably needs more work.

@EmileSonneveld
Contributor

This code worked:

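import datetime

from openeo.processes import any, process

# build_child_callback (used below) also has to be imported from the
# openeo Python client; its import path is not shown here.

def build_condition(x):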
    conditions = []
    dates = ["2021-01-02", "2021-01-05", "2021-02-01", "2021-02-04"]
    for date in dates:
        min_date = (datetime.datetime.fromisoformat(date)).isoformat() + "Z"
        max_date = (datetime.datetime.fromisoformat(date) + datetime.timedelta(days=1)).isoformat() + "Z"
        conditions.append(process("date_between", x=x, min=min_date, max=max_date))
    return any(conditions)


condition = build_child_callback(build_condition, parent_parameters=["value"])

datacube = datacube.process(process_id="filter_labels",
                            arguments={"data": datacube, "condition": condition},
                            dimension="t")

A nicer syntax will be available in the next release:

import datetime

from openeo.processes import any, process, date_between
from openeo.util import rfc3339

def filter_labels_condition(x):
    conditions = []
    dates = ["2021-01-02", "2021-01-05", "2021-02-01", "2021-02-04"]
    for date in dates:
        date = rfc3339.parse_date(date)
        min_date = rfc3339.date(date)
        max_date = rfc3339.date(date + datetime.timedelta(days=1))
        conditions.append(date_between(x=x, min=min_date, max=max_date))
    return any(conditions)

datacube = datacube.filter_labels(condition=filter_labels_condition, dimension="t")
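
For anyone who wants to check what such a condition looks like as a process graph, here is a rough sketch that builds it by hand using only the standard library. It assumes the callback parameter is called "value" (as in parent_parameters above) and that each date should match a one-day interval. The node ids and the helper names `day_bounds` and `any_of_dates_graph` are invented for this example; the graph the Python client actually generates may be laid out differently.

```python
import datetime
import json


def day_bounds(date_str):
    """Return (min, max) ISO dates covering the single day `date_str`."""
    day = datetime.date.fromisoformat(date_str)
    next_day = day + datetime.timedelta(days=1)
    return day.isoformat(), next_day.isoformat()


def any_of_dates_graph(dates):
    """Build a flat callback graph: one 'date_between' node per date,
    all feeding a single 'any' node that is the result."""
    graph = {}
    refs = []
    for i, date_str in enumerate(dates, start=1):
        min_date, max_date = day_bounds(date_str)
        node_id = f"datebetween{i}"  # hypothetical node id
        graph[node_id] = {
            "process_id": "date_between",
            "arguments": {
                "x": {"from_parameter": "value"},
                "min": min_date,
                "max": max_date,
            },
        }
        refs.append({"from_node": node_id})
    graph["any1"] = {
        "process_id": "any",
        "arguments": {"data": refs},
        "result": True,
    }
    return {"process_graph": graph}


dates = ["2021-01-02", "2021-01-05", "2021-02-01", "2021-02-04"]
condition_graph = any_of_dates_graph(dates)
print(json.dumps(condition_graph, indent=2))
```

The number of nodes grows with the number of dates, but none of them is nested inside another: every comparison is referenced directly from the one 'any' node.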

date_between still needs to be supported in the web editor.

@EmileSonneveld
Contributor

date_between is now also available in the web editor.

@EmileSonneveld
Contributor

any can be used in two ways: to reduce the values along a dimension in apply_dimension, or to reduce a handful of individual values. The following example uses the process in both ways, but gives an error:

{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "B04"
        ],
        "id": "SENTINEL2_L2A",
        "spatial_extent": {
          "east": 5.08,
          "north": 51.22,
          "south": 51.215,
          "west": 5.07
        },
        "temporal_extent": [
          "2021-01-01",
          "2021-03-01"
        ]
      }
    },
    "apply1": {
      "process_id": "apply_dimension",
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "dimension": "t",
        "process": {
          "process_graph": {
            "any1": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                }
              },
              "process_id": "any"
            },
            "any2": {
              "arguments": {
                "data": [
                  {
                    "from_node": "constant1"
                  },
                  {
                    "from_node": "constant2"
                  }
                ]
              },
              "process_id": "any"
            },
            "constant1": {
              "arguments": {
                "x": false
              },
              "process_id": "constant"
            },
            "constant2": {
              "arguments": {
                "x": true
              },
              "process_id": "constant"
            },
            "if5": {
              "arguments": {
                "accept": {
                  "from_node": "any1"
                },
                "value": {
                  "from_node": "any2"
                }
              },
              "process_id": "if",
              "result": true
            }
          }
        }
      }
    },
    "saveresult1": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "apply1"
        },
        "format": "GTiff"
      },
      "result": true
    }
  },
  "parameters": []
}
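
The two usages in the graph above can be told apart by the shape of the "data" argument alone. The sketch below shows that distinction in plain Python. It is only an illustration with made-up names (`classify_any_data` and its return labels) and says nothing about how OpenEOProcessScriptBuilder actually performs its type checking.

```python
def classify_any_data(arguments):
    """Classify how an 'any' node receives its input.

    'parameter_array': data is the array passed in by the parent process.
    'element_list':    data is an array literal of individual values or
                       references to other nodes.
    """
    data = arguments.get("data")
    if isinstance(data, dict) and "from_parameter" in data:
        return "parameter_array"
    if isinstance(data, list):
        return "element_list"
    return "unknown"


# The two 'any' nodes from the callback above, reduced to their arguments.
callback = {
    "any1": {
        "process_id": "any",
        "arguments": {"data": {"from_parameter": "data"}},
    },
    "any2": {
        "process_id": "any",
        "arguments": {
            "data": [{"from_node": "constant1"}, {"from_node": "constant2"}]
        },
    },
}

modes = {
    node_id: classify_any_data(node["arguments"])
    for node_id, node in callback.items()
}
print(modes)
```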

@EmileSonneveld
Contributor

With a more fine-tuned process graph, the any node does work in both use cases:

process graph
{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "B04"
        ],
        "id": "SENTINEL2_L2A",
        "spatial_extent": {
          "east": 5.08,
          "north": 51.22,
          "south": 51.215,
          "west": 5.07
        },
        "temporal_extent": [
          "2021-01-09",
          "2021-01-13"
        ]
      }
    },
    "apply1": {
      "process_id": "apply_dimension",
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "dimension": "t",
        "process": {
          "process_graph": {
            "any2": {
              "process_id": "any",
              "arguments": {
                "data": [
                  1,
                  0,
                  1
                ]
              }
            },
            "any1": {
              "process_id": "any",
              "arguments": {
                "data": [
                  {
                    "from_node": "gt1"
                  }
                ]
              }
            },
            "if5": {
              "process_id": "if",
              "arguments": {
                "accept": {
                  "from_node": "any1"
                },
                "value": {
                  "from_node": "any2"
                }
              }
            },
            "multiply1": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "if5"
                },
                "y": 1
              },
              "result": true
            },
            "gt1": {
              "process_id": "gt",
              "arguments": {
                "y": 700,
                "x": {
                  "from_parameter": "data"
                }
              }
            }
          }
        }
      },
      "result": true
    }
  },
  "parameters": []
}

@EmileSonneveld
Contributor

any works fine with floats and booleans for tiles, but fails on booleans in 'constant mode': Method constantArrayElement([class java.lang.Boolean]) does not exist. I'll check for a quick fix.
(The any node also seems to ignore the ignore_nodata argument.)

Some code changed around the eq node. It also gives an error when used with booleans in 'constant mode': java.lang.ClassCastException: class java.lang.Boolean cannot be cast to class scala.collection.Seq
Floats work in 'constant mode'. Floats and booleans both work in tile mode.

process graph
{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "SCL"
        ],
        "id": "SENTINEL2_L2A",
        "spatial_extent": {
          "east": -25,
          "north": 41,
          "south": 39.5,
          "west": -26.5
        },
        "temporal_extent": [
          "2021-01-01",
          "2021-01-10"
        ]
      }
    },
    "apply1": {
      "process_id": "apply",
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "process": {
          "process_graph": {
            "cos1": {
              "process_id": "cos",
              "arguments": {
                "x": {
                  "from_parameter": "x"
                }
              }
            },
            "eq1": {
              "process_id": "eq",
              "arguments": {
                "x": {
                  "from_parameter": "x"
                },
                "y": 9,
                "delta": 0
              }
            },
            "eq2": {
              "process_id": "eq",
              "arguments": {
                "x": 1,
                "y": 1
              }
            },
            "if1": {
              "process_id": "if",
              "arguments": {
                "accept": {
                  "from_node": "eq1"
                },
                "value": {
                  "from_node": "eq2"
                },
                "reject": {
                  "from_node": "cos1"
                }
              }
            },
            "multiply1": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "if1"
                },
                "y": 1
              },
              "result": true
            }
          }
        }
      }
    },
    "saveresult1": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "apply1"
        },
        "format": "netcdf"
      },
      "result": true
    }
  },
  "parameters": []
}
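
As a reference point for the ignore_nodata remark above, here is a small pure-Python sketch of how I read the 'any' process in the openEO process specification: no-data values are skipped by default, an empty input gives null, and with ignore_nodata=false a no-data value makes the result null unless some value is true. Treat this as an assumption to check against the spec rather than as the backend behaviour. The function name `any_process` is made up, and treating non-zero floats as true is an extra assumption based on the float inputs used in this thread.

```python
from typing import Iterable, Optional


def any_process(data: Iterable, ignore_nodata: bool = True) -> Optional[bool]:
    """Sketch of openEO 'any' semantics, with None standing in for no-data."""
    values = list(data)
    if ignore_nodata:
        # Default: drop no-data values before reducing.
        values = [v for v in values if v is not None]
    if not values:
        # Empty input (or only no-data values that were dropped) gives null.
        return None
    if any(v is not None and bool(v) for v in values):
        # At least one true value wins, even if no-data is present.
        return True
    if any(v is None for v in values):
        # Only reachable with ignore_nodata=False: the result is unknown.
        return None
    return False


print(any_process([False, None]))
print(any_process([False, None], ignore_nodata=False))
print(any_process([1, 0, 1]))
```

If ignore_nodata were silently ignored by a backend, the first two calls would give the same result, which makes them a convenient pair to compare.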

@EmileSonneveld
Contributor

Maybe related: #286
