Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Antimeridian-crossing Items and Searching For Them #296

Closed
alexgleith opened this issue Nov 24, 2023 · 14 comments
Closed

Antimeridian-crossing Items and Searching For Them #296

alexgleith opened this issue Nov 24, 2023 · 14 comments

Comments

@alexgleith
Copy link

alexgleith commented Nov 24, 2023

I have a complicated antimeridian issue. I'll try to outline it below.

Say I have a query like this:

datetime = "2022-09/2022-10"
collections = ["landsat-c2-l2"]
bbox = [179.1, -17.7, 179.95, -17.0]

Note that this bounding box is close to the 180 antimeridian, but doesn't cross it.

This query returns 6 items, but, it should return 12, as there are items that cover the right hand side of the box, but their geometry doesn't overlap, because it looks more like this:

{
  'type': 'Polygon',
   'coordinates': [[
     [-178.5539822, -16.6580789],
     [-178.9381784, -18.3911529],
     [-180, -18.1699399],
     [-180.6707455, -18.0302011],
     [-180.270388, -16.3017406],
     [-180, -16.3578751],
     [-178.5539822, -16.6580789]]
  ]
}

Note that that item is this one.

Also note that that item has a bounding box of: [-180.67074549, -18.39115293, -178.55398222, -16.30174058].

Now, I thought I might be clever and put a little hack in when I'm searching close to the 180 degree antimeridian. But! If I do a query with values of less than -180 then I get an exception back from the STAC API:

APIError: 1 validation error for Request
body -> bbox
  Bounding box must be within (-180, -90, 180, 90) (type=value_error)

So, there are STAC documents that have values in their geometry that I can't actually query against. Clearly this is a problem!

Any suggestions on how to workaround this?

@alexgleith
Copy link
Author

Note that one possible workaround is using pathrows for querying, but then I need a lookuptable for geometry -> pathrow.

Pathrow query looks like this:

PR_ID_RIGHT = "073072"
PR_ID_LEFT = "074072"

query_left = {
    "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
    "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}
query_right = {
    "landsat:wrs_row": {"eq": PR_ID_RIGHT[3:]},
    "landsat:wrs_path": {"eq": PR_ID_RIGHT[:3]}
}

items_left_pr = list(
    client.search(
        collections=["landsat-c2-l2"],
        query=query_left,
        datetime=datetime
    ).items()
)
items_right_pr = list(
    client.search(
        collections=["landsat-c2-l2"],
        query=query_right,
        datetime=datetime
    ).items()
)
print(len(items_left_pr), len(items_right_pr))

Results in 6, 6 so 12 scenes returned, which is what we want!

@alexgleith
Copy link
Author

Another note that the USGS splits their geometry at the antimeridian, so that the queries work as expected.

Evidence in this very wide image!

image

Curiously, they return 16 scenes, which is more than the 12 I could get on the MSPC.

usgs_client = Client.open("https://landsatlook.usgs.gov/stac-server/")
usgs_items = list(usgs_client.search(collections=["landsat-c2l2-sr"], datetime=datetime, bbox=bbox).items())

print(len(usgs_items))

16

@alexgleith
Copy link
Author

The triangle of missing data in the below image is caused by the inability retrieve these items from the STAC API.

image

@TomAugspurger
Copy link

I'll have to check on which verison of stactools-packages/landsat we're on. It's possible we're on a version from before stactools-packages/landsat#30 and friends.

@jessjaco
Copy link

Unfortunately just searching for the desired pathrows via metadata (landsat:wrs_path / landsat:wrs_row) without any geometry in the query usually times out.

Absent fixes to the underlying stac items, one workaround for this (at least for landsat) is to:

  1. Determine the landsat pathrows which intersect the (antimeridian crossing) region of interest. This is an exercise in and of itself!
  2. Split the bounding box of those pathrows on the antimeridian.
  3. Search on the split bboxes separately, and combine the results.

This should also accommodate the bad geometry for some stac items, which is in another issue I can't find right now. It also might be slightly liberal in some areas, based on the actual bounding boxes, so it might make sense to include the path/row filtering if you want strict results.

@alexgleith
Copy link
Author

Unfortunately just searching for the desired pathrows via metadata (landsat:wrs_path / landsat:wrs_row) without any geometry in the query usually times out.

You're right. Querying for a year times out after 30 seconds... so this isn't a viable option either!

from pystac_client import Client

catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
client = Client.open(catalog)

PR_ID_LEFT = "074072"

query_left = {
    "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
    "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}

items_left_pr = list(
    client.search(
        collections=["landsat-c2-l2"],
        query=query_left,
        datetime="2023"
    ).items()
)

print(f"Found {len(items_left_pr)} items for path/row {PR_ID_LEFT}")

APIError: The request exceeded the maximum allowed time, please try again. If the issue persists, please contact [email protected].

@TomAugspurger
Copy link

Querying for a year times out after 30 seconds

Your best bet for now might be to break the query down into smaller time intervals:

import pandas as pd
from pystac_client import Client

catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
client = Client.open(catalog)

PR_ID_LEFT = "074072"

query_left = {
    "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
    "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
}

starts = pd.date_range("2023", periods=12, freq="MS", tz="UTC")


items = []
for start, stop in zip(starts, starts[1:]):
    print(start)
    items_left_pr = list(
        client.search(
            collections=["landsat-c2-l2"],
            query=query_left,
            bbox=[-180, -90, 180, 90],
            datetime=[start, stop]
        ).items()
    )
    items.extend(items_left_pr)

That finished for me in ~55 seconds.


@mmcfarland can you remind me: if the property_index_type field in pgstac.queryables is null, does that mean there isn't an index on that property?

"id"	"name"	"collection_ids"	"definition"	"property_path"	"property_wrapper"	"property_index_type"
97	"landsat:wrs_path"	"{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}"	"{""type"": ""string"", ""title"": ""WRS Path""}"			
98	"landsat:wrs_row"	"{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}"	"{""type"": ""string"", ""title"": ""WRS Row""}"			
99	"landsat:wrs_type"	"{landsat-8-c2-l2,landsat-c2-l1,landsat-c2-l2}"	"{""type"": ""string""}"			

Lack of an index on those fields means that querying on them isn't going to be too effective, the database will still have to scan everything before filtering.

@alexgleith
Copy link
Author

Your best bet for now might be to break the query down into smaller time intervals:

Understood. We're trying to build a framework to do this kind of thing across the Pacific to develop a lot of products, so these kinds of workarounds are really not ideal. And if we're falling back to pathrow queries, then further breaking them down into small datetime ranges is an extra compromise. And it doesn't feel right, since I can currently query for 40 years of Landsat and get thousands of items returned without issue using a geometry query.

If they can have an index put on them, that would be great.

@TomAugspurger
Copy link

Matt confirmed that those fields are currently unindexed. We'll look into adding them.

So, there are STAC documents that have values in their geometry that I can't actually query against. Clearly this is a problem!

Getting back to this, which I missed the first time: I think the item's geometry is incorrect:

    "bbox": [
        -180.67074549,
        -18.39115293,
        -178.55398222,
        -16.30174058
  ]

We'll need to check on the expected value there, but it probably shouldn't be less that -180? I'm not sure. stactools-packages/landsat#66 looks perhaps similar, but not identitcal.

@alexgleith
Copy link
Author

We'll need to check on the expected value there

Did this end up getting fixed @TomAugspurger? We have a workaround, so I think we're ok, but it would be nice if it was resolved.

@TomAugspurger
Copy link

TomAugspurger commented Apr 5, 2024 via email

@TomAugspurger
Copy link

We got the index added in landsat:wrs_path and landsat:wrs_row added (finally). The example which timed out earlier is quick now:

In [25]: from pystac_client import Client
    ...:
    ...: catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"
    ...: client = Client.open(catalog)
    ...:
    ...: PR_ID_LEFT = "074072"
    ...:
    ...: query_left = {
    ...:     "landsat:wrs_row": {"eq": PR_ID_LEFT[3:]},
    ...:     "landsat:wrs_path": {"eq": PR_ID_LEFT[:3]}
    ...: }
    ...:
    ...: items_left_pr = list(
    ...:     client.search(
    ...:         collections=["landsat-c2-l2"],
    ...:         query=query_left,
    ...:         datetime="2023"
    ...:     ).items()
    ...: )
    ...:
    ...: %time print(f"Found {len(items_left_pr)} items for path/row {PR_ID_LEFT}")
Found 45 items for path/row 074072
CPU times: user 72 us, sys: 3 us, total: 75 us
Wall time: 79.4 us

@alexgleith
Copy link
Author

Thanks, @TomAugspurger, this is really nice!

I think this should stay open though, as the original issue is with some of the geometries/bounding boxes.

(Great to meet you the other day!)

@ghidalgo3
Copy link

Closed due to inactivity, feel free to reopen if you would like to continue this discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants