Add STAC creation and Xarray loading functionality #266

Open
wants to merge 55 commits into
base: develop

forrestfwilliams
Contributor

@forrestfwilliams forrestfwilliams commented Mar 1, 2024

Adapted from and heavily inspired by @scottyhq's 2023 AGU presentation, this PR adds the ability to create STAC items and collections from sets of completed HyP3 jobs. While we do provide unzipped copies of HyP3 products, we have never publicized this well because there has not been an efficient method to retrieve them. The STAC ecosystem provides an elegant solution to this problem that the community is already familiar with.
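As a rough illustration of what a STAC item for one unzipped product could look like, the sketch below builds an item-shaped dictionary from a product URL. The URL, the asset suffixes, the assumed folder layout, and the `make_item` helper are all hypothetical; only the rule for deriving the item ID from the zip name mirrors the code in this PR.

```python
from datetime import datetime, timezone


def make_item(zip_url: str, bbox: list, when: datetime, suffixes: list) -> dict:
    """Build a minimal STAC-item-shaped dict for one unzipped product (illustrative only)."""
    # Same ID rule as the PR: last URL component without the .zip extension.
    item_id = zip_url.split('/')[-1].replace('.zip', '')
    # Assumption: unzipped files sit in a folder named after the product, next to the zip.
    base = zip_url.replace('.zip', '')
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        'type': 'Feature',
        'stac_version': '1.0.0',
        'id': item_id,
        'bbox': bbox,
        'geometry': {'type': 'Polygon', 'coordinates': [ring]},
        'properties': {'datetime': when.isoformat()},
        'assets': {s: {'href': f'{base}/{item_id}_{s}.tif'} for s in suffixes},
        'links': [],
    }


item = make_item(
    'https://example.com/products/PRODUCT_NAME.zip',  # hypothetical URL
    [-118.0, 35.0, -117.0, 36.0],  # hypothetical bounding box
    datetime(2019, 7, 6, tzinfo=timezone.utc),
    ['unw_phase', 'corr'],  # hypothetical asset suffixes
)
print(item['id'], sorted(item['assets']))
```

Because each asset points at an individual file rather than the zip archive, a client can read only the rasters it needs.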

In addition, it provides utilities for turning these STAC collections into Xarray datastacks using odc-stac, and for turning these Xarray objects into MintPy-compatible HDF5 files. Using this new functionality to prep HyP3 data for MintPy is a significant improvement over our current preparation guidelines, since it removes the need to download, unzip, and crop each product individually.
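A minimal sketch of the collection-to-Xarray step, written directly against pystac and odc-stac rather than the helpers this PR adds (whose names are not shown here). The `load_hyp3_stack` function, the collection path, and the band names in the usage note are assumptions for illustration.

```python
def load_hyp3_stack(collection_path: str, bands: list):
    """Lazily load the chosen assets of every item in a STAC collection as an xarray.Dataset."""
    # Imported inside the function so the sketch can be defined without the packages installed.
    import odc.stac
    import pystac

    collection = pystac.Collection.from_file(collection_path)
    items = list(collection.get_items())
    # Passing chunks asks odc-stac for Dask-backed arrays, so pixels are read only when needed.
    return odc.stac.load(items, bands=bands, chunks={'x': 2048, 'y': 2048})
```

A call might look like `load_hyp3_stack('collection.json', ['unw_phase', 'corr'])`, where the band names must match the asset keys in the items. Selecting a spatial subset of the resulting dataset before computing is what would replace the manual download, unzip, and crop steps.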

To demo the new MintPy workflow enabled by these changes, check out this modified version of our HyP3 MintPy notebook.

@forrestfwilliams forrestfwilliams requested a review from a team as a code owner March 1, 2024 13:56
@forrestfwilliams forrestfwilliams marked this pull request as draft March 1, 2024 14:00
@forrestfwilliams forrestfwilliams marked this pull request as ready for review March 7, 2024 15:59
@scottyhq

scottyhq commented Mar 7, 2024

Thanks for this @forrestfwilliams! We were able to run the demo notebook without any trouble and the new functions greatly facilitate the hyp3->mintpy connection.

Also wanted to link to this issue, which has some of the original discussion and links to prototypes: ASFHyP3/hyp3-isce2#170.


I'd suggest aligning with this extension as much as possible - https://github.com/stac-extensions/insar. Some of these extensions are not SAR-specific (sat:orbit_state, for example), but are used out in the wild by many commercial data providers (Planet, Maxar, Capella, Umbra, etc.), and the more standardization around common names the better from a user perspective! Some specific recommendations below:

  • sar:looks_range, sar:looks_azimuth, sar:observation_direction,
  • sat:orbit_state, sat:relative_orbit, (e.g. instead of reference_orbit_direction, secondary_orbit_number)
  • view:azimuth, view:incidence_angle
  • processing:lineage, processing:software
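The renaming suggested above can be sketched as a small lookup table. The `hyp3:`-prefixed source keys are the ones that appear in this PR's diff; the `standardize` helper and the example values are hypothetical.

```python
# HyP3-specific key -> (standard STAC extension key, value converter).
FIELD_MAP = {
    'hyp3:range_looks': ('sar:looks_range', int),
    'hyp3:reference_orbit_direction': ('sat:orbit_state', str.lower),
    'hyp3:reference_orbit_number': ('sat:absolute_orbit', int),
}


def standardize(properties: dict) -> dict:
    """Return a copy of the properties with standard-named fields added alongside the originals."""
    out = dict(properties)
    for source_key, (standard_key, convert) in FIELD_MAP.items():
        if source_key in properties:
            out[standard_key] = convert(properties[source_key])
    return out


# Hypothetical values, for illustration only.
example = {
    'hyp3:range_looks': 20,
    'hyp3:reference_orbit_direction': 'ASCENDING',
    'hyp3:reference_orbit_number': 28106,
}
print(standardize(example))
```

Keeping the original keys next to the standard ones is a design choice in this sketch: existing users of the `hyp3:` names are not broken while generic STAC tooling can find the common names.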

Contributor Author

Thanks for the feedback @scottyhq, I'll look at incorporating these fields!

I put up a test catalog here based on burst products, since rendering with the canonical stacbrowser provides a nice test (https://radiantearth.github.io/stac-browser/#/external/raw.githubusercontent.com/relativeorbit/three-sisters/main/115_245676_IW2/stac/collection.json?.language=en). Metadata is looking good! It would be great to be able to render the TIFFs on the map; currently looking at why they don't...

Contributor Author

Wow, that looks even better than I would have expected! I've added all the metadata fields you mentioned that I can for now. For some of the fields you mentioned, we don't provide the needed metadata in our HyP3 products (e.g., software version info for processing:software).

Hey Forrest, just revisiting this :) I think there are options for processing:software that would be great to track in this metadata. So HyP3 ISCE2 v1.0.0 or HyP3 GAMMA v8.1.2? I could also see usefulness in tracking the underlying software version (isce2-2.6.3), or, I suppose, pointing to the Docker image that HyP3 runs (ghcr.io/asfhyp3/hyp3-isce2:1.0.0).

Contributor

@forrestfwilliams @scottyhq we do capture the processing software information in all our products, but not in the most machine-readable way. Typically, with a sentence like (jinja2 templated):

This data was processed by ASF DAAC HyP3 {{ processing_date.year }} using the {{ plugin_name }} plugin version
{{ plugin_version }} running {{ processor_name }} release {{ processor_version }}.

e.g.:
https://github.com/ASFHyP3/hyp3-isce2/blob/develop/src/hyp3_isce2/metadata/templates/insar_burst/insar_burst_readme.md.txt.j2#L19-L20

The plugin name + plugin version directly corresponds to the plugin container used to create the product, so we could easily add that as well.
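One way the templated sentence above could be turned into a machine-readable `processing:software` dictionary (name/version pairs, as the STAC processing extension defines it) is a regular expression over the rendered text. This is only a sketch: the `parse_software` helper and the example plugin and processor names and versions are hypothetical, and a real README may be worded differently.

```python
import re

# Mirrors the wording of the jinja2 template quoted above, after whitespace is collapsed.
SOFTWARE_PATTERN = re.compile(
    r'using the (?P<plugin_name>.+?) plugin version (?P<plugin_version>\S+) '
    r'running (?P<processor_name>.+?) release (?P<processor_version>\S+?)\.?$'
)


def parse_software(sentence: str) -> dict:
    """Extract {software name: version} pairs from the rendered README sentence."""
    flat = ' '.join(sentence.split())  # the template wraps across two lines
    match = SOFTWARE_PATTERN.search(flat)
    if match is None:
        raise ValueError('sentence does not follow the expected template')
    return {
        match['plugin_name']: match['plugin_version'],
        match['processor_name']: match['processor_version'],
    }


# Hypothetical rendered sentence.
rendered = (
    'This data was processed by ASF DAAC HyP3 2024 using the HyP3 ISCE2 plugin version\n'
    '1.0.0 running ISCE2 release 2.6.3.'
)
print(parse_software(rendered))
```

Parsing free text like this is fragile, which is one reason to write the versions into the base metadata file first.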

Contributor Author

I think adding software version is a good idea, but we should add this info to our base metadata txt file before we add it to the STAC items. The STAC implementation becomes much simpler once we have this in place.

@scottyhq scottyhq left a comment

Unfortunately it looks like a month ago I added some comments that maybe never appeared since they are "pending" until I hit the submit review button 🤦. I'm hitting 'approve' since I'm in favor of adding this and tested it out for hyp3-isce2 bursts, but obviously I'm guessing it'll need another ASF reviewer!

src/hyp3_sdk/stac/stac.py (outdated)
'sar:looks_range': extra_properties['hyp3:range_looks'],
'sat:orbit_state': extra_properties['hyp3:reference_orbit_direction'].lower(),
'sat:absolute_orbit': extra_properties['hyp3:reference_orbit_number'],
'view:azimuth': (360 + extra_properties['hyp3:heading']) % 360, # change of convention
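The change of convention on the `view:azimuth` line can be checked in isolation. The sketch below assumes the HyP3 heading is reported in degrees and may be negative, while `view:azimuth` is expressed in the 0 to 360 degree range; the function name and input values are hypothetical.

```python
def heading_to_view_azimuth(heading_deg: float) -> float:
    """Wrap a heading in degrees onto the [0, 360) range, using the same arithmetic as the diff."""
    return (360 + heading_deg) % 360


print(heading_to_view_azimuth(-12.5))  # 347.5
print(heading_to_view_azimuth(190.0))  # 190.0
```

Headings that are already non-negative pass through unchanged, so the expression is safe to apply to every product.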

I know there are lots of 'heading/azimuth' and incidence conventions out there for SAR LOS conversions, but I suggest also adding view:incidence_angle here (https://github.com/stac-extensions/view#item-properties).

Contributor Author

Incidence angle information is also something we don't currently report in our txt metadata files. We'll need to add this field on the plugin side before we can implement view:incidence_angle.

@jhkennedy
Contributor

jhkennedy commented Apr 12, 2024

@scottyhq I'm on the hook to review this! I'll try and get to it next week.

My high-level comment right now, which @forrestfwilliams and I have been kicking around, is that the actual STAC item JSON should be created by the plugin, not here in the SDK, as the plugins have all the information, context, and, importantly, dependencies to create the item. That would allow us, for example, to add STAC endpoints to HyP3, which I've prototyped here (not following the STAC spec yet, though):
https://hyp3-stac.asf.alaska.edu/stac?user_id=ffwilliams2&name=2019_ridgecrest_stac
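For readers who want to point a client at a prototype endpoint of this shape, the query string can be assembled as below. This is a sketch only: the `user_id` and `name` parameters are taken from the URL above, while the `stac_endpoint_url` helper is hypothetical and the prototype itself may change or disappear.

```python
from urllib.parse import urlencode


def stac_endpoint_url(user_id: str, name: str, base: str = 'https://hyp3-stac.asf.alaska.edu/stac') -> str:
    """Build the prototype endpoint URL for one user's named set of jobs."""
    query = urlencode({'user_id': user_id, 'name': name})
    return f'{base}?{query}'


print(stac_endpoint_url('ffwilliams2', '2019_ridgecrest_stac'))
```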

@forrestfwilliams would prefer to "just get this out" (him) instead of "doing it right" (me) so we can help users now, as getting the work to add it to the plugins prioritized and scheduled in the team backlog is likely to be a slow process and not under either of our control.

@scottyhq, what do you think? I'm def. interested to hear what approach you'd prefer.

I plan on doing an in-depth review early next week, and I'd expect @forrestfwilliams and I will settle on a path forward then.

properties.update(extra_properties)
item = pystac.Item(
id=base_url.split('/')[-1].replace('.zip', ''),
geometry=geo_info.bbox_geojson,

Preferably, geometry is the valid data footprint rather than the bbox. You could bring in this dependency for convenience: https://stactools.readthedocs.io/en/stable/footprint.html

Contributor Author

I see your point. The current geojson demarcates the actual extent of raster products, including the nodata areas. Unfortunately, obtaining the rotated footprint that contains valid data will likely take work on the plugin and will need to wait for future iterations.
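To make the difference between the raster extent and a valid-data footprint concrete, the toy sketch below takes the convex hull of the valid pixels in a small grid. It works in pixel coordinates on a plain Python list and is not the stactools footprint utility linked above; the grid, the nodata value, and the function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain: return the hull vertices in order around the hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so drop it.
    return lower[:-1] + upper[:-1]


def valid_footprint(grid, nodata=0):
    """Hull of the (col, row) positions of every pixel that is not nodata."""
    valid = [(c, r) for r, row in enumerate(grid) for c, v in enumerate(row) if v != nodata]
    return convex_hull(valid)


# A rotated (diamond-shaped) block of valid data inside a square raster.
grid = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
footprint = valid_footprint(grid)
print(footprint)
```

Here the raster extent is the full 5 by 5 square, while the footprint is the four-vertex diamond inside it. A real implementation would also need to convert pixel positions to geographic coordinates and handle concave or multi-part data regions.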

@scottyhq

> the actual stac item json should be created by the plugin, not here in the SDK as the plugins have all the information, context, and importantly dependencies to create the item. That would allow, for example, us to add STAC endpoints to hyp3 which I've prototyped here (not following the STAC spec yet though):

Agreed, it makes sense for the plugins to generate an item.json alongside the current .txt metadata formats. I'm selfishly mostly focused on hyp3-isce2 these days, so I'm not concerned with supporting all the plugins, but it does seem that centralizing the code here into hyp3-lib or a new hyp3-stac would make sense.

The discussion and iteration on this branch to hone the metadata is definitely useful in the meantime! And I'm glad that people can install from this branch and take the SDK approach if sufficiently motivated :)

@forrestfwilliams
Contributor Author

OK @jhkennedy this is ready for your review. Notably, I've punted on adding some of @scottyhq's requested features:

  1. Exact processor versions in processing:software.
  2. view:incidence_angle.
  3. Valid data footprint for item.geometry (still not exactly sure what we should do here).

All of these will be simpler with upstream changes to the plugins/moving the STAC item creation to the plugins.

As a first step, though, I think it still makes sense to add the STAC functionality to the SDK, then migrate it to the plugins in the future. This allows us to get this functionality to our users more quickly, and requires us to work in fewer repositories while we're still nailing down the basics.

3 participants