Fix intake catalog reference.yaml for kerchunked datasets #449

Open
rsignell-usgs opened this issue Dec 8, 2022 · 1 comment

Comments

@rsignell-usgs

For kerchunked dataset recipes, the currently generated intake catalogs don't work because the OSN endpoint_url is not included.
For example, for the NWM-2.1-grid1km-LDAS recipe, we get:

sources:
  data:
    args:
      chunks: {}
      consolidated: false
      storage_options:
        fo: Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json
        remote_options:
          anon: true
        remote_protocol: s3
        skip_instance_cache: true
        target_options: {}
        target_protocol: s3
      urlpath: reference://
    description: ''
    driver: intake_xarray.xzarr.ZarrSource

but the fo can't be read with target_protocol: s3 on OSN because the endpoint_url is not specified in target_options.

Two solutions:

  1. Keep target_protocol: s3, but specify target_options that include endpoint_url under client_kwargs.
  2. Switch to target_protocol: https, and specify fo with the full https path.

These solutions both work:

Solution 1:

sources:
  data:
    driver: intake_xarray.xzarr.ZarrSource
    description: ''
    args:
      urlpath: "reference://"
      consolidated: false
      storage_options:
        target_options:
          anon: true
          client_kwargs: {'endpoint_url': 'https://ncsa.osn.xsede.org'}
        fo: 's3://Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json'
        remote_options:
          anon: true
        remote_protocol: "s3"
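
For anyone who wants to try Solution 1 without going through intake, here is a rough Python sketch of the same storage options. The helper names `solution1_storage_options` and `open_reference_dataset` are made up for illustration, and the open call assumes xarray, zarr, fsspec and s3fs are installed; I have not run it against OSN.

```python
def solution1_storage_options(key, endpoint_url):
    """Build storage options mirroring the Solution 1 catalog (hypothetical helper)."""
    return {
        # The reference file lives on OSN, so reading it needs the endpoint.
        "fo": f"s3://{key}",
        "target_options": {
            "anon": True,
            "client_kwargs": {"endpoint_url": endpoint_url},
        },
        # The referenced chunks are read with plain anonymous s3 access.
        "remote_protocol": "s3",
        "remote_options": {"anon": True},
    }


def open_reference_dataset(storage_options):
    """Open the kerchunked dataset with xarray (needs network access)."""
    import xarray as xr  # imported lazily so the helper above has no dependencies

    return xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": storage_options,
        },
    )


opts = solution1_storage_options(
    "Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/"
    "recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json",
    "https://ncsa.osn.xsede.org",
)
# ds = open_reference_dataset(opts)  # uncomment to actually open the dataset
```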

Solution 2:

sources:
  data:
    args:
      chunks: {}
      consolidated: false
      storage_options:
        fo: 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json'
        remote_options:
          anon: true
        remote_protocol: s3
        skip_instance_cache: true
        target_options: {}
      urlpath: reference://
    description: ''
    driver: intake_xarray.xzarr.ZarrSource
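
The https path in Solution 2 is just the OSN endpoint joined with the bucket and key (path-style addressing). A small sketch of that mapping, where `to_https_fo` is a hypothetical helper name and not something in pangeo-forge-recipes:

```python
def to_https_fo(endpoint_url, key):
    """Join an S3-compatible endpoint and a bucket/key path into an https URL."""
    return f"{endpoint_url.rstrip('/')}/{key.lstrip('/')}"


fo = to_https_fo(
    "https://ncsa.osn.xsede.org",
    "Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/"
    "recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json",
)
print(fo)
```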

The relevant code is at https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/pangeo_forge_recipes/recipes/reference_hdf_zarr.py#L77-L83
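
Until that is fixed, an already generated catalog could be patched after loading it. A minimal sketch, assuming the catalog has been parsed into a dict (for example with a YAML loader) and has the layout shown above; `add_osn_endpoint` is a hypothetical name, not part of pangeo-forge-recipes, and this is not the code at the link.

```python
import copy


def add_osn_endpoint(catalog, endpoint_url, source="data"):
    """Return a copy of a parsed catalog with endpoint_url added to target_options."""
    patched = copy.deepcopy(catalog)  # leave the caller's dict untouched
    storage_options = patched["sources"][source]["args"]["storage_options"]
    target_options = storage_options.setdefault("target_options", {})
    target_options.setdefault("anon", True)
    target_options.setdefault("client_kwargs", {})["endpoint_url"] = endpoint_url
    return patched


# Trimmed-down stand-in for the generated catalog, with a placeholder fo.
generated = {
    "sources": {
        "data": {
            "args": {
                "storage_options": {
                    "fo": "bucket/path/reference.json",
                    "target_options": {},
                    "target_protocol": "s3",
                },
                "urlpath": "reference://",
            },
        }
    }
}
fixed = add_osn_endpoint(generated, "https://ncsa.osn.xsede.org")
```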

@sharkinsspatial is this something you can fix?

@cisaacstern
Member

Thanks for spotting this issue and documenting it here, @rsignell-usgs!
