adding liveocean recipe #154
Conversation
🎉 New recipe runs created for the following recipes at sha
Thanks for this contribution, @rsignell-usgs! I'll trigger a test of this recipe now.
/run recipe-test recipe_run_id=987
It looks like your
Please correct your
🎉 New recipe runs created for the following recipes at sha
@cisaacstern, what is the next step? I see there is an error in the action here: https://github.com/pangeo-forge/staged-recipes/actions/runs/2689822509
FWIW, I find the requirement to provide a description of the provider to be a bit confusing and unnecessary.
@rabernat, agreed! As a first-timer:
/run recipe-test recipe_run_id=989 |
@cisaacstern - any idea what's happening with this one? This is our first reference recipe in PF, so I am assuming some assumptions will break.
Yes, just checked. This is actually due to the fact that
🎉 New recipe runs created for the following recipes at sha |
/run recipe-test recipe_run_id=993 |
This recipe will not deposit a Zarr dataset at all, but rather a kerchunk JSON file called
✨ A test of your recipe I'll notify you with a comment on this thread when this test is complete. (This could be a little while...) In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/993
Pangeo Forge Cloud told me that our test of your recipe To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/993 If you haven't yet tried pruning and running your recipe locally, I suggest trying that now. Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!
We can see in the logs link above that the error is
This seems like a kerchunk version issue in the cloud environment, not a recipe issue... I'm investigating. |
I believe I've fixed this issue by bumping the
/run recipe-test recipe_run_id=993 |
✨ A test of your recipe I'll notify you with a comment on this thread when this test is complete. (This could be a little while...) In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/993
Pangeo Forge Cloud told me that our test of your recipe To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/993 If you haven't yet tried pruning and running your recipe locally, I suggest trying that now. Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!
🎉 New recipe runs created for the following recipes at sha
/run recipe-test recipe_run_id=66 |
✨ A test of your recipe I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)
Unable to open dataset with dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr'
This worked on Dataflow 🥳. But as predicted by Ryan above, as our first reference recipe, @pangeo-forge-bot got confused regarding both:
That being said, the dataset does exist, and is openable. The following files were created:

```python
import s3fs

fs = s3fs.S3FileSystem(anon=True, client_kwargs=dict(endpoint_url="https://ncsa.osn.xsede.org"))
url = "s3://Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr"
fs.ls(url)
```
I wasn't clear how to open this directly over HTTP, so I first downloaded the reference.json, then opened it according to the method demonstrated in our tutorial here:

```python
import fsspec
import xarray as xr

m = fsspec.get_mapper(
    "reference://",
    fo="reference.json",
    target_protocol="file",
    remote_protocol="http",
    remote_options=dict(anon=True),
    skip_instance_cache=True,
)
ds = xr.open_dataset(
    m,
    engine="zarr",
    backend_kwargs={"consolidated": False},
    chunks={},
    decode_coords="all",
)
ds
```

(dataset repr elided)
Completing pangeo-forge/pangeo-forge-recipes#268 would be useful for @pangeo-forge-bot to identify the dataset type; then on the backend we could do something like:

```python
dataset_type = getattr(recipe, "dataset_type")
if dataset_type == "zarr":
    ...  # handle zarr dataset
elif dataset_type == "reference":
    ...  # handle kerchunk reference dataset
```

@rabernat or @rsignell-usgs, what's the most concise way to open it directly over HTTP?
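In case it's a useful starting point, here is a sketch of what a direct-over-HTTPS open might look like — untested, with the reference.json URL taken from the listing above, and assuming the referenced chunks still need the s3 protocol against the OSN endpoint (the endpoint pairing is my guess):

```python
# Hypothetical sketch: skip the local download by pointing `fo` at
# reference.json over HTTPS, while the referenced chunks are read via s3
# against the OSN endpoint. Names and endpoint pairing are assumptions.
reference_url = (
    "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/"
    "recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr/reference.json"
)
mapper_kwargs = dict(
    fo=reference_url,
    remote_protocol="s3",
    remote_options=dict(
        anon=True,
        client_kwargs={"endpoint_url": "https://mghp.osn.xsede.org"},
    ),
    skip_instance_cache=True,
)
# Then (requires network access to OSN):
# m = fsspec.get_mapper("reference://", **mapper_kwargs)
# ds = xr.open_dataset(m, engine="zarr", backend_kwargs={"consolidated": False}, chunks={})
```

If this works, it would avoid the download-then-open two-step entirely.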
This is excellent progress! 🎉 The reference dataset was successfully created and the reference files were deposited in OSN. We just have to refactor the orchestration code to not assume that everything deposited will be Zarr. A class variable on each recipe class could be useful here (edit: duh that's exactly what pangeo-forge/pangeo-forge-recipes#268 is 🙃 )
I'll leave this question to Rich. How do you want to interact with this data?
Would it be appropriate to have pangeo-forge generate an intake catalog?

```yaml
sources:
  LiveOcean-Archive:
    driver: intake_xarray.xzarr.ZarrSource
    description: 'LiveOcean Forecast Archive'
    args:
      urlpath: "reference://"
      consolidated: false
      storage_options:
        fo: 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr/reference.json'
        remote_options:
          anon: true
          client_kwargs: {'endpoint_url': 'https://mghp.osn.xsede.org'}
        remote_protocol: s3
```
In fact it already has!
Which contains:

```yaml
sources:
  data:
    args:
      chunks: {}
      consolidated: false
      storage_options:
        fo: s3:///Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr/reference.json
        remote_options:
          anon: true
          client_kwargs:
            endpoint_url: https://mghp.osn.xsede.org/
        remote_protocol: s3
        skip_instance_cache: true
        target_options: {}
        target_protocol: s3
      urlpath: reference://
    description: ''
    driver: intake_xarray.xzarr.ZarrSource
```

However, it doesn't seem to work:

```python
import intake

cat_url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr/reference.yaml"
cat = intake.open_catalog(cat_url)
ds = cat.data.to_dask()
```

@martindurant - do you see any problem with the intake file?
The intake catalog I supplied above works. The catalog produced by pangeo-forge looks like it has a few problems:

```python
import fsspec

fs = fsspec.filesystem('s3', anon=True)
fs.ls('s3://Pangeo/pangeo-forge/')
```

returns
Upper-case bucket names are pretty unusual, but they seem to be allowed. However, I also get NoSuchBucket with either "Pangeo" or "pangeo". On AWS S3, "pangeo" does exist, but needs credentials.
Remember that this data is on OSN, so you need the custom endpoints. The really complicated part is that there are actually two endpoints:
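To make the two-endpoint situation concrete, here's a sketch of `storage_options` with separate `target_options` (for fetching reference.json itself) and `remote_options` (for reading the referenced chunks). The pairing of endpoints to roles is my assumption, based on the URLs appearing earlier in this thread:

```python
# Sketch (assumed pairing): the reference file lives behind one OSN endpoint,
# while the netCDF bytes it points to live behind another.
storage_options = {
    "fo": (
        "s3://Pangeo/pangeo-forge-test/staging/recipe-run-66/"
        "pangeo-forge/staged-recipes/liveocean.zarr/reference.json"
    ),
    "target_protocol": "s3",
    "target_options": {  # options for reading reference.json itself
        "anon": True,
        "client_kwargs": {"endpoint_url": "https://ncsa.osn.xsede.org"},
    },
    "remote_protocol": "s3",
    "remote_options": {  # options for reading the referenced data chunks
        "anon": True,
        "client_kwargs": {"endpoint_url": "https://mghp.osn.xsede.org"},
    },
}
```

The key point is just that the two option dicts need not be identical, which the generated catalog currently assumes.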
I am supposing that the target_options (to read the json file) should be the same as the remote_options (to read the data); but I still get no-bucket:
This slight modification of the pangeo-forge catalog works. Just needed to flesh out the
Right, different endpoints :)
To generate that automatically, I think we'll need a PR to pass |
@martindurant, I could try this, but I have a feeling it would be smoother if you did the PR!
@peterm790, want to take a stab at fixing this? |
@peterm790, just putting this back on your radar... |
Hi, yes, I have a branch set up with
@peterm790 go ahead and submit a PR and the team will tell you what's needed! |
🎉 New recipe runs created for the following recipes at sha
@rsignell-usgs thanks so much for re-opening this. IIUC, this depends on pangeo-forge/pangeo-forge-recipes#399, which has not been released yet. So a couple last blockers (all on my side) before we can run this here:
Getting close here!
Closes #152