adding liveocean recipe #154
Conversation
🎉 New recipe runs created for the following recipes at sha
Thanks for this contribution, @rsignell-usgs! I'll trigger a test of this recipe now.
/run recipe-test recipe_run_id=987
It looks like your
Please correct your
🎉 New recipe runs created for the following recipes at sha
@cisaacstern, what is the next step? I see there is an error in the action here: https://github.com/pangeo-forge/staged-recipes/actions/runs/2689822509
FWIW, I find the requirement to provide a description of the provider to be a bit confusing and unnecessary.
@rabernat, agreed! As a first-timer:
/run recipe-test recipe_run_id=989 |
@cisaacstern - any idea what's happening with this one? This is our first reference recipe in PF, so I am assuming some assumptions will break.
Yes, just checked. This is actually due to the fact that
🎉 New recipe runs created for the following recipes at sha |
/run recipe-test recipe_run_id=993 |
This recipe will not deposit a Zarr dataset at all, but rather a kerchunk JSON file called
✨ A test of your recipe I'll notify you with a comment on this thread when this test is complete. (This could be a little while...) In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/993
Pangeo Forge Cloud told me that our test of your recipe To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/993 If you haven't yet tried pruning and running your recipe locally, I suggest trying that now. Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!
We can see in the logs link above that the error is
This seems like a kerchunk version issue in the cloud environment, not a recipe issue... I'm investigating. |
I believe I've fixed this issue by bumping the
/run recipe-test recipe_run_id=993 |
✨ A test of your recipe I'll notify you with a comment on this thread when this test is complete. (This could be a little while...) In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/993
Pangeo Forge Cloud told me that our test of your recipe To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/993 If you haven't yet tried pruning and running your recipe locally, I suggest trying that now. Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!
🎉 New recipe runs created for the following recipes at sha
/run recipe-test recipe_run_id=66 |
✨ A test of your recipe I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)
Unable to open dataset with dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr'
This worked on Dataflow 🥳. But as predicted by Ryan above, as our first reference recipe, @pangeo-forge-bot got confused regarding both:
That being said, the dataset does exist, and is openable. The following files were created:

```python
import s3fs

fs = s3fs.S3FileSystem(anon=True, client_kwargs=dict(endpoint_url="https://ncsa.osn.xsede.org"))
url = "s3://Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr"
fs.ls(url)
```
I wasn't clear how to open this directly over HTTP, so I first downloaded the reference.json, then opened it according to the method demonstrated in our tutorial here:

```python
import fsspec
import xarray as xr

m = fsspec.get_mapper(
    "reference://",
    fo="reference.json",
    target_protocol="file",
    remote_protocol="http",
    remote_options=dict(anon=True),
    skip_instance_cache=True,
)
ds = xr.open_dataset(
    m,
    engine="zarr",
    backend_kwargs={"consolidated": False},
    chunks={},
    decode_coords="all",
)
ds
```

(dataset repr elided)
Completing pangeo-forge/pangeo-forge-recipes#268 would be useful for @pangeo-forge-bot to identify the dataset type; then on the backend we could do something like:

```python
dataset_type = getattr(recipe, "dataset_type")
if dataset_type == "zarr":
    ...  # handle zarr dataset
elif dataset_type == "reference":
    ...  # handle kerchunk reference dataset
```

@rabernat or @rsignell-usgs, what's the most concise way to open it directly over HTTP?
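In case it's a useful starting point, here is a sketch of what a direct-over-HTTPS open might look like — untested, with the reference.json URL taken from the listing above, and assuming the referenced chunks still need the s3 protocol against the OSN endpoint (the endpoint pairing is my guess):

```python
# Hypothetical sketch: skip the local download by pointing `fo` at
# reference.json over HTTPS, while the referenced chunks are read via s3
# against the OSN endpoint. Names and endpoint pairing are assumptions.
reference_url = (
    "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/"
    "recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr/reference.json"
)
mapper_kwargs = dict(
    fo=reference_url,
    remote_protocol="s3",
    remote_options=dict(
        anon=True,
        client_kwargs={"endpoint_url": "https://mghp.osn.xsede.org"},
    ),
    skip_instance_cache=True,
)
# Then (requires network access to OSN):
# m = fsspec.get_mapper("reference://", **mapper_kwargs)
# ds = xr.open_dataset(m, engine="zarr", backend_kwargs={"consolidated": False}, chunks={})
```

If this works, it would avoid the download-then-open two-step entirely.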
This is excellent progress! 🎉 The reference dataset was successfully created and the reference files were deposited in OSN. We just have to refactor the orchestration code to not assume that everything deposited will be Zarr. A class variable on each recipe class could be useful here (edit: duh that's exactly what pangeo-forge/pangeo-forge-recipes#268 is 🙃 )
I'll leave this question to Rich. How do you want to interact with this data?
Would it be appropriate to have pangeo-forge generate an intake catalog?

```yaml
sources:
  LiveOcean-Archive:
    driver: intake_xarray.xzarr.ZarrSource
    description: 'LiveOcean Forecast Archive'
    args:
      urlpath: "reference://"
      consolidated: false
      storage_options:
        fo: 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr/reference.json'
        remote_options:
          anon: true
          client_kwargs: {'endpoint_url': 'https://mghp.osn.xsede.org'}
        remote_protocol: s3
```
In fact it already has!
Which contains:

```yaml
sources:
  data:
    args:
      chunks: {}
      consolidated: false
      storage_options:
        fo: s3:///Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr/reference.json
        remote_options:
          anon: true
          client_kwargs:
            endpoint_url: https://mghp.osn.xsede.org/
        remote_protocol: s3
        skip_instance_cache: true
        target_options: {}
        target_protocol: s3
      urlpath: reference://
    description: ''
    driver: intake_xarray.xzarr.ZarrSource
```

However, it doesn't seem to work:

```python
import intake

cat_url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/staging/recipe-run-66/pangeo-forge/staged-recipes/liveocean.zarr/reference.yaml"
cat = intake.open_catalog(cat_url)
ds = cat.data.to_dask()
```

@martindurant - do you see any problem with the intake file?
The intake catalog I supplied above works. The catalog produced by pangeo-forge looks like it has a few problems:

```python
import fsspec

fs = fsspec.filesystem('s3', anon=True)
fs.ls('s3://Pangeo/pangeo-forge/')
```

returns
Upper-case bucket names are pretty unusual, but they seem to be allowed. However, I also get NoSuchBucket with either "Pangeo" or "pangeo". On AWS S3, "pangeo" does exist, but needs credentials.
Remember that this data is on OSN, so you need the custom endpoints. The really complicated part is that there are actually two endpoints:
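To make the two-endpoint situation concrete, here's a sketch of `storage_options` with separate `target_options` (for fetching reference.json itself) and `remote_options` (for reading the referenced chunks). The pairing of endpoints to roles is my assumption, based on the URLs appearing earlier in this thread:

```python
# Sketch (assumed pairing): the reference file lives behind one OSN endpoint,
# while the netCDF bytes it points to live behind another.
storage_options = {
    "fo": (
        "s3://Pangeo/pangeo-forge-test/staging/recipe-run-66/"
        "pangeo-forge/staged-recipes/liveocean.zarr/reference.json"
    ),
    "target_protocol": "s3",
    "target_options": {  # options for reading reference.json itself
        "anon": True,
        "client_kwargs": {"endpoint_url": "https://ncsa.osn.xsede.org"},
    },
    "remote_protocol": "s3",
    "remote_options": {  # options for reading the referenced data chunks
        "anon": True,
        "client_kwargs": {"endpoint_url": "https://mghp.osn.xsede.org"},
    },
}
```

The key point is just that the two option dicts need not be identical, which the generated catalog currently assumes.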
I am supposing that the target_options (to read the json file) should be the same as the remote_options (to read the data); but I still get no-bucket:
This slight modification of the pangeo-forge catalog works. Just needed to flesh out the
Right, different endpoints :)
To generate that automatically, I think we'll need a PR to pass |
@martindurant, I could try this, but I have a feeling it would be smoother if you did the PR!
@peterm790, want to take a stab at fixing this? |
@peterm790, just putting this back on your radar... |
Hi, yes, I have a branch set up with
@peterm790 go ahead and submit a PR and the team will tell you what's needed! |
🎉 New recipe runs created for the following recipes at sha
@rsignell-usgs thanks so much for re-opening this. IIUC, this depends on pangeo-forge/pangeo-forge-recipes#399, which has not been released yet. So a couple last blockers (all on my side) before we can run this here:
Getting close here!
Closes #152