This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Add instance ids via dict_object #6

Closed
wants to merge 2 commits into from


cisaacstern
Member

This PR supersedes #2. It achieves the same goal as that PR, but more concisely, by using the newly-added dict_object feature in Pangeo Forge Cloud.
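As a rough illustration of the idea (not the code in this PR), a recipe module can expose a single dictionary that maps each instance id to its own recipe object, so that adding a dataset means adding one string to a list. In the sketch below, `PlaceholderRecipe` and `make_recipe` are hypothetical stand-ins for whatever recipe class and constructor recipe.py actually uses; the two instance ids are copied from the test runs reported later in this thread.

```python
from dataclasses import dataclass

# Instance ids copied from the recipe names reported in this thread.
iids = [
    "CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.Omon.zos.gn.v20190429",
    "CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.Omon.so.gn.v20190429",
]


@dataclass(frozen=True)
class PlaceholderRecipe:
    """Hypothetical stand-in for a real recipe object."""

    instance_id: str
    target_name: str


def make_recipe(iid: str) -> PlaceholderRecipe:
    # Hypothetical constructor: a real one would resolve the input files for `iid`.
    return PlaceholderRecipe(instance_id=iid, target_name=f"{iid}.zarr")


# One dictionary of recipes, keyed by instance id.
recipes = {iid: make_recipe(iid) for iid in iids}
```

With this layout, each key doubles as the recipe name, which matches how the test runs in this thread are labelled by instance id.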

@cisaacstern cisaacstern mentioned this pull request May 6, 2022
@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha 6f0e4141c01d451659f5619a7a976dd5a416fb0a:

@cisaacstern
Member Author

The first two recipes listed in the previous comment are unchanged by this PR. Once the story tracked in pangeo-forge/user-stories#3 is complete, Pangeo Forge Cloud will recognize that they are unchanged and exclude them from the list. (If they're unchanged, we don't need to test them.)

@cisaacstern
Member Author

/run recipe-test recipe_run_id=157

@cisaacstern
Member Author

/run recipe-test recipe_run_id=158

@cisaacstern
Member Author

/run recipe-test recipe_run_id=159

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/157

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.sithick.gn.v20180701 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/158

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.siconc.gn.v20180701 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/159

@pangeo-forge-bot
Collaborator

Pangeo Forge Cloud told me that our test of your recipe CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.sithick.gn.v20180701 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/158

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

@pangeo-forge-bot
Collaborator

Pangeo Forge Cloud told me that our test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/157

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

@pangeo-forge-bot
Collaborator

Pangeo Forge Cloud told me that our test of your recipe CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.siconc.gn.v20180701 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/159

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

@jbusecke
Collaborator

jbusecke commented May 10, 2022

@cisaacstern I can probably work on this tomorrow but wanted to check my steps with you:

  1. Refactor the recipe to read inputs from a text file
  2. Create a script (which I can execute locally) to 'parse' instance ids via ESGF from queries with wildcards. I suppose we will end up with two text files (one input_urls.txt and another expanded_urls.txt or similar).
  3. Add the first request from @rebeccaherman1 (Requesting data to be added to the cloud #3) to input_urls.txt, parse it manually, and see whether the resulting recipes run on their own.

Most importantly, do you see any problem with this workflow in terms of security? Should we parse the instance_ids into a JSON file instead of a text file?
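To make the steps above concrete, here is a minimal sketch of the validation and wildcard-matching part only. It assumes the standard ten-facet CMIP6 instance id layout and matches wildcard patterns against a list of already-known ids; it does not query ESGF. The function names `parse_iid` and `expand_pattern`, and the character whitelist, are assumptions made for illustration. Treating each line as plain data and rejecting anything that does not look like an instance id is one way to approach the security question, whether the ids live in a text file or a JSON file.

```python
import re

# The ten dot-separated facets of a CMIP6 instance id, in order.
FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)

# Conservative whitelist (an assumption): letters, digits, '-' and '_' only.
_FACET_RE = re.compile(r"[A-Za-z0-9_-]+")


def parse_iid(iid: str, allow_wildcards: bool = False) -> dict:
    """Split an instance id into named facets, raising ValueError if malformed."""
    parts = iid.strip().split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} facets, got {len(parts)}: {iid!r}")
    for name, value in zip(FACETS, parts):
        if allow_wildcards and value == "*":
            continue  # a whole-facet wildcard is accepted in patterns only
        if not _FACET_RE.fullmatch(value):
            raise ValueError(f"invalid value for {name}: {value!r}")
    return dict(zip(FACETS, parts))


def expand_pattern(pattern: str, known_iids: list) -> list:
    """Return the known ids that match a pattern, comparing facet by facet."""
    wanted = parse_iid(pattern, allow_wildcards=True)
    matches = []
    for iid in known_iids:
        facets = parse_iid(iid)
        if all(wanted[f] in ("*", facets[f]) for f in FACETS):
            matches.append(iid)
    return sorted(matches)
```

For example, the pattern `CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.*.gn.v20180701` would select both GFDL-CM4 sea-ice ids tested in this PR from a list that contains them.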

@cisaacstern
Member Author

@jbusecke, all three of the recipes tested in this PR (including the 2 that previously worked, under different names) failed. Before we move forward with any approach to adding new datasets, I suggest we dig into the logs linked above, and get all three of these recipes to succeed.

In terms of how we'd move forward with adding new dataset ids after this, I think we can add at least a few dozen directly to recipe.py (just as we are doing in this PR). If we can get something on the order of a few dozen recipes to succeed in this manner, then we could consider some other way of feeding the inputs to the recipe (text file, etc.).

Feel free to ping me tomorrow when you're working on this if you'd like to review the logs together.

@jbusecke
Collaborator

Oh yes, sorry, I did not check properly. I was under the (wrong) assumption that the two that previously worked also worked here.

@jbusecke
Collaborator

Actually, where do you see that the two previously working ones failed? These were the CanESM ones (155, 156), which as far as I can see were not run and still show as queued on my end.

@cisaacstern
Member Author

Oh good catch! You are correct, the three that failed were the three added by this PR. I deliberately did not run the two existing recipes initially, but will do that now as a double-check.

@cisaacstern
Member Author

/run recipe-test recipe_run_id=155

@cisaacstern
Member Author

/run recipe-test recipe_run_id=156

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.Omon.zos.gn.v20190429 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/155

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.Omon.so.gn.v20190429 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/156

@pangeo-forge-bot
Collaborator

🥳 Hooray! The test execution of your recipe CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.Omon.zos.gn.v20190429 succeeded.

Here is a static representation of the dataset built by this recipe:

    <xarray.Dataset>
    Dimensions:             (i: 360, j: 291, time: 1980, bnds: 2, vertices: 4)
    Coordinates:
      * i                   (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359
      * j                   (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290
        latitude            (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
        longitude           (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
      * time                (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:0...
    Dimensions without coordinates: bnds, vertices
    Data variables:
        time_bnds           (time, bnds) object dask.array<chunksize=(360, 2), meta=np.ndarray>
        vertices_latitude   (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
        vertices_longitude  (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
        zos                 (time, j, i) float32 dask.array<chunksize=(360, 291, 360), meta=np.ndarray>
    Attributes: (12/53)
        CCCma_model_hash:            3dedf95315d603326fde4f5340dc0519d80d10c0
        CCCma_parent_runid:          rc3-pictrl
        CCCma_pycmor_hash:           33c30511acc319a98240633965a04ca99c26427e
        CCCma_runid:                 rc3.1-his01
        Conventions:                 CF-1.7 CMIP-6.2
        YMDH_branch_time_in_child:   1850:01:01:00
        ...                          ...
        table_info:                  Creation Date:(20 February 2019) MD5:374fbe5...
        title:                       CanESM5 output prepared for CMIP6
        tracking_id:                 hdl:21.14100/99be0cf0-54b1-405b-b46c-e69c274...
        variable_id:                 zos
        variant_label:               r1i1p1f1
        version:                     v20190429

You can also open this dataset by running the following Python code

    import fsspec
    import xarray as xr

    dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/prod/recipe-run-155/pangeo-forge/cmip6-feedstock/CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.Omon.zos.gn.v20190429.zarr'
    mapper = fsspec.get_mapper(dataset_public_url)
    ds = xr.open_zarr(mapper, consolidated=True)
    ds

in this badge (or your Python interpreter of choice).

Checklist

Please copy-and-paste the list below into a new comment on this thread, and check the boxes off as you've reviewed them.

Note: This test execution is limited to two increments in the concatenation dimension, so you should expect the length of that dimension (e.g., "time" or equivalent) to be 2.

- [ ] Are the dimension lengths correct?
- [ ] Are all of the expected variables present?
- [ ] Does plotting the data produce a plot that looks like your dataset?
- [ ] Can you run a simple computation/reduction on the data and produce a plausible result?
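The first two checklist items can be partly automated. The sketch below is one possible helper, not something the bot provides: `check_dataset` is a hypothetical name, and it only relies on the `sizes` and `data_vars` attributes that an `xarray.Dataset` exposes. A small stand-in object is used here so the example runs without network access; the dimension lengths and the variable name are taken from the static representation above.

```python
from types import SimpleNamespace


def check_dataset(ds, expected_vars, expected_sizes):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Checklist item 1: are the dimension lengths correct?
    for dim, expected in expected_sizes.items():
        actual = ds.sizes.get(dim)
        if actual != expected:
            problems.append(f"dimension {dim!r}: expected {expected}, found {actual}")
    # Checklist item 2: are all of the expected variables present?
    missing = [name for name in expected_vars if name not in ds.data_vars]
    if missing:
        problems.append(f"missing variables: {missing}")
    return problems


# Stand-in for the dataset shown above (a real run would pass the opened `ds`).
fake_ds = SimpleNamespace(
    sizes={"i": 360, "j": 291, "time": 1980, "bnds": 2, "vertices": 4},
    data_vars={"time_bnds": None, "zos": None},
)

report = check_dataset(fake_ds, expected_vars=["zos"], expected_sizes={"i": 360, "j": 291})
```

The plotting and reduction items still need a human look, since "plausible" depends on the variable.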

@pangeo-forge-bot
Collaborator

🥳 Hooray! The test execution of your recipe CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.Omon.so.gn.v20190429 succeeded.

Here is a static representation of the dataset built by this recipe:

    <xarray.Dataset>
    Dimensions:             (i: 360, j: 291, lev: 45, bnds: 2, time: 252,
                             vertices: 4)
    Coordinates:
      * i                   (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359
      * j                   (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290
        latitude            (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
      * lev                 (lev) float64 3.047 9.454 16.36 ... 5.375e+03 5.625e+03
        longitude           (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
      * time                (time) object 1850-01-16 12:00:00 ... 1870-12-16 12:0...
    Dimensions without coordinates: bnds, vertices
    Data variables:
        lev_bnds            (lev, bnds) float64 dask.array<chunksize=(45, 2), meta=np.ndarray>
        so                  (time, lev, j, i) float32 dask.array<chunksize=(6, 45, 291, 360), meta=np.ndarray>
        time_bnds           (time, bnds) object dask.array<chunksize=(6, 2), meta=np.ndarray>
        vertices_latitude   (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
        vertices_longitude  (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
    Attributes: (12/53)
        CCCma_model_hash:            3dedf95315d603326fde4f5340dc0519d80d10c0
        CCCma_parent_runid:          rc3-pictrl
        CCCma_pycmor_hash:           33c30511acc319a98240633965a04ca99c26427e
        CCCma_runid:                 rc3.1-his01
        Conventions:                 CF-1.7 CMIP-6.2
        YMDH_branch_time_in_child:   1850:01:01:00
        ...                          ...
        table_info:                  Creation Date:(20 February 2019) MD5:374fbe5...
        title:                       CanESM5 output prepared for CMIP6
        tracking_id:                 hdl:21.14100/afebd704-0f03-4b26-98c5-a62f564...
        variable_id:                 so
        variant_label:               r1i1p1f1
        version:                     v20190429

You can also open this dataset by running the following Python code

    import fsspec
    import xarray as xr

    dataset_public_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge-test/prod/recipe-run-156/pangeo-forge/cmip6-feedstock/CMIP6.CMIP.CCCma.CanESM5.historical.r1i1p1f1.Omon.so.gn.v20190429.zarr'
    mapper = fsspec.get_mapper(dataset_public_url)
    ds = xr.open_zarr(mapper, consolidated=True)
    ds

in this badge (or your Python interpreter of choice).

Checklist

Please copy-and-paste the list below into a new comment on this thread, and check the boxes off as you've reviewed them.

Note: This test execution is limited to two increments in the concatenation dimension, so you should expect the length of that dimension (e.g., "time" or equivalent) to be 2.

- [ ] Are the dimension lengths correct?
- [ ] Are all of the expected variables present?
- [ ] Does plotting the data produce a plot that looks like your dataset?
- [ ] Can you run a simple computation/reduction on the data and produce a plausible result?

@cisaacstern
Member Author

Ok great, so it looks like it's just the 3 new instance ids added by this PR which are failing.

feedstock/recipe.py (review thread: outdated, resolved)
Co-authored-by: Julius Busecke <[email protected]>
@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha 07d66727aa85646b5a0eb01fba5c64b3f499c741:

@cisaacstern
Member Author

/run recipe-test recipe_run_id=162

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/162

@pangeo-forge-bot
Collaborator

Pangeo Forge Cloud told me that our test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/162

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!
