add chelsa recipe #133
base: master
It looks like your
Please correct your |
🎉 New recipe runs created for the following recipes at sha
|
🎉 New recipe runs created for the following recipes at sha
|
I'll now deploy a test run of this recipe on Pangeo Forge Cloud 🚀 |
/run recipe-test recipe_run_id=111 |
✨ A test of your recipe I'll notify you with a comment on this thread when this test is complete. (This could be a little while...) In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/111 |
Pangeo Cloud told me that our test of your recipe To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/111 If you haven't yet tried pruning and running your recipe locally, I suggest trying that now. Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps! |
I tested the pruned recipe locally and got an error when running the recipe function.
recipe_pruned = recipe.copy_pruned()
<FilePattern {'month': 2, 'variable': 10}>
run_function = recipe_pruned.to_function()
run_function()
The error message is quite long, but I suspect it has something to do with converting tifs to zarr with Xarray within
|
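The `<FilePattern {'month': 2, 'variable': 10}>` above suggests that pruning shortened the month dimension to 2 entries and kept all 10 variables. Below is a minimal stdlib sketch of that idea. It assumes 12 monthly files per variable and that every source URL follows the layout of the tasmax URLs opened later in this thread. `make_url`, `prune_keys`, and the placeholder variable names are hypothetical illustrations, not pangeo-forge-recipes API.

```python
from itertools import product

# URL layout copied from the tasmax example URLs in this thread; assumed to
# hold for every variable and month.
BASE = ("https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/GLOBAL/"
        "climatologies/1981-2010")


def make_url(variable, month):
    """Hypothetical format function: one source URL per (variable, month)."""
    return f"{BASE}/{variable}/CHELSA_{variable}_{month:02d}_1981-2010_V.2.1.tif"


def prune_keys(months, variables, nkeep=2):
    """Keep the first `nkeep` months (concat dim) and every variable (merge dim)."""
    return list(product(months[:nkeep], variables))


MONTHS = list(range(1, 13))
# Placeholder names: the thread only confirms "tasmax" among the 10 variables.
VARIABLES = ["tasmax"] + [f"var{i}" for i in range(2, 11)]

full = list(product(MONTHS, VARIABLES))
pruned = prune_keys(MONTHS, VARIABLES)
print(len(full), len(pruned))
print(make_url("tasmax", pruned[0][0]))
```

Under these assumptions, a pruned test run opens 20 of the 120 source files, which keeps local debugging cheap.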
This error actually appears to be related to #133 (comment): xarray is trying to open the source file with the
recipe = XarrayZarrRecipe(
    # etc
    xarray_open_kwargs={"engine": "rasterio"},
)
should move us past this particular error. Ryan's suggestion in the linked comment to download one of the source files and open it with xarray is a good way to see which (if any) additional |
One problem I see is that when using
import xarray as xr
import rioxarray as rxr
url = 'https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/GLOBAL/climatologies/1981-2010/tasmax/CHELSA_tasmax_01_1981-2010_V.2.1.tif'
da1 = xr.open_dataset(url, engine="rasterio")
print(f'Opening with xarray... \n {da1} \n')
da2 = rxr.open_rasterio(url)
print(f'Opening with rioxarray... \n {da2}')
|
it appears that
In [35]: da1 = xr.open_dataset(url, engine="rasterio", backend_kwargs={'mask_and_scale': False})
In [36]: da1
Out[36]:
<xarray.Dataset>
Dimensions: (band: 1, x: 43200, y: 20880)
Coordinates:
* band (band) int64 1
* x (x) float64 -180.0 -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
* y (y) float64 84.0 83.99 83.98 83.97 ... -89.98 -89.99 -90.0
spatial_ref int64 ...
Data variables:
band_data (band, y, x) uint16 ...
In [37]: da1.band_data.attrs
Out[37]: {'scale_factor': 0.1, 'add_offset': -273.15} |
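With `mask_and_scale=False`, `band_data` holds raw `uint16` counts, and the attributes in Out[37] describe the CF-style packing rule `value = raw * scale_factor + add_offset`. The stdlib sketch below applies that rule using the scale_factor (0.1) and add_offset (-273.15) shown above. The raw counts are made-up inputs, and reading the result as °C assumes the offset converts tenths of a kelvin to Celsius.

```python
def decode_cf(raw, scale_factor, add_offset):
    """CF scale/offset decoding: value = raw * scale_factor + add_offset."""
    return [r * scale_factor + add_offset for r in raw]


SCALE_FACTOR = 0.1      # from da1.band_data.attrs in Out[37]
ADD_OFFSET = -273.15    # from da1.band_data.attrs in Out[37]

raw_counts = [2731, 2931, 3031]   # hypothetical uint16 values
celsius = decode_cf(raw_counts, SCALE_FACTOR, ADD_OFFSET)
print([round(c, 2) for c in celsius])
```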
Wow, thanks for this insight, @andersy005! Hillary, if this looks workable to you, you should be able to achieve what Anderson just demonstrated via
recipe = XarrayZarrRecipe(
    # etc
    xarray_open_kwargs={"engine": "rasterio", "backend_kwargs": {"mask_and_scale": False}},
)
The forthcoming pangeo-forge/pangeo-forge-recipes#245 would allow you to use |
I spoke too soon. Xarray keeps the
In [42]: da1 = xr.open_dataset(url, engine="rasterio")
In [43]: da1
Out[43]:
<xarray.Dataset>
Dimensions: (band: 1, x: 43200, y: 20880)
Coordinates:
* band (band) int64 1
* x (x) float64 -180.0 -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
* y (y) float64 84.0 83.99 83.98 83.97 ... -89.98 -89.99 -90.0
spatial_ref int64 ...
Data variables:
band_data (band, y, x) float32 ...
In [44]: da1.band_data.attrs
Out[44]: {}
In [45]: da1.band_data.encoding
Out[45]:
{'dtype': 'uint16',
'scale_factor': 0.1,
'add_offset': -273.15,
'grid_mapping': 'spatial_ref',
'source': 'https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/GLOBAL/climatologies/1981-2010/tasmax/CHELSA_tasmax_01_1981-2010_V.2.1.tif',
'rasterio_dtype': 'uint16'} |
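The `encoding` above shows that the on-disk dtype is `uint16`, while the decoded `band_data` is `float32`. Decoding therefore doubles the bytes per value. The back-of-the-envelope sketch below uses the grid size from Out[43] (43200 × 20880, one band). It assumes that all 120 source files (12 months × 10 variables) share this grid, and it ignores compression.

```python
NX, NY = 43200, 20880          # x and y sizes from Out[43]
N_FILES = 12 * 10              # assumed: 12 monthly climatologies x 10 variables
BYTES = {"uint16": 2, "float32": 4}

pixels = NX * NY
totals = {}
for dtype, nbytes in BYTES.items():
    per_file_gb = pixels * nbytes / 1e9
    totals[dtype] = pixels * nbytes * N_FILES
    print(f"{dtype}: {per_file_gb:.2f} GB per file, "
          f"{totals[dtype] / 1e9:.1f} GB for {N_FILES} files")
```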
For creating multi-file zarrs from many individual files, I think the right behavior is to decode all the variables. Imagine we had two datasets with different PFR automatically deletes all encoding on the inputs here: |
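The argument for decoding before combining can be illustrated without xarray. In the sketch below, two hypothetical inputs store the same temperatures as integers under different `scale_factor`/`add_offset` pairs; all numbers are invented for illustration. If we concatenate the raw integers and reuse one input's encoding, the second input is corrupted. If we decode each input with its own encoding first, the values agree.

```python
def decode(raw, enc):
    """CF-style decoding: value = raw * scale_factor + add_offset."""
    return [r * enc["scale_factor"] + enc["add_offset"] for r in raw]


# Two hypothetical sources holding the same physical values (19.95, 20.05 degC)
enc_a = {"scale_factor": 0.1, "add_offset": -273.15}
raw_a = [2931, 2932]
enc_b = {"scale_factor": 0.01, "add_offset": 0.0}
raw_b = [1995, 2005]

# Wrong: concatenate raw integers, then decode with the first input's encoding.
naive = decode(raw_a + raw_b, enc_a)

# Right: decode each input with its own encoding, then concatenate.
decoded = decode(raw_a, enc_a) + decode(raw_b, enc_b)

print([round(v, 2) for v in naive])
print([round(v, 2) for v in decoded])
```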
To summarize, starting from Hillary's comment #133 (comment): for the provided url, IIUC based on Ryan's comment, this I propose we try the inexpensive test of just running this recipe (in test form) with @hscannell, if that makes sense as a next step to you, could you push a commit adding this kwarg to the recipe? If the zarr store produced by the test is missing key attributes, we can address that before merging this PR. |
We want that to happen. It's a feature, not a bug. Mask and scale attributes are not proper "metadata" in the same way that, say, license or provenance are. They can be thought of more as compression codecs. The default behavior of xr.open_dataset is correct here IMO. |
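If we treat scale/offset as a codec, encoding is the inverse mapping plus rounding to the integer dtype. The minimal round-trip sketch below uses the same scale_factor and add_offset as this file and assumes round-to-nearest packing into `uint16`; actual data producers may pack differently. It shows that packing loses at most half of one `scale_factor` step.

```python
SCALE_FACTOR = 0.1
ADD_OFFSET = -273.15


def encode(values):
    """Pack floats: raw = round((value - add_offset) / scale_factor), as uint16."""
    raw = [round((v - ADD_OFFSET) / SCALE_FACTOR) for v in values]
    if any(not 0 <= r <= 0xFFFF for r in raw):
        raise ValueError("value outside uint16 range for this scale/offset")
    return raw


def decode(raw):
    """Unpack integers: value = raw * scale_factor + add_offset."""
    return [r * SCALE_FACTOR + ADD_OFFSET for r in raw]


temps_c = [-12.34, 0.02, 18.76, 31.02]   # arbitrary example temperatures
raw = encode(temps_c)
restored = decode(raw)
max_err = max(abs(a - b) for a, b in zip(temps_c, restored))
print(raw, max_err <= SCALE_FACTOR / 2)
```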
Thank you @andersy005 for helping us figure this out! I didn't know that the attributes could be called from
The desired result would be for the data to be scaled and offset using this encoding. @cisaacstern and @rabernat I assume Zarr and therefore PFR know how to decode this information. The desired result should ultimately do the following, with a sanity check plot for max June surface air temperature.
import xarray as xr
url = 'https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/GLOBAL/climatologies/1981-2010/tasmax/CHELSA_tasmax_06_1981-2010_V.2.1.tif'
ds = xr.open_dataset(url, engine="rasterio")
decoded_data = (ds.band_data[0,1000:5000,1000:5000]*ds.band_data.encoding['scale_factor'])+ds.band_data.encoding['add_offset']
decoded_data.plot(vmin=0, vmax=20) |
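One caution about the snippet above: CF scale/offset decoding should be applied to the raw integers exactly once. In Out[43] to Out[45], `band_data` is already `float32` and `scale_factor` has moved into `.encoding`, which suggests that xarray decoded the data on open. Applying the transformation again would push values to around -270. The stdlib sketch below shows the difference. The `decode_once` guard is hypothetical: it uses whether the values are still plain integers as a heuristic, and it is not an xarray API.

```python
SCALE_FACTOR = 0.1
ADD_OFFSET = -273.15


def decode(values):
    """CF scale/offset decoding: value = raw * scale_factor + add_offset."""
    return [v * SCALE_FACTOR + ADD_OFFSET for v in values]


def decode_once(values):
    """Hypothetical guard: decode only if the values are still raw integers.

    Checks plain Python ints; a real array would need a dtype check instead.
    """
    if all(isinstance(v, int) for v in values):
        return decode(values)
    return list(values)  # assume floats were already decoded


raw = [2931, 3031]               # hypothetical raw uint16 counts
once = decode_once(raw)          # decoded a single time
twice = decode(decode(raw))      # scale/offset mistakenly applied twice
print([round(v, 2) for v in once], [round(v, 2) for v in twice])
```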
It looks like there may be a problem with the structure of your PR. I encountered a |
Yikes, so pre-commit is wreaking havoc here. I think we need to uninstall it for |
It looks like there may be a problem with the structure of your PR. I encountered a |
I'm really confused why this PR is now changing other recipes 🤔 |
Yes, sorry about that! It's because the |
Ok! I believe we should be back on track here, and apologies again for the confusion. As soon as @pangeo-forge-bot creates a new recipe run for us, I'll trigger a new test of the recipe. |
🎉 New recipe runs created for the following recipes at sha
|
/run recipe-test recipe_run_id=144 |
✨ A test of your recipe I'll notify you with a comment on this thread when this test is complete. (This could be a little while...) In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/144 |
Pangeo Cloud told me that our test of your recipe To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/144 If you haven't yet tried pruning and running your recipe locally, I suggest trying that now. Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps! |
for more information, see https://pre-commit.ci
Co-authored-by: Ryan Abernathey
3653266 to c1d6b60
pre-commit.ci autofix |
/run chelsa-v2.1 |
The test failed, but I'm sure we can find out why! Pangeo Forge maintainers are working diligently to provide public logs for contributors. |
/run chelsa-v2.1 |
The test failed, but I'm sure we can find out why! Pangeo Forge maintainers are working diligently to provide public logs for contributors. |
The failure seems to be related to the same serialization issue reported in #145 (comment).
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
response = task()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
lambda: self.create_worker().do_instruction(request), request)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
return getattr(self, request_type)(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
bundle_processor.process_bundle(instruction_id))
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
input_op_by_transform_id[element.transform_id].process_encoded(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
self.output(decoded_value)
File "apache_beam/runners/worker/operations.py", line 526, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 528, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 237, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1491, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 623, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1581, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File "apache_beam/runners/common.py", line 1694, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File "apache_beam/runners/worker/operations.py", line 240, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 907, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 908, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1419, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 1507, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 837, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 983, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1877, in <lambda>
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/executors/beam.py", line 14, in _no_arg_stage
fun(config=config)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 587, in prepare_target
for k, v in config.get_execution_context().items():
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/base.py", line 59, in get_execution_context
recipe_hash=self.sha256().hex(),
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/base.py", line 53, in sha256
return dataclass_sha256(self, ignore_keys=self._hash_exclude_)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/serialization.py", line 73, in dataclass_sha256
return dict_to_sha256(d)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/serialization.py", line 34, in dict_to_sha256
b = dumps(
File "/srv/conda/envs/notebook/lib/python3.9/json/__init__.py", line 234, in dumps
return cls(
File "/srv/conda/envs/notebook/lib/python3.9/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/srv/conda/envs/notebook/lib/python3.9/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/serialization.py", line 22, in either_encode_or_hash
return inspect.getsource(obj)
File "/srv/conda/envs/notebook/lib/python3.9/inspect.py", line 1024, in getsource
lines, lnum = getsourcelines(object)
File "/srv/conda/envs/notebook/lib/python3.9/inspect.py", line 1006, in getsourcelines
lines, lnum = findsource(object)
File "/srv/conda/envs/notebook/lib/python3.9/inspect.py", line 827, in findsource
raise OSError('source code not available')
OSError: source code not available [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/prepare_target-ptransform-56'] |
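The bottom frames of the traceback appear to show pangeo-forge-recipes hashing the recipe by JSON-encoding its fields. For objects that JSON cannot handle, it falls back to `inspect.getsource`, and on the Beam worker that call raised `OSError: source code not available`. The stdlib sketch below reproduces this failure mode with a function compiled from a string, so that no source file exists. It also shows one hypothetical workaround: falling back to the function's qualified name. `encode_or_hash` and `dict_sha256` are illustrative stand-ins, not the library's code or its actual fix.

```python
import hashlib
import inspect
import json


def encode_or_hash(obj):
    """Hypothetical json.dumps `default` hook: serialize callables by source."""
    if callable(obj):
        try:
            return inspect.getsource(obj)
        except OSError:
            # No retrievable source (e.g. compiled from a string, or the .py
            # file is absent on the worker): fall back to a qualified name.
            return f"{getattr(obj, '__module__', None)}.{obj.__qualname__}"
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dict_sha256(d):
    """Hash a dict deterministically via sorted-key JSON."""
    b = json.dumps(d, default=encode_or_hash, sort_keys=True).encode()
    return hashlib.sha256(b).hexdigest()


# Simulate a recipe function whose source file does not exist.
namespace = {}
code = compile("def fmt(month):\n    return f'{month:02d}'\n", "<recipe>", "exec")
exec(code, namespace)
fmt = namespace["fmt"]

try:
    inspect.getsource(fmt)
    source_available = True
except OSError:
    source_available = False

print("source available:", source_available)
print(dict_sha256({"format_function": fmt, "nitems_per_file": 1}))
```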
/run chelsa-v2.1 |
🎉 The test run of
import xarray as xr
store = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1332/chelsa-v2.1.zarr"
ds = xr.open_dataset(store, engine='zarr', chunks={})
ds |
@hscannell, the latest run seems to have succeeded: https://pangeo-forge.org/dashboard/recipe-run/1332?feedstock_id=1. Let us know if this is ready, and we will merge it into main. |
CHELSA v2.1 is a globally downscaled climate dataset provided by the Swiss Federal Institute for Forest, Snow and Landscape Research WSL. This recipe extracts global monthly climatologies (1981-2010) for 10 variables. The source data are distributed as GeoTIFFs, which the recipe converts to Zarr.
The reference for this dataset is copied below.