Postprocessing of the prognostic run output has failed several times for me recently while consolidating the metadata of an appended zarr from a segmented run. This leaves a recoverable zarr with an incorrect consolidated metadata file, and it stops the job between segments with only some zarrs updated. Very roughly, the error seems to appear about 5-10% of the time a C384 segment's data are appended. (It doesn't seem to happen as often for C48 runs.)
fs.cat appears to be failing with an odd message ("User project specified in the request is invalid."). I wonder if the C384 runs are making too many gcsfs API calls.
e.g.:
INFO:/fv3net/workflows/post_process_run/fv3post/consolidate_metadata.py:Consolidating metadata of vcm-ml-experiments/c384-ml/2022-07-18/nn-seed-0-c384-run/fv3gfs_run/physics_tendencies.zarr
Traceback (most recent call last):
File "/usr/local/bin/runfv3", line 11, in <module>
load_entry_point('prognostic-run', 'console_scripts', 'runfv3')()
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/fv3net/workflows/prognostic_c48_run/runtime/segmented_run/cli.py", line 57, in append
sys.exit(api.append(url))
File "/fv3net/workflows/prognostic_c48_run/runtime/segmented_run/append.py", line 80, in append_segment_to_run_url
append_segment(
File "/fv3net/workflows/post_process_run/fv3post/append.py", line 282, in append_segment
append_zarr_along_time(tmp_rundir_file, destination_file, fs)
File "/fv3net/workflows/post_process_run/fv3post/append.py", line 249, in append_zarr_along_time
consolidate_metadata(fs, absolute_target_paths[0])
File "/fv3net/workflows/post_process_run/fv3post/consolidate_metadata.py", line 53, in consolidate_metadata
meta = _get_metadata_fs(fs, root)
File "/fv3net/workflows/post_process_run/fv3post/consolidate_metadata.py", line 45, in _get_metadata_fs
metadata_with_nan = dict(zip(keys_to_get, values))
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/fv3net/workflows/post_process_run/fv3post/consolidate_metadata.py", line 38, in maybe_get
return json_loads(fs.cat(url))
File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 91, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 71, in sync
raise return_result
File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 25, in _runner
result[0] = await coro
File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 402, in _cat
raise ex
File "/usr/lib/python3.8/asyncio/tasks.py", line 455, in wait_for
return await fut
File "/usr/local/lib/python3.8/dist-packages/gcsfs/core.py", line 735, in _cat_file
headers, out = await self._call("GET", u2, headers=head)
File "/usr/local/lib/python3.8/dist-packages/gcsfs/core.py", line 386, in _call
status, headers, info, contents = await self._request(
File "<decorator-gen-2>", line 2, in _request
File "/usr/local/lib/python3.8/dist-packages/gcsfs/retry.py", line 115, in retry_request
return await func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/gcsfs/core.py", line 378, in _request
validate_response(status, contents, path, args)
File "/usr/local/lib/python3.8/dist-packages/gcsfs/retry.py", line 100, in validate_response
raise ValueError("Bad Request: %s\n%s" % (path, msg))
ValueError: Bad Request: https://storage.googleapis.com/download/storage/v1/b/vcm-ml-experiments/o/c384-ml%2F2022-07-18%2Fnn-seed-0-c384-run%2Ffv3gfs_run%2Fphysics_tendencies.zarr%2Ftendency_of_eastward_wind_due_to_fv3_physics%2F.zattrs?alt=media
User project specified in the request is invalid.
Error: exit status 1
Run on f59deea
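If the failure is transient (an assumption; the traceback alone doesn't establish that), one possible workaround is to retry the individual metadata read with exponential backoff instead of failing the whole append. The sketch below is not the fv3post code: `cat_json_with_retry` and its parameters are hypothetical names, and it only relies on what the traceback shows, namely that the read goes through a `cat`-style callable and that gcsfs surfaces this error as a `ValueError`.

```python
import json
import time
from typing import Any, Callable, Tuple, Type


def cat_json_with_retry(
    cat: Callable[[str], bytes],
    url: str,
    attempts: int = 5,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[Exception], ...] = (ValueError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Read one JSON metadata file, retrying the read on selected errors.

    Hypothetical helper. `cat` is any callable that returns the raw bytes
    for `url`, e.g. the `fs.cat` method seen in the traceback.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            raw = cat(url)
        except retry_on:
            # Out of attempts: re-raise the original error unchanged.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
        else:
            # Decode outside the try block so that malformed JSON (also a
            # ValueError subclass) is reported immediately, not retried.
            return json.loads(raw)
```

In `maybe_get` this would be called roughly as `cat_json_with_retry(fs.cat, url)`. Retrying on every `ValueError` is deliberately broad and could delay the report of a genuinely bad request, so narrowing it to the "User project specified in the request is invalid." message may be preferable.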