Rechunk Zarr-encoded data and run benchmarks #114

lewfish · 2022-06-15T16:52:01Z

Time the query using the Zarr-encoded data that is on S3.
(Not sure if we should do this) Time the query using the naive approach, i.e., download the NetCDF files and do the calculation locally. This will be slow because it downloads huge amounts of irrelevant data. I think we want this as a baseline, but I'm not sure.
Rechunk the data and save it on S3. I suspect that rechunking the whole dataset would require a huge amount of resources, so we should do it for a large but manageable subset of the dataset. The subset needs to be large enough to make the benchmarks "realistic." This corresponds to Task 1-1: Encode NWM Gridded Output as Zarr noaa-hydro-data#26.
Time the query using the rechunked Zarr-encoded data. This corresponds to Task 1-2: Benchmark HUC Query of NWM Gridded Zarr Output noaa-hydro-data#27.
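
A rough sketch of how the timing steps above could be set up, and of why rechunking should matter for this kind of query. It uses only the Python standard library. The function names (`chunks_touched`, `time_query`), the chunk shapes, and the query window are all hypothetical and chosen for illustration; they are not the actual NWM chunking or the real HUC query.

```python
import statistics
import time
from typing import Callable, Dict, Sequence, Tuple


def chunks_touched(query: Sequence[Tuple[int, int]],
                   chunk_shape: Sequence[int]) -> int:
    """Count the chunks that overlap a query.

    `query` holds one half-open (start, stop) index range per dimension,
    and `chunk_shape` holds the chunk size along each dimension.
    """
    total = 1
    for (start, stop), size in zip(query, chunk_shape):
        first = start // size        # index of the first chunk hit
        last = (stop - 1) // size    # index of the last chunk hit
        total *= last - first + 1
    return total


def time_query(run_query: Callable[[], object],
               repeats: int = 3) -> Dict[str, float]:
    """Run `run_query` several times and report wall-clock timings in seconds."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - t0)
    return {"min_s": min(timings), "median_s": statistics.median(timings)}


# Hypothetical query: 720 time steps over a 100 x 100 spatial window,
# with dimensions ordered (time, y, x).
query = [(0, 720), (1000, 1100), (2000, 2100)]

# Assumed layouts: one chunk per time step vs. chunks that are long in time.
time_major = chunks_touched(query, (1, 1000, 1000))
rechunked = chunks_touched(query, (720, 100, 100))
print(time_major, rechunked)

# Placeholder workload; a real benchmark would pass the actual query here.
print(time_query(lambda: sum(range(1000)), repeats=3))
```

For the real benchmarks, `run_query` would wrap the actual query against each data source (Zarr on S3, downloaded NetCDF, rechunked Zarr), so the three timings come from the same harness.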

lewfish · 2022-06-15T16:52:34Z

This is a piece of #45

lewfish · 2022-06-15T16:55:02Z

Closing as this is in the wrong repo

lewfish self-assigned this Jun 15, 2022

lewfish closed this as completed Jun 15, 2022
