
Rechunk Zarr-encoded data and run benchmarks #66

Closed
lewfish opened this issue Jun 15, 2022 · 0 comments

lewfish commented Jun 15, 2022

  • Time the query using the Zarr-encoded data that is currently on S3 (see the first sketch after this list).
  • (Possibly) Time the query using the naive approach, i.e. downloading the NetCDF files and doing the calculation locally. This will be slow because it pulls down huge amounts of irrelevant data, but it may be worth doing as a baseline.
  • Rechunk the data and save it on S3 (see the second sketch after this list). Rechunking the whole dataset would likely require a huge amount of resources, so we should do it for a large but manageable subset, big enough to keep the benchmarks realistic. This corresponds to Task 1-1: Encode NWM Gridded Output as Zarr #26.
  • Time the query using the rechunked Zarr-encoded data. This corresponds to Task 1-2: Benchmark HUC Query of NWM Gridded Zarr Output #27.
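
A minimal timing sketch for the query benchmark, assuming the store is anonymously readable and lives at a hypothetical path like `s3://example-bucket/nwm/gridded.zarr`; the variable name, dimensions, and spatial slice are placeholders standing in for the real HUC query:

```python
import time

import s3fs
import xarray as xr

# Assumption: the bucket allows anonymous reads; the path is hypothetical.
fs = s3fs.S3FileSystem(anon=True)
store = s3fs.S3Map("example-bucket/nwm/gridded.zarr", s3=fs)

start = time.perf_counter()
ds = xr.open_zarr(store, consolidated=True)
# Placeholder query: a spatial subset plus a time mean, standing in for the HUC query.
subset = ds["RAINRATE"].sel(x=slice(0, 100), y=slice(0, 100)).mean(dim="time")
subset.compute()
print(f"Query took {time.perf_counter() - start:.1f} s")
```

The same script can be pointed at the rechunked store for the final benchmark, so the two timings differ only in chunking.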
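For the rechunking step, a sketch using the rechunker library; the paths, variable names, and chunk sizes below are illustrative assumptions, not values chosen for this task:

```python
import s3fs
import xarray as xr
import zarr
from rechunker import rechunk

fs = s3fs.S3FileSystem()
source = xr.open_zarr(s3fs.S3Map("example-bucket/nwm/gridded.zarr", s3=fs))

# Assumed target chunking: long in time, small in space, to favor time-series
# queries over a HUC-sized spatial footprint.
target_chunks = {
    "RAINRATE": {"time": 672, "y": 128, "x": 128},  # placeholder variable and sizes
    "time": None,  # leave coordinate variables unchanged
    "x": None,
    "y": None,
}

plan = rechunk(
    source,
    target_chunks=target_chunks,
    max_mem="2GB",
    target_store=s3fs.S3Map("example-bucket/nwm/rechunked.zarr", s3=fs),
    temp_store=s3fs.S3Map("example-bucket/nwm/rechunk-tmp.zarr", s3=fs),
)
plan.execute()

# Consolidate metadata so the benchmark can open the store with consolidated=True.
zarr.consolidate_metadata(s3fs.S3Map("example-bucket/nwm/rechunked.zarr", s3=fs))
```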

This is a piece of #45
