
Rechunk Zarr-encoded data and run benchmarks #66

Closed
lewfish opened this issue Jun 15, 2022 · 0 comments

lewfish commented Jun 15, 2022

  • Time the query using the Zarr-encoded data that is currently on S3 (see the first sketch after this list).
  • (Possibly) Time the query using the naive approach, i.e. downloading the NetCDF files and doing the calculation locally. This will be slow because it pulls down huge amounts of irrelevant data, but it may be worth doing as a baseline.
  • Rechunk the data and save it on S3 (see the second sketch after this list). Rechunking the whole dataset would likely require a huge amount of resources, so we should do it for a large but manageable subset, big enough to keep the benchmarks realistic. This corresponds to Task 1-1: Encode NWM Gridded Output as Zarr #26.
  • Time the query using the rechunked Zarr-encoded data. This corresponds to Task 1-2: Benchmark HUC Query of NWM Gridded Zarr Output #27.
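
A minimal timing sketch for the query benchmark, assuming the store is anonymously readable and lives at a hypothetical path like `s3://example-bucket/nwm/gridded.zarr`; the variable name, dimensions, and spatial slice are placeholders standing in for the real HUC query:

```python
import time

import s3fs
import xarray as xr

# Assumption: the bucket allows anonymous reads; the path is hypothetical.
fs = s3fs.S3FileSystem(anon=True)
store = s3fs.S3Map("example-bucket/nwm/gridded.zarr", s3=fs)

start = time.perf_counter()
ds = xr.open_zarr(store, consolidated=True)
# Placeholder query: a spatial subset plus a time mean, standing in for the HUC query.
subset = ds["RAINRATE"].sel(x=slice(0, 100), y=slice(0, 100)).mean(dim="time")
subset.compute()
print(f"Query took {time.perf_counter() - start:.1f} s")
```

The same script can be pointed at the rechunked store for the final benchmark, so the two timings differ only in chunking.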
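For the rechunking step, a sketch using the rechunker library; the paths, variable names, and chunk sizes below are illustrative assumptions, not values chosen for this task:

```python
import s3fs
import xarray as xr
import zarr
from rechunker import rechunk

fs = s3fs.S3FileSystem()
source = xr.open_zarr(s3fs.S3Map("example-bucket/nwm/gridded.zarr", s3=fs))

# Assumed target chunking: long in time, small in space, to favor time-series
# queries over a HUC-sized spatial footprint.
target_chunks = {
    "RAINRATE": {"time": 672, "y": 128, "x": 128},  # placeholder variable and sizes
    "time": None,  # leave coordinate variables unchanged
    "x": None,
    "y": None,
}

plan = rechunk(
    source,
    target_chunks=target_chunks,
    max_mem="2GB",
    target_store=s3fs.S3Map("example-bucket/nwm/rechunked.zarr", s3=fs),
    temp_store=s3fs.S3Map("example-bucket/nwm/rechunk-tmp.zarr", s3=fs),
)
plan.execute()

# Consolidate metadata so the benchmark can open the store with consolidated=True.
zarr.consolidate_metadata(s3fs.S3Map("example-bucket/nwm/rechunked.zarr", s3=fs))
```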

This is a piece of #45
