Rechunk Zarr-encoded data and run benchmarks #114

Closed
lewfish opened this issue Jun 15, 2022 · 2 comments

lewfish commented Jun 15, 2022

  • Time the query using the Zarr-encoded data that is on S3.
  • (Not sure if we should do this) Time the query using the naive approach, i.e., download the NetCDF files and do the calculation. This will be slow because it downloads huge amounts of irrelevant data. I think we want this as a baseline, but I'm not sure.
  • Rechunk the data and save it on S3. I think doing this for the whole dataset would require a huge amount of resources, so we should do it for a large but manageable subset of the dataset. The subset needs to be large enough to make the benchmarks "realistic." This corresponds to Task 1-1: Encode NWM Gridded Output as Zarr noaa-hydro-data#26.
  • Time the query using the rechunked Zarr-encoded data. This corresponds to Task 1-2: Benchmark HUC Query of NWM Gridded Zarr Output noaa-hydro-data#27.
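
As a rough illustration of why rechunking should matter for these timings, here is a minimal sketch (not the actual benchmark code) that counts how many chunks a query window touches under two chunk layouts, plus a small wall-clock timing helper. The grid size, chunk shapes, query window, and the function names `chunks_touched` and `best_time` are all hypothetical, chosen only for illustration.

```python
import time


def chunks_touched(query, chunk_shape):
    """Count the chunks intersected by a query window.

    query: list of (start, stop) half-open index ranges, one per dimension
    chunk_shape: chunk length along each dimension
    """
    count = 1
    for (start, stop), size in zip(query, chunk_shape):
        first = start // size        # index of the first chunk hit
        last = (stop - 1) // size    # index of the last chunk hit
        count *= last - first + 1
    return count


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time over several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return min(timings)


# Hypothetical (time, y, x) query: 720 time steps over a 100 x 100 cell window.
query = [(0, 720), (1000, 1100), (2000, 2100)]

# Assumed layouts, for illustration only.
original = (1, 4000, 5000)    # one full grid per time step
rechunked = (720, 100, 100)   # long runs in time, small spatial tiles

print(chunks_touched(query, original))
print(chunks_touched(query, rechunked))
```

With these assumed shapes, the original layout reads one chunk per time step while the rechunked layout reads a single chunk. In the real benchmark, `best_time` would wrap the actual query against each store rather than a chunk count.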

lewfish commented Jun 15, 2022

This is a piece of #45

lewfish self-assigned this Jun 15, 2022

lewfish commented Jun 15, 2022

Closing as this is in the wrong repo

lewfish closed this as completed Jun 15, 2022