Develop a cookiecutter template for virtualization #319

Open
4 tasks
maxrjones opened this issue Nov 25, 2024 · 0 comments
Labels
usage example Real world use case examples


@maxrjones
Member

Context

I think there would be value in building and sharing a cookiecutter template for virtualizing datasets, to incentivize open and accessible VirtualiZarr workflows. We could also use cruft so that workflows generated from the template can be updated as the template changes upstream.

There are shared steps between most virtualization workflows:

  • Generate a list of input files
  • Generate virtual datasets for each input file, with optional pre- or post-processing for this step
  • Concatenate virtual datasets into a single virtual dataset
  • Write the virtual dataset to a virtual Icechunk store or Kerchunk reference file
  • (Optional) apply the above workflow to multiple datasets
  • (Optional) generate a catalog (e.g., STAC) for multiple virtual datasets
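A minimal sketch of how a Python template might encode these shared steps, assuming each step is a pluggable callable that the generated project fills in. `VirtualizationWorkflow`, `run`, and `run_many` are hypothetical names for illustration, not part of VirtualiZarr, and the virtual dataset type is left generic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Optional, Sequence, TypeVar

# Stand-in for a virtual dataset (e.g. an xarray.Dataset backed by manifests).
VDS = TypeVar("VDS")


@dataclass
class VirtualizationWorkflow(Generic[VDS]):
    """Hypothetical skeleton: the template fixes the order of the steps,
    while the generated project supplies each step as a callable."""

    list_inputs: Callable[[], Sequence[str]]
    virtualize: Callable[[str], VDS]
    combine: Callable[[Sequence[VDS]], VDS]
    write: Callable[[VDS], None]
    preprocess: Optional[Callable[[str], str]] = None
    postprocess: Optional[Callable[[VDS], VDS]] = None

    def run(self) -> VDS:
        # Step 1: generate the list of input files.
        paths = list(self.list_inputs())
        if not paths:
            raise ValueError("no input files found")

        datasets: List[VDS] = []
        for path in paths:
            # Step 2: virtualize each file, with optional pre/post hooks.
            if self.preprocess is not None:
                path = self.preprocess(path)
            vds = self.virtualize(path)
            if self.postprocess is not None:
                vds = self.postprocess(vds)
            datasets.append(vds)

        # Step 3: concatenate into a single virtual dataset.
        combined = self.combine(datasets)
        # Step 4: write to an Icechunk store or Kerchunk file (injected writer).
        self.write(combined)
        return combined


def run_many(workflows: Dict[str, VirtualizationWorkflow]) -> Dict[str, object]:
    # Step 5 (optional): apply the same pattern to several datasets.
    return {name: wf.run() for name, wf in workflows.items()}
```

In a generated project, `virtualize` could wrap VirtualiZarr's `open_virtual_dataset`, `combine` could wrap `xarray.concat` (the VirtualiZarr docs pair it with `coords="minimal", compat="override"`), and `write` could call the `.virtualize.to_kerchunk(...)` or `.virtualize.to_icechunk(...)` accessor; the exact signatures should be checked against the installed VirtualiZarr version. Keeping the steps injectable means a catalog (e.g., STAC) step could be added later as one more hook.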

There are many other boilerplate components:

  • Typing
  • Documentation
  • Licensing
  • CI/CD
  • Environment management

Lastly, there are parallelization, orchestration, and execution tools that could enhance virtualization workflows, with options including:

  • Dask
  • Flyte
  • Lithops
  • Modal
  • Coiled
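Because virtualizing one file does not depend on any other, the per-file step is the natural place to plug in one of these tools. A minimal sketch, assuming the template codes against the standard library's `concurrent.futures.Executor` interface; `virtualize_all` is a hypothetical helper, and whether each listed tool can be passed in directly is an assumption to check per backend (Dask's distributed `Client.get_executor()` is one executor that follows this interface).

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def virtualize_all(
    paths: Sequence[str],
    virtualize: Callable[[str], T],
    executor: Executor,
) -> List[T]:
    """Hypothetical helper: run the per-file virtualization step in parallel.

    Executor.map returns results in input order, so the output lines up
    with `paths` and can be concatenated deterministically afterwards.
    """
    return list(executor.map(virtualize, paths))


if __name__ == "__main__":
    # Local threads as the default backend; another Executor could be swapped in.
    with ThreadPoolExecutor(max_workers=4) as ex:
        print(virtualize_all(["a.nc", "b.nc"], lambda p: {"source": p}, ex))
```

Coding against the `Executor` abstraction keeps the template backend-agnostic: the generated project chooses local threads or processes by default, and an orchestration tool only needs to supply a compatible executor.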

This template would enable people to follow best practices and avoid spending time on boilerplate components.

Suggested task components
