Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: store learns to delete prefixes when overwriting/creating hierarchy nodes #2430

Merged
merged 19 commits into from
Nov 5, 2024

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Oct 22, 2024

This PR implements delete_dir teaches the Group and Array classes to use it when overwriting nodes in a hierarchy.

Also included here:
- delete_prefix - not used but perhaps we should just use it instead of delete_dir

  • changes list_prefix to return absolute paths instead of stripping prefix. Given that list_prefix was unused before, this wasn't very disruptive but calling it out so others can help decide if this is what we want.

Still needs:

  • new store tests for delete_dir
  • new group tests for edge cases around delete_dir

closes #2191
first steps toward #2108 and #2359

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@jhamman jhamman marked this pull request as ready for review October 22, 2024 13:44
src/zarr/abc/store.py Outdated Show resolved Hide resolved
src/zarr/api/asynchronous.py Outdated Show resolved Hide resolved
src/zarr/core/array.py Outdated Show resolved Hide resolved
src/zarr/storage/local.py Outdated Show resolved Hide resolved
src/zarr/abc/store.py Outdated Show resolved Hide resolved
@@ -371,6 +371,36 @@ def list_dir(self, prefix: str) -> AsyncGenerator[str, None]:
"""
...

async def delete_dir(self, prefix: str, recursive: bool = True) -> None:
"""
Remove all keys and prefixes in the store that begin with a given prefix.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make life much easier for Store implementers if we declare that prefix will always be the path of a group or a array. Otherwise, all implementations need to deal with unnecessary edge cases like delete_dir("path/to/array/c/0") or worse.

Icechunk is an example of a store that could very cleanly and efficiently implement the case for a group/array prefix, but it would struggle to handle other prefixes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for recursive = False ? If zarr itself doesn't have one, I'd remove the argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see a use case for delete_dir("path/to/array/c/") (remove all chunks from an array) so I think we should allow that -- even if it makes life a little harder in icechunk.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's totally fine. Let's document the three possible prefix cases, and say that stores are allowed to do undefined behavior for other prefixes

src/zarr/abc/store.py Outdated Show resolved Hide resolved
if not self.supports_listing:
raise NotImplementedError
self._check_writable()
async for key in self.list_prefix(prefix):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document what's the expected return value of list_prefix? Absolute vs. relative, final "/" or not, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done in the list_prefix docstring

@TomAugspurger
Copy link
Contributor

Looks good overall. I've been struggling a bit with whether Stores should be a bit more aware about the context the operation is occurring in (e.g. deleting a group vs a specific chunk of an array; I think this relates to the conversation at #2430 (comment)). But I think this is a good improvement on its own.

@jhamman
Copy link
Member Author

jhamman commented Oct 28, 2024

@TomAugspurger / @d-v-b / @paraseba - any final comments here. I'd like to get this in ahead of the larger store mode refactor in #2442

Copy link
Contributor

@d-v-b d-v-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. regarding the change to list_prefix, I will note that the definition of "absolute paths" is a bit shaky across different stores, since RemoteStore will return keys relative to its path attribute, and similarly LocalStore with its root attribute.

@jhamman
Copy link
Member Author

jhamman commented Oct 28, 2024

regarding the change to list_prefix, I will note that the definition of "absolute paths" is a bit shaky across different stores

Would defining these as "relative to the root of the store" be clearer?

src/zarr/abc/store.py Outdated Show resolved Hide resolved
@jhamman
Copy link
Member Author

jhamman commented Nov 5, 2024

pre-commit.ci autofix

@jhamman jhamman closed this Nov 5, 2024
@jhamman jhamman reopened this Nov 5, 2024
@jhamman jhamman merged commit a31046c into zarr-developers:main Nov 5, 2024
24 checks passed
@jhamman jhamman deleted the feature/store_erase_prefix branch November 5, 2024 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v3] deleting a group does not delete subgroups
4 participants