Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make AsyncArray.nchunks_initialized async #2449

Merged

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Oct 30, 2024

This changes the API of AysncArray.nchunks_initialized to change it from a property to an async function. The motivation here comes from

  1. general cleanliness (a property access calling async functions doing I/O feels a bit wrong)
  2. Work on Array.info (Added info for Group and Array #2400 ), where I hit a strange error:
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 3011, in info_complete
    return sync(self._async_array.info_complete())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/sync.py", line 141, in sync
    raise return_result
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/sync.py", line 100, in _runner
    return await coro
           ^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 1223, in info_complete
    "count_chunks_initialized": self.nchunks_initialized,  # this should be async?
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 844, in nchunks_initialized
    return nchunks_initialized(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 3035, in nchunks_initialized
    return len(chunks_initialized(array))
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 3061, in chunks_initialized
    collect_aiterator(array.store_path.store.list_prefix(prefix=array.store_path.path))
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/sync.py", line 178, in collect_aiterator
    return sync(_collect_aiterator(data))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/sync.py", line 128, in sync
    raise SyncError("Calling sync() from within a running loop")
zarr.core.sync.SyncError: Calling sync() from within a running loop

I think the error came from jumping between sync and async multiple times:

  • sync Array.info_complete ->
  • async AsyncArray.info_complete ->
  • sync AsyncArray.nchunks_initialzed ->
  • sync collect_aiterator (async list_prefix)

With this change, we'll be able to jump from sync to async just once
at the boundary. Maybe sync should be able to do that if needed, but in this case we can avoid it.

The big downside: this makes Array.nchunks_initialized different from AsyncArray.nchunks_initialized. IMO that's OK. Array.nchunks_initialized is bound by backwards compatibility, but that doesn't have to lock AsyncArray into the same API. Even stuff like resize are different since the async version needs to be awaited. This is a bit more different since you need to call it and await it, but maybe that's OK for the clarity?

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

This changes the API of AysncArray.nchunks_initialized to
change it from a property to an async function. The motivation
here comes from

1. general cleanliness (a property access calling async functions doing
   I/O feels a bit wrong)
2. Work on Array.info, where I hit a strange error, I think from jumping
   from a

   - sync Array.info_complete ->
   - async AsyncArray.info_complete ->
   - sync AsyncArray.nchunks_initialzed ->
   - sync collect_aiterator (async list_prefix)

   With this change, we'll be able to jump from sync to async just once
   at the boundary.

```
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 3011, in info_complete
    return sync(self._async_array.info_complete())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/sync.py", line 141, in sync
    raise return_result
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/sync.py", line 100, in _runner
    return await coro
           ^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 1223, in info_complete
    "count_chunks_initialized": self.nchunks_initialized,  # this should be async?
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 844, in nchunks_initialized
    return nchunks_initialized(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 3035, in nchunks_initialized
    return len(chunks_initialized(array))
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/array.py", line 3061, in chunks_initialized
    collect_aiterator(array.store_path.store.list_prefix(prefix=array.store_path.path))
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/sync.py", line 178, in collect_aiterator
    return sync(_collect_aiterator(data))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tom/gh/zarr-developers/zarr-python/src/zarr/core/sync.py", line 128, in sync
    raise SyncError("Calling sync() from within a running loop")
zarr.core.sync.SyncError: Calling sync() from within a running loop
```
@d-v-b
Copy link
Contributor

d-v-b commented Oct 30, 2024

The big downside: this makes Array.nchunks_initialized different from AsyncArray.nchunks_initialized. IMO that's OK.

It's also OK from my POV. This looks like a good change to me!

@@ -366,9 +366,9 @@ def test_chunks_initialized(test_cls: type[Array] | type[AsyncArray[Any]]) -> No
arr[region] = 1

if test_cls == Array:
observed = sorted(chunks_initialized(arr))
observed = sorted(await chunks_initialized(arr)) # Why doesn't mypy error here?
Copy link
Contributor Author

@TomAugspurger TomAugspurger Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good, this failed in CI: https://results.pre-commit.ci/run/github/48049137/1730299519.ojFfXsB_QvipnKxxQtQKgA

And it fails for me locally if I run it on --all-files (but it didn't fail when checking the diff, part of why I don't like putting mypy in pre-commit).

I'll get this fixed up. AFAICT, chunks_initialized isn't part of the public API so it's safe to make it async only. If it's useful, maybe it could be added as a method on Array / AsyncArray.

@TomAugspurger TomAugspurger marked this pull request as ready for review October 30, 2024 15:18
@d-v-b d-v-b merged commit a82d047 into zarr-developers:main Nov 4, 2024
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants