-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ImageArray Initialization #261
Conversation
) | ||
def __init__(self, zarray: zarr.Array = None, **kwargs): | ||
if zarray is not None: | ||
kwargs.update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kwargs.update( | |
# Dynamically prepare kwargs for the base class to support Dask | |
kwargs.update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I would only add a sentence in the dosctring or comment why we added this (i.e supporting/compatibility with dask).
meta_array=zarray._meta_array, | ||
) | ||
def __init__(self, zarray: zarr.Array = None, **kwargs): | ||
if zarray is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain when this method should be called without a argument? It is an internal wrapper, and if you don't need the wrapper, just doing Position.zgroup["0"]
would give you the plain zarr.Array
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dask may expect exactly zarr.Array
and not a wrapped subclass for some of their internal logic. I think it's safer to match their advertised API than hacking our own wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an example use case at the top and a short docstring to the __init__
function:
def __init__(self, zarray: zarr.Array = None, **kwargs):
"""Keyword arguments are passed to the zarr.Array constructor.
If a zarr.Array is provided, the constructor will use its attributes
to initialize the ImageArray.
"""
I think this is a common enough use case, such that we should make iohub arrays work with the dask arrays in multiprocessing applications
Another thought on API design: if the goal is to get a convenient handle of a dask array out of an OME-Zarr array, would it be better to instead add an explicit constructor like we did for |
ImageArray
is designed to be initialized with a zarr array. This causes problems when using ImageArrays in parallel compute using dask, where arrays are initialized using key-value arguments, as in https://zarr.readthedocs.io/en/stable/_modules/zarr/core.html#Array. Here is a minimal exampleThis example will fail, likely because of the way dask initializes the zarr arrays in the workers. In this PR, I make the zarray argument optional and allow for additional keyword arguments.
Fixed #226