Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix iterating over sharding index #2392

Merged

Conversation

brokkoli71
Copy link
Member

For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of morton code will produce indices that are not within the shape. Therefore reading the sharding index will drop some chunks.
I was wondering if using morton code is a significant efficiency improvement over normal indexing like np.unravel_index

@d-v-b
Copy link
Contributor

d-v-b commented Oct 16, 2024

For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of morton code will produce indices that are not within the shape.

Is this a bug, or something fundamental to morton coding?

@brokkoli71
Copy link
Member Author

brokkoli71 commented Oct 16, 2024

@d-v-b
As far as I understand, morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.

I wonder if it's worth investigating further how to fix it, or if we just use a simple iteration algorithm like np.unravel_index.

@d-v-b
Copy link
Contributor

d-v-b commented Oct 16, 2024

@d-v-b As far as I understand, morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.

I wonder if it's worth investigating further how to fix it, or if we just use a simple iteration algorithm like np.unravel_index.

I thought the use of the morton ordering was motivated by the desire to place spatially contiguous chunks close together in a shard. The simpler linearization algorithms will not allow this. Maybe it's better to fix the bug in the morton encoding?

@brokkoli71 brokkoli71 marked this pull request as ready for review October 22, 2024 09:57
@brokkoli71
Copy link
Member Author

@d-v-b
The morton encoding was actually right.
The problem was that morton ordering by is designed to fill arrays with power-of-two shapes. Therefore, it would sometimes return a coordinate outside the array. I implemented that these coordinates are skipped so that most of the time the morton ordering is maintained and only at the edge a few jumps can occur.

@normanrz normanrz self-requested a review November 29, 2024 15:38
@normanrz normanrz merged commit cdd6a74 into zarr-developers:main Nov 29, 2024
26 checks passed
@normanrz normanrz deleted the fix-iterating-over-sharding-index branch November 29, 2024 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants