Should memory ordering be specified #31

Open
aschampion opened this issue Mar 10, 2018 · 5 comments

Comments

@aschampion
Contributor

I didn't notice any place in the docs that specifies memory order of stored blocks, but all of the N5 datasets I've encountered in the wild (other than ones I initially naively generated) are f-order/column-major. This can be a bit surprising since the directory hierarchy is, in a sense, c-order/row-major.

It also differs from HDF5, which IIRC is row-major.

Is the intent to leave it unspecified? If so, it would at least make sense to make it a standard attribute so libs that want to unpack the serialized data to an n-dim array can do so appropriately.
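As a minimal sketch of why the ordering matters to a library unpacking a serialized block, the helper below computes where an element lands in a flat buffer under either convention. The name `flat_offset` and the example shape are illustrative assumptions, not part of N5 or any library's API.

```python
def flat_offset(coords, shape, order):
    """Offset of an element in a flat buffer.

    order="C": row-major, the last axis varies fastest.
    order="F": column-major, the first axis varies fastest.
    """
    if len(coords) != len(shape):
        raise ValueError("coords and shape must have the same length")
    if order == "C":
        axes = reversed(range(len(shape)))
    elif order == "F":
        axes = range(len(shape))
    else:
        raise ValueError("order must be 'C' or 'F'")

    offset, stride = 0, 1
    for axis in axes:
        # Each axis is weighted by the product of the extents
        # of all axes that vary faster than it.
        offset += coords[axis] * stride
        stride *= shape[axis]
    return offset


shape = (4, 3, 2)
print(flat_offset((1, 0, 0), shape, "C"))  # 6: a step along axis 0 skips 3*2 elements
print(flat_offset((1, 0, 0), shape, "F"))  # 1: a step along axis 0 is the next element
```

The same coordinate maps to different offsets, so a reader that guesses the wrong order pulls the wrong elements out of the buffer.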

@axtimwalde
Collaborator

axtimwalde commented Mar 10, 2018 via email

@aschampion
Contributor Author

Confused on both these points. Please bear with my post-running denseness:

> directories and block API are column major

Coords [a, b, c, d] are stored a/b/c/d; d, last dimension, is changing "fastest" (granted a directory tree is not the same thing as contiguous memory). That's row-major.
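To make the traversal argument concrete, here is a small sketch that lists block paths for a hypothetical grid in sorted order. `block_paths` is an illustrative helper, not an N5 function, and the example assumes single-digit indices so that lexicographic and numeric order coincide.

```python
from itertools import product


def block_paths(grid_shape):
    """Relative paths "a/b/c/..." for every block position in a grid.

    itertools.product advances its last argument fastest, which matches
    the order of an in-order walk over the directory tree.
    """
    ranges = (range(n) for n in grid_shape)
    return ["/".join(str(i) for i in idx) for idx in product(*ranges)]


paths = block_paths((2, 2, 3))
print(paths[:4])               # ['0/0/0', '0/0/1', '0/0/2', '0/1/0']
print(paths == sorted(paths))  # True
```

In this walk the last path component changes on every step, while the first component changes only once for the whole grid.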

> Block data is stored row major

Will double check tomorrow, but when rendering out of FAFB N5 yesterday it's listed [x, y, z] and x is the most rapidly changing dim, i.e., column-major.

Also from z5 docs:

> Internally, n5 uses column-major (i.e. x, y, z) axis ordering

(I want to refrain from referring to these as xyz vs zyx, since that's orthogonal and muddies the water.)

@axtimwalde
Collaborator

axtimwalde commented Mar 11, 2018

If you consistently exchange row and column major, everything works, but your output will be transposed, which is perfectly fine if you stick to it.

> Coords [a, b, c, d] are stored a/b/c/d; d, last dimension, is changing "fastest" (granted a directory tree is not the same thing as contiguous memory). That's row-major.

Depends on what you mean by "fastest changing", which is often confused. In row-major notation, a runs fastest, i.e. in a contiguous memory layout, pointer distances between adjacent elements along a are largest; d would be the slowest running dimension, i.e. elements adjacent in d are adjacent in memory.
My APIs address vectors in column-major order, i.e. elements adjacent in the first dimension are adjacent in memory. Since this does not have an obvious meaning in a random access filesystem, I stuck with the same convention, which means that neighboring elements that are directly adjacent in a data block have their adjacent elements in another data block NOT in the same directory: 0/b/c/d vs. 1/b/c/d.
Back to confusing but sometimes helpful mental pictures: take an ssTEM stack with planar resolution of 4nm/px and axial resolution of 40nm/px. In my API, the planar dimensions are first, like in math textbooks and unlike in numpy or hdf5, where they are last.

@aschampion
Contributor Author

aschampion commented Mar 11, 2018

> If you consistently exchange row and column major, everything works, but your output will be transposed, which is perfectly fine if you stick to it.

Right, but one doesn't want to accidentally stitch transposed blocks next to one another, which is how I discovered the block data wasn't row-major like I was expecting given the relation to HDF5.
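A toy 2-D sketch of that failure mode, assuming a block whose first axis was written fastest. `unflatten_2d` is a hypothetical helper used only for illustration.

```python
def unflatten_2d(flat, shape, order):
    """Rebuild nested lists arr[i][j] from a flat buffer."""
    rows, cols = shape
    if len(flat) != rows * cols:
        raise ValueError("buffer length does not match shape")
    if order == "C":  # j (last axis) varies fastest in the buffer
        return [[flat[i * cols + j] for j in range(cols)] for i in range(rows)]
    if order == "F":  # i (first axis) varies fastest in the buffer
        return [[flat[j * rows + i] for j in range(cols)] for i in range(rows)]
    raise ValueError("order must be 'C' or 'F'")


# The 2x3 block [[0, 1, 2], [3, 4, 5]] serialized with the first axis fastest.
flat = [0, 3, 1, 4, 2, 5]

print(unflatten_2d(flat, (2, 3), "F"))  # [[0, 1, 2], [3, 4, 5]]  intended block
print(unflatten_2d(flat, (2, 3), "C"))  # [[0, 3, 1], [4, 2, 5]]  scrambled
print(unflatten_2d(flat, (3, 2), "C"))  # [[0, 3], [1, 4], [2, 5]]  transposed
```

Reading with the other order and the same shape scrambles the block. Reading with the other order and a reversed shape yields a clean transpose, which is the "consistently exchanged" case and is harmless only if every block and the block grid are treated the same way.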

> Depends on what you mean by "fastest changing", which is often confused.

Standard thing: in any contiguous memory region of > 3 elements, it is the dimension for which the elements' coordinates assume the most distinct values. It's just common to say "last/first axis fastest changing" rather than "first/last axis contiguous" for generalizing orderings to strided cases, but one could just as well say "first/last axis most contiguous". We don't seem to disagree about this or your definition of row-major here.

> Back to confusing but sometimes helpful mental pictures: take an ssTEM stack with planar resolution of 4nm/px and axial resolution of 40nm/px. In my API, the planar dimensions are first, like in math textbooks and unlike in numpy or hdf5, where they are last.

Right, when storing anisotropic data in column-major structures we put the planar dims in the first/lower index axes to have parity between least physical distance and memory contiguity for the sake of locality/performance. Again, no disagreement or confusion.

> Since this does not have an obvious meaning in a random access filesystem, I stuck with the same convention, which means that neighboring elements that are directly adjacent in a data block have their adjacent elements in another data block NOT in the same directory: 0/b/c/d vs. 1/b/c/d.

I would still call this row-major, because 0/0/0/0<->0/0/0/1 has a smaller tree/inode path distance, i.e., analogously "least contiguous" (or "faster changing" in an inorder traversal), than 0/0/0/0<->1/0/0/0. Last dimension index is fastest changing == row-major. However, the directory ordering wasn't my main concern in creating this issue, because it's in the spec.

My concern was the serialization ordering within each block, which in the existing N5 data I looked at has axis 0 fastest changing/axis 3 most contiguous, meaning column-major. That's fine, it was just surprising since it's less common and I thought it should be documented.
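For contrast with the directory walk, a short sketch of the element order inside a block whose axis 0 changes fastest, as observed above. `coords_in_storage_order` is an illustrative name, not something defined by N5.

```python
from itertools import product


def coords_in_storage_order(shape):
    """Yield element coordinates in the order they appear in a flat
    buffer where axis 0 varies fastest (column-major)."""
    # product advances its last argument fastest, so iterate over the
    # reversed shape and flip each tuple back to (axis0, axis1, ...).
    for rev in product(*(range(n) for n in reversed(shape))):
        yield tuple(reversed(rev))


order = list(coords_in_storage_order((2, 2, 2)))
print(order[:4])  # [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
```

Here the first coordinate changes on every step, the opposite of the path order in the directory tree, where the last component changes on every step.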

My confusion was your statement:

> It's the opposite. Block data is stored row major

But it's the weekend! Sorry for the noise, will catch you on Monday.

e: remove equivocal use of contiguous/changing

@axtimwalde
Collaborator

The n5-zarr backend implements C-order and F-order layout of vectors passed into the API, analogous to how it is used in numpy. According to this logic, N5 is F-order, and I am not sure if I want to add this.
