Should memory ordering be specified #31

aschampion · 2018-03-10T16:26:54Z

I didn't notice any place in the docs that specifies memory order of stored blocks, but all of the N5 datasets I've encountered in the wild (other than ones I initially naively generated) are f-order/column-major. This can be a bit surprising since the directory hierarchy is, in a sense, c-order/row-major.

It also differs from HDF5, which IIRC is row-major.

Is the intent to leave it unspecified? If so, it would at least make sense to make it a standard attribute so libs that want to unpack the serialized data to an n-dim array can do so appropriately.
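To illustrate why a reader library would need to know the order, here is a minimal Python sketch. The `order` argument stands in for a hypothetical dataset attribute (it is not part of the N5 spec), and `unpack` is an illustrative helper rather than any library's API.

```python
def unpack(flat, shape, order):
    """Map a flat 2-D block payload to a {(i, j): value} dict.

    `order` is a hypothetical attribute value: "C" (row-major, last
    axis contiguous) or "F" (column-major, first axis contiguous).
    """
    rows, cols = shape
    out = {}
    for i in range(rows):
        for j in range(cols):
            offset = i * cols + j if order == "C" else i + j * rows
            out[(i, j)] = flat[offset]
    return out


flat = list(range(6))  # stand-in for the payload of one 2x3 block
as_c = unpack(flat, (2, 3), "C")
as_f = unpack(flat, (2, 3), "F")
print(as_c[(0, 1)], as_f[(0, 1)])  # same coordinate, different values
```

Without knowing which branch applies, a reader cannot tell which of the two arrays the writer meant.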

axtimwalde · 2018-03-10T22:05:50Z

It's the opposite. Block data is stored row major, i.e. C-order; directories and the block API are column major. The reason: that's how we access n-d data in ImgLib2. There is no strict relation between n-d vector order and how it is translated into a contiguous index. This is our convention. I will mention this in the spec.

aschampion · 2018-03-10T22:26:09Z

Confused on both these points. Please bear with my post-running denseness:

> directories and block API are column major

Coords [a, b, c, d] are stored a/b/c/d; d, last dimension, is changing "fastest" (granted a directory tree is not the same thing as contiguous memory). That's row-major.

> Block data is stored row major

Will double check tomorrow, but when rendering out of FAFB N5 yesterday it's listed [x, y, z] and x is the most rapidly changing dim, i.e., column-major.

Also from z5 docs:

> Internally, n5 uses column-major (i.e. x, y, z) axis ordering

(I want to refrain from referring to these as xyz vs zyx, since that's orthogonal and muddies the water.)

axtimwalde · 2018-03-11T00:09:01Z

If you consistently exchange row and column major, everything works, but your output will be transposed, which is perfectly fine if you stick to it.

> Coords [a, b, c, d] are stored a/b/c/d; d, last dimension, is changing "fastest" (granted a directory tree is not the same thing as contiguous memory). That's row-major.

That depends on what you mean by "fastest changing", which is often confused. In row-major notation, a runs fastest, i.e. in a contiguous memory layout, pointer distances between adjacent elements along a are largest; d would be the slowest running dimension, i.e. elements adjacent in d are adjacent in memory.
My APIs address vectors in column major order, i.e. elements adjacent in the first dimension are adjacent in memory. Since this does not have an obvious meaning in a random access filesystem, I stuck with the same convention, which means that neighboring elements that are directly adjacent in a data block have their adjacent elements in another data block NOT in the same directory: 0/b/c/d, 1/b/c/d.
Back to confusing but sometimes helpful mental pictures: take an ssTEM stack with a planar resolution of 4nm/px and an axial resolution of 40nm/px. In my API, the planar dimensions are first, like in math textbooks and unlike in numpy or hdf5, where they are last.
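A small sketch of the stride arithmetic behind the two layouts, in Python. The function names and the example shape are illustrative assumptions, not taken from ImgLib2 or N5; the functions are named by which axis is adjacent in memory to sidestep the row/column terminology.

```python
def strides_first_axis_contiguous(shape):
    """Element strides when the first axis is adjacent in memory."""
    strides, step = [], 1
    for extent in shape:
        strides.append(step)
        step *= extent
    return strides


def strides_last_axis_contiguous(shape):
    """Element strides when the last axis is adjacent in memory."""
    return strides_first_axis_contiguous(shape[::-1])[::-1]


shape = (4, 3, 2)  # hypothetical (x, y, z) block, planar dimensions first
print(strides_first_axis_contiguous(shape))
print(strides_last_axis_contiguous(shape))
```

A stride of 1 marks the axis whose neighbours sit next to each other in the flat buffer; for the anisotropic stack above, putting the planar dimensions first gives them the smallest strides in the first layout.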

aschampion · 2018-03-11T01:31:05Z

> If you consistently exchange row and column major, everything works, but your output will be transposed, which is perfectly fine if you stick to it.

Right, but one doesn't want to accidentally stitch transposed blocks next to one another, which is how I discovered the block data wasn't row-major like I was expecting given the relation to HDF5.

> That depends on what you mean by "fastest changing", which is often confused.

The standard meaning: in any contiguous memory region of more than 3 elements, it is the dimension along which the elements' coordinates assume the most distinct values. It's just common to say "last/first axis fastest changing" rather than "first/last axis contiguous" for generalizing orderings to strided cases, but one could just as well say "first/last axis most contiguous". We don't seem to disagree about this or your definition of row-major here.

> Back to confusing but sometimes helpful mental pictures: take an ssTEM stack with a planar resolution of 4nm/px and an axial resolution of 40nm/px. In my API, the planar dimensions are first, like in math textbooks and unlike in numpy or hdf5, where they are last.

Right, when storing anisotropic data in column-major structures we put the planar dims in the first/lower index axes to have parity between least physical distance and memory contiguity for the sake of locality/performance. Again, no disagreement or confusion.

> Since this does not have an obvious meaning in a random access filesystem, I stuck with the same convention, which means that neighboring elements that are directly adjacent in a data block have their adjacent elements in another data block NOT in the same directory: 0/b/c/d, 1/b/c/d.

I would still call this row-major, because 0/0/0/0<->0/0/0/1 has a smaller tree/inode path distance, i.e., analogously "least contiguous" (or "faster changing" in an inorder traversal), than 0/0/0/0<->1/0/0/0. Last dimension index is fastest changing == row-major. However, the directory ordering wasn't my main concern in creating this issue, because it's in the spec.
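The in-order traversal argument can be sketched in a few lines of Python. The 2x2x2x2 grid of block coordinates is a made-up example, and sorting path strings only stands in for walking a directory tree.

```python
from itertools import product

# Hypothetical 2x2x2x2 grid of block coordinates [a, b, c, d].
coords = list(product(range(2), repeat=4))

# Blocks are stored at a/b/c/d, so an in-order walk of the tree visits
# paths in lexicographic order of (a, b, c, d).
paths = sorted("/".join(map(str, c)) for c in coords)

print(paths[:3])  # consecutive paths differ in the trailing components
print(paths[8])   # the first component changes only after 8 blocks
```

In this walk the last path component changes between consecutive blocks, while the first component changes only once the whole subtree under `0/` has been visited.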

My concern was the serialization ordering within each block, which in the existing N5 data I looked at has axis 0 fastest changing/axis 3 most contiguous, meaning column-major. That's fine, it was just surprising since it's less common and I thought it should be documented.
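For reference, a minimal Python sketch of what "axis 0 is fastest changing" means when walking a block's flat payload. The helper name and the block shape are hypothetical, and this is generic index arithmetic, not code from any N5 implementation.

```python
def unravel_first_axis_fastest(offset, shape):
    """Convert a flat offset to coordinates, axis 0 varying fastest."""
    coords = []
    for extent in shape:
        coords.append(offset % extent)
        offset //= extent
    return tuple(coords)


shape = (2, 2, 2)  # hypothetical block shape
walk = [unravel_first_axis_fastest(i, shape) for i in range(4)]
print(walk)
```

Stepping through consecutive offsets increments the axis 0 coordinate first and only then carries into axis 1.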

My confusion was your statement:

> It's the opposite. Block data is stored row major

But it's the weekend! Sorry for the noise, will catch you on Monday.

e: remove equivocal use of contiguous/changing

axtimwalde · 2020-01-02T19:10:58Z

The n5-zarr backend implements C-order and F-order layouts of vectors passed into the API, analogous to how they are used in numpy. According to this logic, N5 is F-order, and I am not sure if I want to add this.
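One way to see how the two conventions relate: an F-order buffer indexed as (x, y, z) has the same flat layout as a C-order buffer indexed as (z, y, x). The Python sketch below checks this with plain index arithmetic; the function names and shape are assumptions for illustration, and none of this is the n5-zarr implementation.

```python
from itertools import product


def offset_f(coords, shape):
    """Flat offset with the first axis contiguous (F-order)."""
    offset, step = 0, 1
    for c, extent in zip(coords, shape):
        offset += c * step
        step *= extent
    return offset


def offset_c(coords, shape):
    """Flat offset with the last axis contiguous (C-order)."""
    offset = 0
    for c, extent in zip(coords, shape):
        offset = offset * extent + c
    return offset


shape = (4, 3, 2)  # hypothetical (x, y, z) extents
same = all(
    offset_f(xyz, shape) == offset_c(xyz[::-1], shape[::-1])
    for xyz in product(*(range(n) for n in shape))
)
print(same)
```

So a C-order reader can consume F-order data unchanged as long as it reverses the axis order of both the shape and the coordinates.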

aschampion mentioned this issue on Jan 29, 2019, in catmaid/CATMAID#1827 (Image block layer, merged)

axtimwalde added the enhancement label Jan 2, 2020
