Implement sub-frame read/chunking #85

tlambert03 · 2022-08-15T11:28:53Z

in the reader, at around

Line 280 in 56cde51

cpdef np.ndarray _read_image_from_memmap(self, index: int):

we should implement subframe cropping.

gatoniel · 2023-11-21T13:43:27Z

Hi, I think I really would like this enhancement to be available. I have several huge nd2 files (> 100 GB) consisting of large overview stitched images (7117 x 21044 pixels). Often I only need to extract small subvolumes of them. Looking at the files with NIS Elements is fairly fast for the size. But when I use xarray with

crop = xa_nd2[:, slicex, slicey].compute()

it definitely takes longer to load the data into RAM than NIS requires. The chunk size of the xarray is (1, 21044, 7117). Do you think implementing the sub-frame read/chunking would help with that, eg, by reducing the chunk size?

What skills are required to help on this issue? Unfortunately, I have zero experience with C/C++ ... But I could try my best with specific info on the task...

tlambert03 · 2023-11-24T17:03:41Z

Hey @gatoniel,

Actually, it looks like this is already possible in the current nd2 library, but you need to do it in a specific way:

There is a method in ND2File called .read_frame(). You call it with a single integer representing the frame number in the file. Here, frame number means the index of the 2D plane (possibly multichannel) acquired over the course of the experiment (let me know if you need help determining the frame numbers in your files). But lets take the simple case of a single 2d tiled image:

import nd2

f = nd2.NDFile(...)
array = f.read_frame(0)

that returned np.ndarray object is just a view onto a memmap object that knows the location on disk where the data resides, but nothing has been read from disk and loaded into memory yet). So, you can slice that object before reading which should help you save memory.

crop = array[:, slicex, slicey]
# do what you want with the crop.

f.close()  # reminder

Currently, that won't work in the multi-dimensional xarray and dask wrappers (since the data underlying those is non-contiguous on disk, so it's harder to use a simple numpy view like this) ... so I guess that is the current "state" of this feature request. It can be done already on a single-plane level, using read_frame, but if you want to do more complex slicing that includes X/Y and other dimensions, more work is needed. Let me know if that's your situation or if the read_frame approach would work for you for now

Unfortunately, I have zero experience with C/C++

btw, there's no more C code in this library! (since #135 🎉 )

gatoniel · 2023-11-27T16:35:09Z

I would like to try out the .read_frame(). Could you help me in determining the correct frame numbers? nd2_file.sizes gives {'T': 30, 'P': 5, 'C': 2, 'Y': 21044, 'X': 7117}.

tlambert03 · 2023-11-27T16:45:02Z

sure, you can use something like np.ravel_multi_index. There is a private method that does this here:

nd2/src/nd2/nd2file.py

Lines 935 to 938 in eb4bf8f

    
           def _seq_index_from_coords(self, coords: Sequence) -> Sequence[int] | SupportsInt: 
        
               if not self._coord_shape: 
        
                   return self._NO_IDX 
        
               return np.ravel_multi_index(coords, self._coord_shape)  # type: ignore

so, for your example size {'T': 30, 'P': 5, 'C': 2, 'Y': 21044, 'X': 7117}, your coordinate shape would be (30, 5) (i.e. all coordinates that are not C Y or X). So, if you wanted to get the sequence index for timepoint=5 position=3 (both zero indexed), then you would use:

coord_shape = (30, 5)
frame_coordinate = (5, 3)  # t=5, p=3 
sequence_index = np.ravel_multi_index(frame_coordinate,  coord_shape)  # would be 28 in this case
frame_array = f.read_frame(sequence_index)

you can also use the private f._seq_index_from_coords if you like, but it may break in the future.

gatoniel · 2023-11-27T18:12:05Z

@tlambert03 thanks a lot. A preliminary test in my code shows that it is way faster that way!!!

tlambert03 · 2023-11-27T18:13:06Z

great!

tlambert03 · 2023-11-27T18:15:20Z

i'll update this issue when the dask interface also has chunking

gatoniel · 2023-11-27T18:34:12Z

What exactly would be needed to make the dask interface able to use the chunking? Wouldn't it just need an intermediate function that relays to the .read_frame() function?

tlambert03 · 2023-11-27T18:52:26Z

that intermediate function already exists, it's called _dask_block:

nd2/src/nd2/nd2file.py

Lines 940 to 962 in eb4bf8f

    
           def _dask_block(self, copy: bool, block_id: tuple[int]) -> np.ndarray: 
        
               if isinstance(block_id, np.ndarray): 
        
                   return None 
        
               with self._lock: 
        
                   was_closed = self.closed 
        
                   if self.closed: 
        
                       self.open() 
        
                   try: 
        
                       ncoords = len(self._coord_shape) 
        
                       idx = self._seq_index_from_coords(block_id[:ncoords]) 
        
                       if idx == self._NO_IDX: 
        
                           if any(block_id):  # pragma: no cover 
        
                               raise ValueError( 
        
                                   f"Cannot get chunk {block_id} for single frame image." 
        
                               ) 
        
                           idx = 0 
        
                       data = self.read_frame(int(idx))  # type: ignore 
        
                       data = data.copy() if copy else data 
        
                       return data[(np.newaxis,) * ncoords] 
        
                   finally: 
        
                       if was_closed: 
        
                           self.close()

so, what additionally needs to happen is:

the to_dask method needs to accept a frame_chunks parameter that lets the user define how they would like each CYX plane to be chunked. It would modify this line
the _dask_block function needs to add a block_info keyword argument as described in the dask map_blocks docs.
the array-location key in the block_info dict would then need to be used to slice the buffer returned from read_frame here, before the copy operation:

nd2/src/nd2/nd2file.py

Lines 957 to 959 in eb4bf8f

    
           data = self.read_frame(int(idx))  # type: ignore 
        
           data = data.copy() if copy else data 
        
           return data[(np.newaxis,) * ncoords]

gatoniel · 2023-12-01T13:46:35Z

Hey @tlambert03 I have one more question when using .read_frame() for timeseries. I want to extract the same xy crop for all timepoints, so I use the following:

timestack_ = []
for i in range(t_len):
    frame_coordinate = (i, pos)
    sequence_index = np.ravel_multi_index(frame_coordinate, coord_shape)
    buffer = nd2_file.read_frame(sequence_index)
    timestack_.append(buffer[1, yslice, xslice])
timestack = np.stack(timestack_, axis=0)

Do you have ideas on how to further speed that up? The first thing would be to allocate an empty array, instead of stacking everything afterwards. But is it better to directly read from the buffer with np.copy or just assign the view to the empty array like so

timestack = np.empty((t_len, ylen, xlen))
for i in range(t_len):
    frame_coordinate = (i, pos)
    sequence_index = np.ravel_multi_index(frame_coordinate, coord_shape)
    buffer = nd2_file.read_frame(sequence_index)
    timestack[i, ...] = buffer[1, yslice, xslice]

?
Or is there another trick that could speed this up?
best,
Niklas

tlambert03 · 2023-12-01T14:48:09Z

i do think using np.empty will be a slight improvement… but beyond that, I’m not really sure much can be done. Does it feel excessively slow? Or just curious? You could play around with a multithreaded read? (but to be honest, I’m not sure how the single file handle to the nd2 file will work there). If you hit on anything, do let me know!

…

On Dec 1, 2023, at 7:46 AM, niklas netter ***@***.***> wrote: Hey @tlambert03 <https://github.com/tlambert03> I have one more question when using .read_frame() for timeseries. I want to extract the same xy crop for all timepoints, so I use the following: timestack_ = [] for i in range(t_len): frame_coordinate = (i, pos) sequence_index = np.ravel_multi_index(frame_coordinate, coord_shape) buffer = nd2_file.read_frame(sequence_index) timestack_.append(buffer[1, yslice, xslice]) timestack = np.stack(timestack_, axis=0) Do you have ideas on how to further speed that up? The first thing would be to allocate an empty array, instead of stacking everything afterwards. But is it better to directly read from the buffer with np.copy or just assign the view to the empty array like so timestack = np.empty((t_len, ylen, xlen)) for i in range(t_len): frame_coordinate = (i, pos) sequence_index = np.ravel_multi_index(frame_coordinate, coord_shape) buffer = nd2_file.read_frame(sequence_index) timestack[i, ...] = buffer[1, yslice, xslice] ```? Or is there another trick that could speed this up? best, Niklas — Reply to this email directly, view it on GitHub <#85 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAMI52LKTO6GP6RGJTBXQYLYHHNULAVCNFSM56R576U2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOBTGYYTINRRGQ2Q>. You are receiving this because you were mentioned.

gatoniel · 2023-12-01T15:00:34Z

Not excessively slow. I was curious, mainly.

tlambert03 added enhancement New feature or request help wanted Extra attention is needed labels Aug 15, 2022

tlambert03 mentioned this issue Sep 13, 2022

ValueError for the big 2d image in the get_frame method #83

Closed

tlambert03 mentioned this issue Nov 28, 2022

Memory usage #17

Open

gatoniel linked a pull request Sep 27, 2024 that will close this issue

feat: dask chunking on frame level implemented #242

Open

tlambert03 linked a pull request Sep 27, 2024 that will close this issue

feat: dask chunking on frame level implemented #242

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement sub-frame read/chunking #85

Implement sub-frame read/chunking #85

tlambert03 commented Aug 15, 2022

gatoniel commented Nov 21, 2023

tlambert03 commented Nov 24, 2023 •

edited

Loading

gatoniel commented Nov 27, 2023

tlambert03 commented Nov 27, 2023 •

edited

Loading

gatoniel commented Nov 27, 2023

tlambert03 commented Nov 27, 2023

tlambert03 commented Nov 27, 2023

gatoniel commented Nov 27, 2023

tlambert03 commented Nov 27, 2023

gatoniel commented Dec 1, 2023 •

edited

Loading

tlambert03 commented Dec 1, 2023 via email

gatoniel commented Dec 1, 2023

Implement sub-frame read/chunking #85

Implement sub-frame read/chunking #85

Comments

tlambert03 commented Aug 15, 2022

gatoniel commented Nov 21, 2023

tlambert03 commented Nov 24, 2023 • edited Loading

gatoniel commented Nov 27, 2023

tlambert03 commented Nov 27, 2023 • edited Loading

gatoniel commented Nov 27, 2023

tlambert03 commented Nov 27, 2023

tlambert03 commented Nov 27, 2023

gatoniel commented Nov 27, 2023

tlambert03 commented Nov 27, 2023

gatoniel commented Dec 1, 2023 • edited Loading

tlambert03 commented Dec 1, 2023 via email

gatoniel commented Dec 1, 2023

tlambert03 commented Nov 24, 2023 •

edited

Loading

tlambert03 commented Nov 27, 2023 •

edited

Loading

gatoniel commented Dec 1, 2023 •

edited

Loading