Implement sub-frame read/chunking #85
Hi, I think I really would like this enhancement to be available. I have several huge nd2 files (> 100 GB) consisting of large stitched overview images (7117 x 21044 pixels). Often I only need to extract small subvolumes from them. Viewing the files with NIS Elements is fairly fast for their size, but when I open them with this library, it takes considerably longer to load the data into RAM than NIS requires. What skills are required to help on this issue? Unfortunately, I have zero experience with C/C++, but I could try my best with specific info on the task.
Hey @gatoniel, actually, it looks like this is already possible in the current nd2 library, but you need to do it in a specific way. There is a method on `ND2File` called `read_frame`:

```python
import nd2

f = nd2.ND2File(...)
array = f.read_frame(0)
```

The returned `np.ndarray` is just a view onto a memmap object: it knows the location on disk where the data resides, but nothing has been read from disk and loaded into memory yet. So you can slice that object before reading, which should help you save memory:

```python
crop = array[:, slicex, slicey]
# do what you want with the crop.
f.close()  # remember to close the file
```

Currently, that won't work in the multi-dimensional xarray and dask wrappers (since the data underlying those is non-contiguous on disk, so it's harder to use a simple numpy view like this)... so I guess that is the current "state" of this feature request. It can be done already on a single-plane level, using `read_frame`.
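To see why slicing the view before copying saves memory, here is a minimal self-contained sketch using a plain `np.memmap` (this stands in for the nd2 internals; it is not the library's actual code):

```python
import os
import tempfile

import numpy as np

# Write a fake 2-channel "frame" to disk so we can memory-map it.
path = os.path.join(tempfile.mkdtemp(), "frame.dat")
frame = np.arange(2 * 100 * 100, dtype=np.uint16).reshape(2, 100, 100)
frame.tofile(path)

# Creating the memmap reads no pixel data; it is a lazy view onto the file.
mm = np.memmap(path, dtype=np.uint16, mode="r", shape=(2, 100, 100))

# Slicing stays lazy; only the copy (np.asarray) pulls the selected bytes into RAM.
crop = np.asarray(mm[:, 10:20, 30:40])
assert crop.shape == (2, 10, 10)
assert np.array_equal(crop, frame[:, 10:20, 30:40])
```

The crop of the 20 MB frame materializes only a few hundred bytes, which is the effect described above.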
btw, there's no more C code in this library! (since #135 🎉)
I would like to try out the `read_frame` approach.
Sure, you can use something like this (see lines 935 to 938 in eb4bf8f). So, for your example size:

```python
coord_shape = (30, 5)
frame_coordinate = (5, 3)  # t=5, p=3
sequence_index = np.ravel_multi_index(frame_coordinate, coord_shape)  # would be 28 in this case
frame_array = f.read_frame(sequence_index)
```

(You can also use the private helper method for this.)
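If you do this for many coordinates, a tiny wrapper keeps the index math in one place. `frame_index` below is a hypothetical helper name (not part of the nd2 API), just a thin wrapper around `np.ravel_multi_index`:

```python
import numpy as np

def frame_index(frame_coordinate, coord_shape):
    """Map a multi-dimensional frame coordinate (e.g. (t, p)) to the flat
    sequence index that read_frame expects. Hypothetical convenience helper."""
    return int(np.ravel_multi_index(frame_coordinate, coord_shape))

# 30 timepoints x 5 positions, as in the example above:
assert frame_index((5, 3), (30, 5)) == 28
assert frame_index((0, 0), (30, 5)) == 0
assert frame_index((29, 4), (30, 5)) == 149
```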
@tlambert03 thanks a lot. A preliminary test in my code shows that it is way faster that way!!!

great!
i'll update this issue when the dask interface also has chunking |
What exactly would be needed to make the dask interface able to use the chunking? Wouldn't it just need an intermediate function that relays to `read_frame`?
That intermediate function already exists; see lines 940 to 962 in eb4bf8f.
So, what additionally needs to happen is the change around lines 957 to 959 in eb4bf8f.
Hey @tlambert03, I have one more question when using `.read_frame()` for time series. I want to extract the same xy crop for all timepoints, so I use the following:

```python
timestack_ = []
for i in range(t_len):
    frame_coordinate = (i, pos)
    sequence_index = np.ravel_multi_index(frame_coordinate, coord_shape)
    buffer = nd2_file.read_frame(sequence_index)
    timestack_.append(buffer[1, yslice, xslice])
timestack = np.stack(timestack_, axis=0)
```

Do you have ideas on how to further speed that up? The first thing would be to allocate an empty array instead of stacking everything afterwards. But is it better to directly read from the buffer with `np.copy`, or just to assign the view to the empty array, like so:

```python
timestack = np.empty((t_len, ylen, xlen))
for i in range(t_len):
    frame_coordinate = (i, pos)
    sequence_index = np.ravel_multi_index(frame_coordinate, coord_shape)
    buffer = nd2_file.read_frame(sequence_index)
    timestack[i, ...] = buffer[1, yslice, xslice]
```

Or is there another trick that could speed this up?

best,
Niklas

I do think using `np.empty` will be a slight improvement, but beyond that I'm not really sure much can be done. Does it feel excessively slow? Or are you just curious?

You could play around with a multithreaded read (though to be honest, I'm not sure how the single file handle to the nd2 file will behave there). If you hit on anything, do let me know!
Not excessively slow. I was curious, mainly.
In the reader, at around line 280 of `src/nd2/_sdk/latest.pyx` (commit 56cde51), we should implement subframe cropping.
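Conceptually, sub-frame cropping in the reader means computing per-row byte offsets and reading only those bytes instead of the whole frame. A schematic sketch for a contiguous, uncompressed 2D frame (`read_subframe` is a hypothetical function, not the Cython code referenced above; real ND2 frames may be chunked or compressed):

```python
import io

import numpy as np

def read_subframe(fh, frame_offset, frame_shape, dtype, yslice, xslice):
    """Read only the requested rows of a contiguous uncompressed frame
    stored at frame_offset, then slice columns in memory. Sketch only."""
    height, width = frame_shape
    itemsize = np.dtype(dtype).itemsize
    y0, y1, _ = yslice.indices(height)
    # Seek to the first requested row and read the contiguous row block.
    fh.seek(frame_offset + y0 * width * itemsize)
    rows = np.frombuffer(fh.read((y1 - y0) * width * itemsize), dtype=dtype)
    return rows.reshape(y1 - y0, width)[:, xslice]

# Demo against an in-memory "file":
frame = np.arange(100, dtype=np.uint16).reshape(10, 10)
fh = io.BytesIO(frame.tobytes())
sub = read_subframe(fh, 0, (10, 10), np.uint16, slice(2, 5), slice(3, 7))
assert np.array_equal(sub, frame[2:5, 3:7])
```

Only the requested rows cross the I/O boundary; the column crop still happens in memory, since rows are the unit of contiguity on disk.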