improve data loader performance #565
Conversation
Codecov Report

Attention: Patch coverage is […]

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #565      +/-   ##
==========================================
- Coverage   92.53%   92.52%   -0.02%
==========================================
  Files          43       42       -1
  Lines        6003     6008       +5
==========================================
+ Hits         5555     5559       +4
- Misses        448      449       +1
Super cool analysis! I'll also try it out (which commands did you use to open the profiles?). If most of the time is spent outside […], then […]. But my bet (I need to check by running the profiler) is that the problem is that we load the same chunks multiple times. I think that maybe using […] would help. This second approach has the advantage that it involves only the dataloader class and does not require changes in the transformation code.
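Purely as an illustration of the "load each chunk only once" idea (the concrete suggestion above is elided in this transcript, so this is not necessarily what was proposed here), a dataloader-side cache keyed by dask block index could look like this; all names are hypothetical:

```python
import dask.array as da
import numpy as np

class ChunkCache:
    """Hypothetical helper: materialize each dask chunk at most once and reuse
    it for every tile that falls inside that chunk."""

    def __init__(self, image: da.Array):
        self.image = image
        self._chunks: dict[tuple[int, ...], np.ndarray] = {}

    def get_block(self, block_index: tuple[int, ...]) -> np.ndarray:
        if block_index not in self._chunks:
            # .blocks[...] selects a single chunk; compute() reads it from storage once
            self._chunks[block_index] = self.image.blocks[block_index].compute()
        return self._chunks[block_index]
```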
I reviewed the code, looks good to me. We could merge this already, or first explore the […].
@@ -81,6 +81,10 @@ class ImageTilesDataset(Dataset):
        system; this back-transforms the target tile into the pixel coordinates. If the back-transformed tile is not
        aligned with the pixel grid, the returned tile will correspond to the bounding box of the back-transformed tile
        (so that the returned tile is axis-aligned to the pixel grid).
    return_genes:
Nice! Two comments:
- I would specify that the layers are `AnnData` layers and that the default layer is `X`.
- I would also allow passing just a list instead of a dict; that would be interpreted as `{'X': genes_list}` (see the sketch below).
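A minimal sketch of that list-to-dict normalization, assuming a hypothetical `return_genes` argument (names are illustrative, not the PR's actual implementation):

```python
from typing import Union

GenesSpec = Union[list[str], dict[str, list[str]]]

def _normalize_return_genes(return_genes: GenesSpec) -> dict[str, list[str]]:
    # A bare list is shorthand for "take these genes from the default AnnData
    # layer X"; a dict maps layer names to gene lists explicitly.
    if isinstance(return_genes, list):
        return {"X": return_genes}
    return dict(return_genes)
```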
I've just changed the format in py-spy; this was just a push to get the code onto another machine. But let me explain what's next. I've realized that the calculation of the transformed bounding box in the implicit coordinate system takes a fair amount of time, and it could in fact be done only once, in the same way the tile coords dataset is built. I will therefore: […]
The dataset will have only one type of output, which will be a dictionary of the following form (sketched more concretely below):

    {
        "tile": tile,
        "annotations": list of annotations,
        "gexp": list of gexp,
    }
wdyt?

What I won't do here but would be useful to work on next is: […]
Thanks for the explanation. Yes, I think that operating on the transformation at the preprocessing stage is a good approach to improve performance. Also, the option to specify the layer will be useful. Regarding the return type, would you remove the […]?
That's a good question. I would potentially leave it, but then technically the dataloader would fail, as the default `collate_fn` only accepts array / mapping[str, array] / list[array]. wdyt?
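For context, a sketch of a custom `collate_fn` (hypothetical, not part of this PR) that would let the `DataLoader` handle batch items containing non-array objects, by stacking only the `"tile"` entries and collecting everything else into plain per-sample lists:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

def collate_tiles(batch: list[dict]) -> dict:
    # Stack the array-like tiles into a single tensor; keep every other field
    # (annotations, gene expression, or even SpatialData objects) as a plain
    # per-sample list, bypassing the default collate restrictions.
    tiles = torch.stack([torch.as_tensor(np.asarray(sample["tile"])) for sample in batch])
    rest = {key: [sample[key] for sample in batch] for key in batch[0] if key != "tile"}
    return {"tile": tiles, **rest}

# usage (dataset name is hypothetical):
# loader = DataLoader(dataset, batch_size=4, collate_fn=collate_tiles)
```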
Ok, then I would probably move the default away from returning […].
close in favour of #687
so I've been wanting to take another look at this for a long time. I used https://github.com/benfred/py-spy with the `speedscope` format, you can see a screenshot below. I've been doing this on the `xenium_rep_1` dataset from the paper, and been using the following code (adapting from @LucaMarconato's code): […]

this made me realize that, if we want to return the array, then there is an unnecessary step of instantiating the `SpatialImage|MultiscaleSpatialImage`, and the dask array could simply be returned instead. This halved the `fetch` step (across 6 iterations) from ~43s to ~23s total, see below.

I think the `fetch` step is what we ultimately want to improve, as it's the one that streams the tiles from the zarr array to the GPU. Now the two main blocks are the `transform` call and the `compute` call. The `transform` call is visualized under `compute`, but it's effectively the `wrapper` call, where all the `DataArray.isel` calls happen; this is where the crops are defined, transformed and set, before the computation is actually triggered with `compute`.

I wonder what the next step here could be to chase performance gains: I think one option would be to basically "prepare" the transformation beforehand on the full array, and then trigger it only at tile creation in the `compute` (whereas now, transformation and tile creation are done jointly for each tile). I think this would require significant refactoring though, so I wonder if it makes sense at all, and if anyone has other ideas to explore @scverse/spatialdata
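To make the "prepare the transformation upfront" idea a bit more concrete, a hedged sketch under assumed shapes and names (not spatialdata's actual API): the affine mapping the target coordinate system back to pixel space is applied once to all tile centres, so the per-tile work reduces to slicing the dask array and calling `compute`.

```python
import numpy as np
import dask.array as da

def precompute_tile_slices(
    affine: np.ndarray, tile_centres: np.ndarray, tile_size: int
) -> list[tuple[slice, slice]]:
    # affine: 3x3 homogeneous matrix mapping (y, x, 1) in the target coordinate
    # system to pixel space; tile_centres: (n, 2) array of (y, x) tile centres.
    ones = np.ones((len(tile_centres), 1))
    pixel_centres = (np.hstack([tile_centres, ones]) @ affine.T)[:, :2]
    half = tile_size // 2
    starts = np.floor(pixel_centres - half).astype(int)
    return [(slice(y0, y0 + tile_size), slice(x0, x0 + tile_size)) for y0, x0 in starts]

def fetch_tile(image: da.Array, tile_slice: tuple[slice, slice]) -> np.ndarray:
    # Per-tile work is now just slicing + compute; no per-tile transformation objects.
    return image[..., tile_slice[0], tile_slice[1]].compute()
```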