Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiscales subsetting #90

Open
HelenaLC opened this issue Nov 19, 2024 · 8 comments
Open

multiscales subsetting #90

HelenaLC opened this issue Nov 19, 2024 · 8 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@HelenaLC
Copy link
Owner

HelenaLC commented Nov 19, 2024

It's not clear to me how to implement subsetting of multiscale ZarrArrays. E.g., say we have scales 4x4 and 2x2. Then the only subsets that could be propagated to all resolutions would be 1..2 and 3..4, which would keep the 1st and 2nd low-res. index, respectively. How is this handled on the Python side? Shall we simply restrict to valid indices (i.e., throw an error)? Or drop low-res. scales that don't comply, e.g., when subsetting 2..3 (potentially with an argument drop=TRUE/FALSE, default FALSE, that would make users aware/in control of this)? ...perhaps worth thinking about whether Mike Smith has this on his radar, i.e., representing the pyramid beyond one layer at a time.

@HelenaLC HelenaLC added help wanted Extra attention is needed question Further information is requested labels Nov 19, 2024
@Artur-man
Copy link
Collaborator

Artur-man commented Nov 19, 2024

Let me give you an idea here, this subsetting strategy work very well in when you do subset based zooming in voltron using pyramid images:

https://github.com/BIMSBbioinfo/ImageArray
https://x.com/AltunaAkalin/status/1841864608073572707

ImageArray is programmed the same way that the current multiscale is implemented, when you subset you use the indices of the first scale, and the indices should be scaled down to subset other layers lazily.

@HelenaLC
Copy link
Owner Author

Thanks for the link, and yeah, I got that part (i.e., scaling down indices to subset low-res.). However, this doesn't resolve my confusion. Specifically,

  • the example given for ImageArray works because the numbers nicely scale down
  • e.g., 2000:3000 (length 1000) becomes 250:375 (length 125) ...or something like that
  • however, what would happen for the example I gave above, when high-res. indices cannot be scaled to low-res.?
  • e.g., (from above) arrays 4x4 and 2x2 and index 2:3 wouldn't make any sense for the 2x2 res.

@HelenaLC
Copy link
Owner Author

HelenaLC commented Nov 19, 2024

Based on the code here, I would answer my question by: the low-res. is kept completely in case indices don't map. E.g.,

# imagine a 4x4 array 'a', and calling 'a[2:3, ]';
# here, 'i=2' would be a 2x2 array
> curind <- c(2, 3); i <- 2
> seq(
  floor(head(curind,1)/(2^(i-1))), 
  ceiling(tail(curind,1)/(2^(i-1))))
[1] 1 2

@Artur-man
Copy link
Collaborator

Artur-man commented Nov 19, 2024

I thought a lot about this when I was programming pyramids, for large images this small ceilling and floor operations to not really matter (you can interactively see it), but for small examples it looks a bit weird.

@HelenaLC
Copy link
Owner Author

Yeah, it's quite intelligent/elegant that little seq() you got there! I think I'll use this for now. I just got confused when subsetting the blobs and nothing happens. The whole pyramid thing is something that hopefully will come in from else where (e.g., Rarr) eventually... till then, we'll just hack our way through it :)

@vjcitn
Copy link
Collaborator

vjcitn commented Nov 19, 2024

Just sticking my nose in. Comments in code can be really crucial, maybe linking back to these issues. Explaining the seq chunk fullly may ward off technical debt payments down the road; apologies if it is already there.
Also if there is anticipation of new functionality in Rarr do we need to raise an issue there? I am swamped with other items so not contributing much now but let me know if I am blocking anything.

HelenaLC added a commit that referenced this issue Nov 24, 2024
@HelenaLC
Copy link
Owner Author

HelenaLC commented Nov 24, 2024

Alright, I have implemented the above as a temporarily working version. However, I still believe this needs to be given more thought... Specifically, [ currently only supports sequential indexing (i.e., must be a seq). I could imagine, however, that one would want to exclude part of the image somewhere, in which case we'd want to give an option to set that to NA (= black) or something, rather then crop (i.e., distorting the image)... Also, the whole rounding business (floor/ceiling) is fine for visualizations, but might mess things up for actual computations on the image (if these aren't carried out on the highest resolution), unless we restrict to indices that can be propagated across all layers... etc. etc. etc. So yeah, a pyramidal array class isn't so easy, and maybe worth having it be implemented independently eventually.

@Artur-man
Copy link
Collaborator

This is a good point and I think we should bring this up in the next SpatialData meeting at scverse. Now that I think about it, we have never tried this on SpatialData-python right ? does it do floor/ceiling ?

Also one question though, what operation you believe will mess up the images if the floor/ceiling applied to scaled indices ?

The only case I ever imagine the pyramid scheme is in fact needed ... is for visualisation where you load the lower memory allocating array. Everything else, say learning embeddings and making predictions on image tiles could be done on the actual large image.

Lets think more though i agree ....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants