vectorize bounding box query #699

Merged
merged 19 commits into from
Sep 26, 2024

giovp
Member

@giovp giovp commented Sep 2, 2024

While trying to vectorize the tiling in the dataloader PR, I realized that some improvements should be added separately.

With this PR, I enable the vectorization of bounding_box_query for all elements.

This means that it is now possible to pass an array of bounding boxes (and not just one).
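To illustrate what querying with an array of bounding boxes means, here is a minimal NumPy sketch. It is not the code from this PR: the helper name `points_in_boxes` and its signature are hypothetical, and it only assumes that a batch of boxes is given as two arrays of shape (n_boxes, n_dims) holding the minimum and maximum corners.

```python
import numpy as np


def points_in_boxes(points, min_coordinate, max_coordinate):
    """Hypothetical helper: return a (n_boxes, n_points) boolean mask."""
    points = np.asarray(points, dtype=float)
    # Promote a single box of shape (n_dims,) to a batch of shape (1, n_dims).
    mins = np.atleast_2d(np.asarray(min_coordinate, dtype=float))
    maxs = np.atleast_2d(np.asarray(max_coordinate, dtype=float))
    # Broadcast (1, n_points, n_dims) against (n_boxes, 1, n_dims).
    above_min = points[None, :, :] >= mins[:, None, :]
    below_max = points[None, :, :] <= maxs[:, None, :]
    # A point is inside a box only if it is within bounds on every axis.
    return (above_min & below_max).all(axis=-1)


points = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 2.0]])
mins = np.array([[0.0, 0.0], [4.0, 0.0]])
maxs = np.array([[2.0, 2.0], [10.0, 6.0]])
mask = points_in_boxes(points, mins, maxs)  # one row per bounding box
```

Passing a single box (for example `[0, 0]` and `[2, 2]`) goes through the same code path and yields a mask with one row, which is the appeal of vectorizing over boxes instead of looping over them.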

TODO:

  • tests for multiple bounding box queries for raster data
  • tests for multiple bounding box queries for shapes
  • tests for multiple bounding box queries for points

@LucaMarconato @melonora do you have any suggestions on where this could be used to replace current implementations across the projects? A clear use case (and the motivation for this contribution) is the dataloader, see #687 for the speedup, but I wonder if there are other places where this is useful.

This is also the groundwork for an eventual update to the dataloader, which would then be able to return batches of all elements.

Considering that this PR is already getting too large, I would postpone the vectorization of the polygon query.

codecov bot commented Sep 2, 2024

Codecov Report

Attention: Patch coverage is 91.75258% with 16 lines in your changes missing coverage. Please review.

Project coverage is 91.76%. Comparing base (774b492) to head (b8b6331).
Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| src/spatialdata/_core/query/_utils.py | 82.05% | 14 Missing ⚠️ |
| src/spatialdata/_core/query/spatial_query.py | 98.14% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #699      +/-   ##
==========================================
- Coverage   91.83%   91.76%   -0.07%     
==========================================
  Files          44       45       +1     
  Lines        6781     6887     +106     
==========================================
+ Hits         6227     6320      +93     
- Misses        554      567      +13     
| Files with missing lines | Coverage Δ |
| src/spatialdata/_docs.py | 100.00% <100.00%> (ø) |
| src/spatialdata/_core/query/spatial_query.py | 95.21% <98.14%> (+0.47%) ⬆️ |
| src/spatialdata/_core/query/_utils.py | 84.37% <82.05%> (-11.28%) ⬇️ |

... and 1 file with indirect coverage changes

@giovp giovp marked this pull request as ready for review September 3, 2024 00:25
@giovp
Member Author

giovp commented Sep 3, 2024

ready for a first pass of review

@giovp giovp changed the title from "few improvements to transformations" to "vectorize bounding box query" Sep 4, 2024
@giovp
Member Author

giovp commented Sep 4, 2024

ready for review

@LucaMarconato
Member

Thank you @giovp for the PR, I will review now.

> A clear use case (and the motivation for this contribution) is the dataloader, see #687 for the speedup, but I wonder if there are other places where this is useful.

I can't think of other places right now, so I think we are good with the current improvement.

> Considering that this PR is already getting too large, I would postpone the vectorization of the polygon query.

I agree.


@nb.njit(parallel=False, nopython=True)
def _create_slices_and_translation(
    min_values: nb.types.Array[nb.float64, nb.float64],
Member

@LucaMarconato LucaMarconato Sep 15, 2024

I think nb.types.Array[nb.float64, np.float64] may be incorrect and that the correct version is nb.types.Array(nb.float64, 2, 'C'). But I am not sure because pre-commit doesn't complain, so maybe both syntaxes are allowed.

Member

Mine is wrong. Are you sure about your typing? What looks strange to me is that types like nb.types.Array[nb.float64, nb.int64] would not have a meaning. I tried using just nb.types.Array and pre-commit works, maybe this is the way to go.

Member Author

I will look into it!

Member

@LucaMarconato LucaMarconato Sep 26, 2024

I removed the dtype, I'd merge.

@LucaMarconato
Member

@giovp I finished reviewing; I applied some minor changes, such as simplifying the docs and adding an extra test.

@LucaMarconato LucaMarconato enabled auto-merge (squash) September 26, 2024 16:13
@LucaMarconato LucaMarconato merged commit 8239455 into main Sep 26, 2024
8 checks passed
@LucaMarconato LucaMarconato deleted the giovp/parallel-transform branch September 26, 2024 16:30
melonora added a commit to melonora/spatialdata that referenced this pull request Nov 6, 2024
2 participants