Faster s2_dwithin_matrix(); add s2_prepared_dwithin() #174
Codecov Report
@@ Coverage Diff @@
## main #174 +/- ##
==========================================
+ Coverage 93.93% 94.02% +0.08%
==========================================
Files 49 49
Lines 3463 3514 +51
==========================================
+ Hits 3253 3304 +51
Misses 210 210
Continue to review full report at Codecov.
Looks like a game changer!!
Thanks, I have only tested small data sets so far, but I see major improvements.
I should have checked The code I was using to test the complexity is below. Apparently it's still linear, just a lot faster...if you switch
library(s2)
countries <- s2_data_countries()
bench_points <- function(n) {
points <- s2_point(runif(n, -1, 1), runif(n, -1, 1), runif(n, -1, 1))
geog <- s2::as_s2_geography(s2::as_s2_lnglat(points))
tibble::tibble(
n = n,
bench::mark(s2_dwithin_matrix(geog, countries, 1e6))[2:3]
)
}
results <- rbind(
bench_points(1e2),
bench_points(3e2),
bench_points(5e2),
bench_points(7e2),
bench_points(9e2),
bench_points(1e3),
bench_points(3e3),
bench_points(5e3),
bench_points(7e3),
bench_points(9e3),
bench_points(1e5),
bench_points(5e5),
bench_points(1e6)
)
plot(median ~ n, data = results)
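One way to check the "still linear" reading beyond eyeballing the plot is to fit a slope on a log-log scale: a slope near 1 suggests linear scaling in n, a slope near 2 suggests quadratic. The sketch below uses base R only and made-up timings (the `timings` data frame is hypothetical, not output from the benchmark above). With the real `results`, `median` is a bench time object, so it would presumably need `as.numeric(median)` first.

```r
# Hypothetical timings in seconds, invented for illustration only;
# these are NOT measurements from the benchmark above.
timings <- data.frame(
  n = c(1e2, 1e3, 1e4, 1e5, 1e6),
  median = c(0.002, 0.019, 0.21, 2.0, 20.5)
)

# Fit log10(time) against log10(n); the slope estimates the exponent k
# in time ~ n^k.
fit <- lm(log10(median) ~ log10(n), data = timings)
slope <- unname(coef(fit)[2])

# A value close to 1 is consistent with linear scaling.
print(slope)
```

A log-log fit is less sensitive to the largest n dominating the picture than a plot on the raw scale, which is why it is used here.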
I'd be honoured to have contributor status! In some future interface that's maybe arrow array in -> arrow array out, the output will probably be
I'd like to work through the open s2 issue list while I'm "in it", but I'll set a hard deadline of May 29 (Sunday) to start the revdep process, since I'm a fantastically bad estimator of what I can get done during my kids' naps.
Updated docs and a vignette with a subsection describing spherical distance-based neighbours, ready for your s2 release. I don't know what distribution fits nap duration; I'm just as poor at guessing for kids and their kids too. I think naps help generate useful thoughts, but I have no idea how.
Anywhere from 0 minutes to 3 hours! Right now they both have sore throats and fevers and won't nap unless they are literally clinging to us (although, fingers crossed, they are both currently asleep).
Fixes #157. Before this PR, the index wasn't being used efficiently for a matrix within-distance query. I also added s2_prepared_dwithin() to match the added GEOS function, although the "prepare" step is much more expensive here. Still, it's super useful for the maybe-common s2_prepared_dwithin(big_long_points_vector, a_single_polygon_vector).
Before this PR:
After this PR:
TODO: Add tests for s2_prepared_dwithin() and make sure that the code paths I expect for s2_dwithin_matrix() are getting tested.
(@rsbivand sorry for taking so long to get here but hopefully this is what you were after!)
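For readers who want to try the "many points, one polygon" case mentioned above, here is a minimal sketch. It assumes an s2 version that includes this PR (so that s2_prepared_dwithin() exists) and uses the bundled s2_data_countries() data. The random points, the choice of Canada, and the 1e6 metre distance are illustrative assumptions only, not part of the PR.

```r
library(s2)

set.seed(1)
n <- 1000

# Illustrative random points (longitude, latitude in degrees); uniform in
# lng/lat rather than uniform on the sphere, which is fine for a demo.
points <- s2_geog_point(runif(n, -180, 180), runif(n, -90, 90))

# A single polygon recycled along the long vector of points.
polygon <- s2_data_countries("Canada")

# Element-wise check: is each point within 1e6 metres of the polygon?
within_prepared <- s2_prepared_dwithin(points, polygon, 1e6)

# The unprepared predicate should give the same answer.
within_plain <- s2_dwithin(points, polygon, 1e6)

# The matrix version returns, for each point, the indices of matching
# features in `polygon` (here either integer(0) or 1).
hits <- s2_dwithin_matrix(points, polygon, 1e6)

print(table(within_prepared))
```

All three calls answer the same question here. Which one is fastest for a given workload is something to measure rather than assume.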