You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
geo_join does a full m*n comparison of points. This works for moderately sized datasets but becomes intractable in both time and memory.
The right geo_join() implementation would likely use k-d trees to build a spatial index of one table, then look up points from the other, while taking advantage of the limited max_dist argument. This would turn it (on average) from O(m*n) time to O(m*log(n)) time. (Similar approaches could be used for functions such as distance_join, and on a related note b-k trees could be used for Levenshtein distance).
I may or may not have time to implement this but it would be a top-notch pull request.
The text was updated successfully, but these errors were encountered:
dgrtwo
changed the title
Would be great for geo_join to use kd trees
Would be great for geo_join to use k-d trees
Feb 8, 2018
geo_join
does a full m*n comparison of points. This works for moderately sized datasets but becomes intractable in both time and memory.The right
geo_join()
implementation would likely use k-d trees to build a spatial index of one table, then look up points from the other, while taking advantage of the limitedmax_dist
argument. This would turn it (on average) from O(m*n) time to O(m*log(n)) time. (Similar approaches could be used for functions such asdistance_join
, and on a related note b-k trees could be used for Levenshtein distance).I may or may not have time to implement this but it would be a top-notch pull request.
The text was updated successfully, but these errors were encountered: