You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
NB: The binding errors present in 0.9.0 -> 0.9.2 prevent this from being checked for those versions, so exactly when this started is a little fuzzy.
For a query along the lines of (note: for v0.8.1, the geom column was wkb_geometry::GEOMETRY):
SELECTl.idFROM locations l JOIN catchment_geojson gj
ON ST_CONTAINS(gj.geom, l.point);
where locations is ~1k points, and catchment_geojson is a single-row table of multipolygon geometries, execution time jumps from 0.02s to 22.83s.
With the addition of SET disabled_optimizers = 'extension';, execution time drops back down to that observed in v0.8.1 (with an identical physical plan). Semi-joins (appropriate in scenarios like this) also produce the same BNL-centric physical plan (as opposed to the NL Join + Filter plan). An inner join against an exploded (~280 polygons) version of the multipolygon brings total execution time down to 1.6s.
<122k rows (duckdb's row group size, the minimum threshold for evaluating a filter predicate in parallel).
High false-positive for the bounding box range query (multi-polygons, highly concave polygons (e.g. a cross, which intersects with a tiny proportion of its bounding box)).
The problem seem to be that the range rewrite does not produce a IE-Join like intended but still remains a NL-join. It could be that something changed duckdb side that made the planning differ in the latest version.
Mm, I noticed joins involving Polygons (still stored as Geometry, but presumably those are auto-cast to duckdb Polygon_2D internally) were correctly using IE-Joins (of course, much better hit rate given their bounding boxes were a fair bit closer to their convex hull), though still paired with a Filter(ST_Contains(...)).
No luck with SET prefer_range_joins=true, I still get an NL-join.
NB: The binding errors present in 0.9.0 -> 0.9.2 prevent this from being checked for those versions, so exactly when this started is a little fuzzy.
For a query along the lines of (note: for v0.8.1, the geom column was
wkb_geometry::GEOMETRY
):where locations is ~1k points, and catchment_geojson is a single-row table of multipolygon geometries, execution time jumps from 0.02s to 22.83s.
With the addition of
SET disabled_optimizers = 'extension';
, execution time drops back down to that observed in v0.8.1 (with an identical physical plan). Semi-joins (appropriate in scenarios like this) also produce the same BNL-centric physical plan (as opposed to the NL Join + Filter plan). An inner join against an exploded (~280 polygons) version of the multipolygon brings total execution time down to 1.6s.By the looks of the Range Join rewriting logic at https://github.com/duckdb/duckdb_spatial/blob/v1.0.0/spatial/src/spatial/core/optimizer_rules.cpp#L33 , the factors that produce this kind of perf degradation are:
Profiling
Before (v0.8.1):
After (v0.10.0):
Query plan diagnostics
Before (v0.8.1):
After (v0.10.0):
The text was updated successfully, but these errors were encountered: