Feature/knn transformer (databrickslabs#258)
* Add spark mllib dependency.
Define Approximate Spatial KNN transformer.
Define params for the transformer.

* Add logic for KNN transformer.
Add test behaviours.
Add KRing expression.
Add columnar functions for kring.

* Add logic for SpatialKNN iterative algorithm based on grid ripples (krings followed by hexrings).
Simplify cellID data type for index systems.
Add Cell based Kring and Kdisc.
Add Geometry based Kring and Kdisc.
Add explode versions for Kring and Kdisc generators.
Improve logic for grid_tessellate for POINT and MULTIPOINT geometries.

* Add grid_ expressions to python.
Add python wrappers for KNN algorithm.

* Restructure package hierarchy for mosaic.models.
Move ApproximateSpatialKNN to mosaic.models.knn.
Define IterativeTransformer trait for defining iterative algorithms.
Define BinaryTransformer trait for join based transformers.
Define HexRingNeighbours transformer.
Define ApproximateSpatialKNN transformer as an iterative transformer with hex neighbours being called in each transformation.
Define delta based checkpoints.

* Fix scala style warnings.
Remove scala style broken rule.
Update dependencies.

* Move ApproximateSpatialKNN to knn submodule in python.
Add getMetrics and getParams logic for python model.
Change the iteration logic to start from iteration number 1 in IterativeTransformer.
Update logic of ApproximateSpatialKNN transformer.
Fix DeltaTableCheckpoint name generator.
Adjust the last iteration in HexRingNeighbours transformer.
Update tests.
Migrate some tests to new implementation.

* Fix expression info for GeometryKDisc, GeometryKDiscExplode, GeometryKRing and GeometryKRingExplode.
Add missing SQL func definitions.
Update MosaicContext tests.
Fix topological comparison in MosaicSpatialQueryTest base class.

* Fix the convergence issue in ApproximateSpatialKNN.
Use max radius logic for the final iteration.
Add tests for CellKDisc and CellKring expressions.
Add tests for GeometryKDisc and GeometryKRing expressions.
Migrate index expressions to new test syntax.

* Fix the iteration number for the final iteration.
Add hotfix for SharedSparkSession test that creates a database in the catalog.
Change the transformer name to SpatialKNN.
Change useExact logic to approximate logic.
Change param name.

* Improve test coverage.

* Fix broken tests.

* Improve test coverage.

* Switch to scoverage.

* Update python naming to match scala types.

* Update documentation.
Group content based on a topic, api, usage and models.
Add imagery for knn docs page.
Add knn docs page.

* Fix PR comments.
Replace import * with more concrete alternatives.
Add st_buffer_disc to python APIs.
Fix naming in Transformers.

* Fix spatial_knn method names.
Add pydocs.

* Fix broken tests.

* Add documentation for ST_ and GRID_ expressions.

* Rename all *disc functions to *loop naming convention.
Update docs.
Add python tests for expressions.

* Fix broken tests.

* Remove TopN expression.

* Fix python tests.

* Fix method naming in SpatialKNN transformer.

* Fix naming in SpatialKNN transformer.

* Fix PR comments.
Fix naming of cell ids in python functions.
Refactor logic for geometry kring and kloop expressions.
Fix docs.

* Remove redundant parameter in docs.
Milos Colic authored Nov 25, 2022
1 parent 5fd41c8 commit 1caf56f
Showing 123 changed files with 6,664 additions and 929 deletions.
22 changes: 22 additions & 0 deletions docs/source/_static/css/custom.css
@@ -0,0 +1,22 @@
.doc-figure {
display: block;
margin-left: auto;
margin-right: auto;
width: 75%;
}

.doc-figure-full {
width: 100%;
}

.doc-figure-float-left {
display: block;
float: left !important;
width: 50%;
}

.figure-group {
display: flex;
flex-wrap: wrap;
justify-content: center;
}
12 changes: 12 additions & 0 deletions docs/source/api/api.rst
@@ -0,0 +1,12 @@
API Documentation
=================

.. toctree::
:maxdepth: 2

geometry-constructors
geometry-accessors
spatial-functions
spatial-indexing
spatial-predicates
spatial-aggregations
67 changes: 65 additions & 2 deletions docs/source/api/spatial-functions.rst
@@ -122,7 +122,7 @@ st_area
st_buffer
*********

-.. function:: st_buffer(col)
+.. function:: st_buffer(col, radius)

Buffer the input geometry by radius `radius` and return a new, buffered geometry.

@@ -174,6 +174,70 @@ st_buffer
|POLYGON ((29.1055...|
+--------------------+

st_bufferloop
*************

.. function:: st_bufferloop(col, innerRadius, outerRadius)

Returns the difference between st_buffer(col, outerRadius) and st_buffer(col, innerRadius).
The resulting geometry is a loop with a width of outerRadius - innerRadius.

:param col: Geometry
:type col: Column
:param innerRadius: Radius of the resulting geometry hole.
:type innerRadius: Column (DoubleType)
:param outerRadius: Radius of the resulting geometry.
:type outerRadius: Column (DoubleType)
:rtype: Column: Geometry

:example:

.. tabs::
.. code-tab:: py

>>> df = spark.createDataFrame([{'wkt': 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'}])
>>> df.select(st_bufferloop('wkt', lit(2.), lit(2.1))).show()
+-------------------------+
| st_buffer(wkt, 2.0, 2.1)|
+-------------------------+
| POLYGON ((29.1055...|
+-------------------------+

.. code-tab:: scala

>>> val df = List(("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))")).toDF("wkt")
>>> df.select(st_bufferloop(col("wkt"), lit(2.0), lit(2.1))).show()
+-------------------------+
| st_buffer(wkt, 2.0, 2.1)|
+-------------------------+
| POLYGON ((29.1055...|
+-------------------------+

.. code-tab:: sql

>>> SELECT st_bufferloop("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))", 2d, 2.1d)
+-------------------------+
| st_buffer(wkt, 2.0, 2.1)|
+-------------------------+
| POLYGON ((29.1055...|
+-------------------------+

.. code-tab:: r R

>>> df <- createDataFrame(data.frame(wkt = "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"))
>>> showDF(select(df, st_bufferloop('wkt', lit(2.), lit(2.1))))
+-------------------------+
| st_buffer(wkt, 2.0, 2.1)|
+-------------------------+
| POLYGON ((29.1055...|
+-------------------------+


.. figure:: ../images/st_bufferloop/geom.png
:figclass: doc-figure

Fig 1. ST_BufferLoop(geom, 0.02, 0.04)
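
The definition above (outer buffer minus inner buffer) can be illustrated without Spark. The sketch below is not Mosaic's implementation; it is a plain-Python approximation that assumes planar Euclidean distance and a single polygon ring without holes. The helper names (``distance_to_polygon``, ``in_buffer_loop``) are hypothetical and exist only for this illustration. It relies on the fact that a point lies in the loop when its distance to the geometry is greater than the inner radius and at most the outer radius.

```python
import math

# Exterior ring of the example polygon used above (closed: first == last).
RING = [(30, 10), (40, 40), (20, 40), (10, 20), (30, 10)]


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _inside_ring(p, ring):
    """Ray-casting point-in-polygon test for a closed ring."""
    px, py = p
    inside = False
    for (ax, ay), (bx, by) in zip(ring, ring[1:]):
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def distance_to_polygon(p, ring):
    """Distance from p to the filled polygon (0.0 if p is inside)."""
    if _inside_ring(p, ring):
        return 0.0
    return min(_point_segment_distance(p, a, b) for a, b in zip(ring, ring[1:]))


def in_buffer_loop(p, ring, inner_radius, outer_radius):
    """True if p is in buffer(ring, outer) but not in buffer(ring, inner)."""
    d = distance_to_polygon(p, ring)
    return inner_radius < d <= outer_radius


if __name__ == "__main__":
    # 2.05 units above the top edge: inside the loop for radii 2.0 and 2.1.
    print(in_buffer_loop((30, 42.05), RING, 2.0, 2.1))
    # 1 unit above the top edge: still inside the inner buffer, so excluded.
    print(in_buffer_loop((30, 41), RING, 2.0, 2.1))
```

With the radii from the examples above (2.0 and 2.1), the loop is a band 0.1 units wide around the polygon; points inside the polygon or within the inner buffer are excluded.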

st_centroid2D
*************

@@ -1724,4 +1788,3 @@ st_zmin
:type col: Column
:rtype: Column: DoubleType

