Draft: Added custom grid (#265)
* Added custom grid

* Fixed missing IndexID

* Changed index system parameter type

* Fixed snapshot version in python

* Passing IndexSystem instead of IndexSystemName

* Updated python path loader

* Fixed CustomIndexSystem polyfill for empty polygon

* Fixed mosaic context h3

* Fixed mosaic unit tests for custom grid

* Added K-ring and K-loop for custom grid

* Fixed cell center offset for custom grid

* Fixed lon lat order in custom grid

* Fixed grid offset for cell geometries

* Fixed KNN tests for custom grid

* Fixed buffer radius custom grid

* Reduced KNN iterations to speed up tests

* Fixed python test

* Fixed python import

* Changed custom grid conf structure

* Fixed unit tests for custom grid

* Refactored IndexSystemFactory

* Fixed unit test for KNN

* Added custom grid for python

* Fixed R bindings for custom grid

* Updated docs for custom grid

* Updated docs for custom grid

* Fix KNN tests.

* Fixed index Y splits

* Fixed index KNN resolution

* Fixed double build for feature branch

* Fixed grid root cell size calculation

* Fixed Ring Neighbours tests

* Merge main and fix conflicts.

---------

Co-authored-by: milos.colic <[email protected]>
edurdevic and milos.colic authored Mar 20, 2023
1 parent 2fdffa8 commit 74a55e9
Showing 63 changed files with 989 additions and 383 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/build_main.yml
@@ -8,7 +8,10 @@ on:
- "scala/*"
pull_request:
branches:
- '**'
- "R/*"
- "r/*"
- "python/*"
- "scala/*"
jobs:
build:
runs-on: ubuntu-20.04
4 changes: 1 addition & 3 deletions .github/workflows/build_python.yml
@@ -4,9 +4,7 @@ on:
push:
branches:
- "python/*"
pull_request:
branches:
- "python/*"

jobs:
build:
runs-on: ubuntu-20.04
5 changes: 1 addition & 4 deletions .github/workflows/build_r.yml
@@ -5,10 +5,7 @@ on:
branches:
- 'r/*'
- 'R/*'
pull_request:
branches:
- 'r/*'
- 'R/*'

jobs:
build:
runs-on: ubuntu-20.04
4 changes: 1 addition & 3 deletions .github/workflows/build_scala.yml
@@ -3,9 +3,7 @@ on:
push:
branches:
- "scala/"
pull_request:
branches:
- "scala/"

jobs:
build:
runs-on: ubuntu-20.04
3 changes: 1 addition & 2 deletions R/sparkR-mosaic/enableMosaic.R
@@ -20,8 +20,7 @@ enableMosaic <- function(
,rasterAPI="GDAL"
){
geometry_api <- sparkR.callJStatic(x="com.databricks.labs.mosaic.core.geometry.api.GeometryAPI", methodName="apply", geometryAPI)
index_system_id <- sparkR.callJStatic(x="com.databricks.labs.mosaic.core.index.IndexSystemID", methodName="apply", indexSystem)
indexing_system <- sparkR.callJStatic(x="com.databricks.labs.mosaic.core.index.IndexSystemID", methodName="getIndexSystem", index_system_id)
indexing_system <- sparkR.callJStatic(x="com.databricks.labs.mosaic.core.index.IndexSystemFactory", methodName="getIndexSystem", indexSystem)

raster_api <- sparkR.callJStatic(x="com.databricks.labs.mosaic.core.raster.api.RasterAPI", methodName="apply", rasterAPI)

24 changes: 24 additions & 0 deletions docs/source/api/spatial-indexing.rst
@@ -5,6 +5,30 @@ Spatial grid indexing
Spatial grid indexing is the process of mapping a geometry (or a point) to one or more cells (or cell ID)
from the selected spatial grid.

The grid system can be specified by setting the Spark configuration `spark.databricks.labs.mosaic.index.system`
before enabling Mosaic.

The valid values are
* `H3` - Good all-rounder for any location on earth
* `BNG` - Local grid system for Great Britain (EPSG:27700)
* `CUSTOM(minX,maxX,minY,maxY,splits,rootCellSizeX,rootCellSizeY)` - Can be used with any local or global CRS
* `minX`,`maxX`,`minY`,`maxY` can be positive or negative integers defining the grid bounds
* `splits` defines how many splits are applied to each cell at each step up in resolution (usually 2 or 10)
* `rootCellSizeX`,`rootCellSizeY` define the size of the cells at resolution 0

Example

.. tabs::
.. code-tab:: py

spark.conf.set("spark.databricks.labs.mosaic.index.system", "H3") # Default
# spark.conf.set("spark.databricks.labs.mosaic.index.system", "BNG")
# spark.conf.set("spark.databricks.labs.mosaic.index.system", "CUSTOM(-180,180,-90,90,2,30,30)")

import mosaic as mos
mos.enable_mosaic(spark, dbutils)
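
The `CUSTOM(...)` parameters documented above can be sketched in plain Python. This is not Mosaic's implementation: `CustomGridConf`, `cell_size` and `root_cell_count` are hypothetical names, and the sketch assumes that each resolution step divides a cell edge by `splits`, so a cell at resolution `r` measures `rootCellSize / splits**r` on each axis. It also assumes integer root cell sizes, which the docs do not state.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CustomGridConf:
    """Hypothetical holder for the seven CUSTOM(...) parameters."""

    min_x: int
    max_x: int
    min_y: int
    max_y: int
    splits: int
    root_cell_size_x: int  # assumption: integer size, in CRS units
    root_cell_size_y: int

    def cell_size(self, resolution: int) -> Tuple[float, float]:
        # Assumption: every resolution step divides each cell edge by `splits`.
        factor = self.splits ** resolution
        return self.root_cell_size_x / factor, self.root_cell_size_y / factor

    def root_cell_count(self) -> Tuple[int, int]:
        # Resolution-0 cells needed to cover the bounds on each axis.
        count_x = math.ceil((self.max_x - self.min_x) / self.root_cell_size_x)
        count_y = math.ceil((self.max_y - self.min_y) / self.root_cell_size_y)
        return count_x, count_y


# Values taken from the docs example CUSTOM(-180,180,-90,90,2,30,30).
conf = CustomGridConf(-180, 180, -90, 90, 2, 30, 30)
print(conf.root_cell_count())
print(conf.cell_size(2))
```

With the example value `CUSTOM(-180,180,-90,90,2,30,30)`, this reading gives a 12 x 6 layout of root cells and 7.5-unit cells at resolution 2.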


grid_longlatascellid
********************

12 changes: 9 additions & 3 deletions python/mosaic/core/library_handler.py
@@ -9,7 +9,7 @@ class MosaicLibraryHandler:
spark = None
sc = None
_jar_path = None
_jar_filename = f"mosaic-{importlib.metadata.version('databricks-mosaic')}-jar-with-dependencies.jar"
_jar_filename = None
_auto_attached_enabled = None

def __init__(self, spark):
@@ -50,8 +50,14 @@ def mosaic_library_location(self):
)
self._jar_filename = self._jar_path.split("/")[-1]
except Py4JJavaError as e:
with importlib.resources.path("mosaic.lib", self._jar_filename) as p:
self._jar_path = p.as_posix()
self._jar_filename = f"mosaic-{importlib.metadata.version('databricks-mosaic')}-jar-with-dependencies.jar"
try:
with importlib.resources.path("mosaic.lib", self._jar_filename) as p:
self._jar_path = p.as_posix()
except FileNotFoundError as fnf:
self._jar_filename = f"mosaic-{importlib.metadata.version('databricks-mosaic')}-SNAPSHOT-jar-with-dependencies.jar"
with importlib.resources.path("mosaic.lib", self._jar_filename) as p:
self._jar_path = p.as_posix()
return self._jar_path
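
The fallback in `mosaic_library_location` tries the release JAR name first and the `-SNAPSHOT` name second. A standalone sketch of that lookup order follows. The helpers `candidate_jar_names` and `resolve_jar` are hypothetical, and the sketch checks a plain directory rather than going through `importlib.resources`.

```python
from pathlib import Path
from typing import List


def candidate_jar_names(version: str) -> List[str]:
    # Same two file name patterns as in the diff: release first, then SNAPSHOT.
    return [
        f"mosaic-{version}-jar-with-dependencies.jar",
        f"mosaic-{version}-SNAPSHOT-jar-with-dependencies.jar",
    ]


def resolve_jar(lib_dir: Path, version: str) -> Path:
    """Return the first candidate JAR that exists in lib_dir."""
    for name in candidate_jar_names(version):
        candidate = lib_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No Mosaic JAR for version {version!r} found in {lib_dir}"
    )
```

Listing the candidates in one place keeps the lookup order explicit and would let the test helper in `spark_test_case.py` share the same rule instead of repeating the two f-strings.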

def auto_attach(self):
5 changes: 3 additions & 2 deletions python/mosaic/core/mosaic_context.py
@@ -24,6 +24,7 @@ def __init__(self, spark: SparkSession):
self._mosaicPackageRef = getattr(sc._jvm.com.databricks.labs.mosaic, "package$")
self._mosaicPackageObject = getattr(self._mosaicPackageRef, "MODULE$")
self._mosaicGDALObject = getattr(sc._jvm.com.databricks.labs.mosaic.gdal, "MosaicGDAL")
self._indexSystemFactory = getattr(sc._jvm.com.databricks.labs.mosaic.core.index, "IndexSystemFactory")

try:
self._geometry_api = spark.conf.get(
@@ -46,12 +47,12 @@ def __init__(self, spark: SparkSession):
except Py4JJavaError as e:
self._raster_api = "GDAL"

IndexSystemClass = getattr(self._mosaicPackageObject, self._index_system)
IndexSystem = self._indexSystemFactory.getIndexSystem(self._index_system)
GeometryAPIClass = getattr(self._mosaicPackageObject, self._geometry_api)
RasterAPIClass = getattr(self._mosaicPackageObject, self._raster_api)

self._context = self._mosaicContextClass.build(
IndexSystemClass(), GeometryAPIClass(), RasterAPIClass()
IndexSystem, GeometryAPIClass(), RasterAPIClass()
)
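
After this change the Python context passes the configured string to `IndexSystemFactory.getIndexSystem` instead of looking up a class by name. The Scala factory is not shown in this part of the diff, so the following is only a guess at the kind of dispatch such a factory performs. `get_index_system` and its tuple return value are hypothetical, and the pattern assumes all seven `CUSTOM` arguments are integers.

```python
import re
from typing import Tuple

# Assumption: exactly seven comma-separated integers inside CUSTOM(...).
_CUSTOM_PATTERN = re.compile(r"^CUSTOM\((-?\d+(?:,-?\d+){6})\)$")


def get_index_system(name: str) -> Tuple[str, Tuple[int, ...]]:
    """Map a configuration string to a (kind, parameters) pair."""
    if name in ("H3", "BNG"):
        # Built-in grids take no parameters.
        return name, ()
    match = _CUSTOM_PATTERN.match(name.replace(" ", ""))
    if match:
        params = tuple(int(part) for part in match.group(1).split(","))
        return "CUSTOM", params
    raise ValueError(f"Unsupported index system: {name!r}")
```

Resolving the string in one factory function is what lets a parameterised value such as `CUSTOM(-180,180,-90,90,2,30,30)` work, since an attribute lookup by name cannot carry arguments.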

def invoke_function(self, name: str, *args: Any) -> MosaicColumn:
4 changes: 4 additions & 0 deletions python/test/utils/spark_test_case.py
@@ -1,4 +1,5 @@
import unittest
import os
from importlib.metadata import version

from pyspark.sql import SparkSession
@@ -14,6 +15,9 @@ class SparkTestCase(unittest.TestCase):
@classmethod
def setUpClass(cls) -> None:
cls.library_location = f"{mosaic.__path__[0]}/lib/mosaic-{version('databricks-mosaic')}-jar-with-dependencies.jar"
if not os.path.exists(cls.library_location):
cls.library_location = f"{mosaic.__path__[0]}/lib/mosaic-{version('databricks-mosaic')}-SNAPSHOT-jar-with-dependencies.jar"

cls.spark = (
SparkSession.builder.master("local")
.config("spark.jars", cls.library_location)
@@ -27,6 +27,8 @@ import scala.util.{Success, Try}
*/
object BNGIndexSystem extends IndexSystem(StringType) with Serializable {

val name = "BNG"

/**
* Quadrant encodings. The order is determined in a way that preserves
* similarity to space filling curves.
@@ -201,15 +203,6 @@ object BNGIndexSystem extends IndexSystem(StringType) with Serializable {
}
}

/**
* Returns the index system ID instance that uniquely identifies an index
* system. This instance is used to select appropriate Mosaic expressions.
*
* @return
* An instance of [[IndexSystemID]]
*/
override def getIndexSystemID: IndexSystemID = BNG

/**
* Get the k ring of indices around the provided index id.
*