Merge branch 'databrickslabs:main' into main
a0x8o authored Jan 30, 2024
2 parents 0769d93 + eef2dd2 commit 2cc633f
Showing 4 changed files with 88 additions and 51 deletions.
38 changes: 20 additions & 18 deletions docs/source/api/raster-format-readers.rst
@@ -5,24 +5,26 @@ Raster Format Readers

Intro
#####
Mosaic provides spark readers for the following raster formats:

* GTiff (GeoTiff) using .tif file extension - https://gdal.org/drivers/raster/gtiff.html
* COG (Cloud Optimized GeoTiff) using .tif file extension - https://gdal.org/drivers/raster/cog.html
* HDF4 using .hdf file extension - https://gdal.org/drivers/raster/hdf4.html
* HDF5 using .h5 file extension - https://gdal.org/drivers/raster/hdf5.html
* NetCDF using .nc file extension - https://gdal.org/drivers/raster/netcdf.html
* JP2ECW using .jp2 file extension - https://gdal.org/drivers/raster/jp2ecw.html
* JP2KAK using .jp2 file extension - https://gdal.org/drivers/raster/jp2kak.html
* JP2OpenJPEG using .jp2 file extension - https://gdal.org/drivers/raster/jp2openjpeg.html
* PDF using .pdf file extension - https://gdal.org/drivers/raster/pdf.html
* PNG using .png file extension - https://gdal.org/drivers/raster/png.html
* VRT using .vrt file extension - https://gdal.org/drivers/raster/vrt.html
* XPM using .xpm file extension - https://gdal.org/drivers/raster/xpm.html
* GRIB using .grb file extension - https://gdal.org/drivers/raster/grib.html
* Zarr using .zarr file extension - https://gdal.org/drivers/raster/zarr.html

Other formats are supported if supported by GDAL available drivers.
Mosaic provides spark readers for raster files supported by GDAL raster drivers.
Only the drivers that are built by default are supported.
Here are some commonly used file formats:

* `GTiff <https://gdal.org/drivers/raster/gtiff.html>`_ (GeoTiff) using .tif file extension
* `COG <https://gdal.org/drivers/raster/cog.html>`_ (Cloud Optimized GeoTiff) using .tif file extension
* `HDF4 <https://gdal.org/drivers/raster/hdf4.html>`_ using .hdf file extension
* `HDF5 <https://gdal.org/drivers/raster/hdf5.html>`_ using .h5 file extension
* `NetCDF <https://gdal.org/drivers/raster/netcdf.html>`_ using .nc file extension
* `JP2ECW <https://gdal.org/drivers/raster/jp2ecw.html>`_ using .jp2 file extension
* `JP2KAK <https://gdal.org/drivers/raster/jp2kak.html>`_ using .jp2 file extension
* `JP2OpenJPEG <https://gdal.org/drivers/raster/jp2openjpeg.html>`_ using .jp2 file extension
* `PDF <https://gdal.org/drivers/raster/pdf.html>`_ using .pdf file extension
* `PNG <https://gdal.org/drivers/raster/png.html>`_ using .png file extension
* `VRT <https://gdal.org/drivers/raster/vrt.html>`_ using .vrt file extension
* `XPM <https://gdal.org/drivers/raster/xpm.html>`_ using .xpm file extension
* `GRIB <https://gdal.org/drivers/raster/grib.html>`_ using .grb file extension
* `Zarr <https://gdal.org/drivers/raster/zarr.html>`_ using .zarr file extension

For more information, please refer to the GDAL `raster driver <https://gdal.org/drivers/raster/index.html>`_ documentation.
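
A minimal sketch of reading one of these formats, assuming Mosaic's Python API and its ``gdal`` Spark reader (neither the setup calls nor the format name appear in this excerpt) and a hypothetical DBFS path:

.. code-block:: python

    import mosaic as mos

    # Assumed setup calls from the Mosaic Python API (not shown in this diff).
    mos.enable_mosaic(spark, dbutils)
    mos.enable_gdal(spark)

    # Hypothetical path; any of the GDAL-supported raster formats listed above works.
    df = spark.read.format("gdal").load("dbfs:/path/to/rasters/")
    df.printSchema()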

Mosaic provides two flavors of the readers:

31 changes: 18 additions & 13 deletions docs/source/api/vector-format-readers.rst
@@ -9,19 +9,24 @@ Mosaic provides spark readers for vector files supported by GDAL OGR drivers.
Only the drivers that are built by default are supported.
Here are some commonly used file formats:

* GeoJSON (also ESRIJSON, TopoJSON) https://gdal.org/drivers/vector/geojson.html
* ESRI File Geodatabase (FileGDB) and ESRI File Geodatabase vector (OpenFileGDB). Mosaic implements named reader geo_db (described in this doc). https://gdal.org/drivers/vector/filegdb.html
* ESRI Shapefile / DBF (ESRI Shapefile) - Mosaic implements named reader shapefile (described in this doc) https://gdal.org/drivers/vector/shapefile.html
* Network Common Data Form (netCDF) - Mosaic implements raster reader also https://gdal.org/drivers/raster/netcdf.html
* (Geo)Parquet (Parquet) - Mosaic will be implementing a custom reader soon https://gdal.org/drivers/vector/parquet.html
* Spreadsheets (XLSX, XLS, ODS) https://gdal.org/drivers/vector/xls.html
* U.S. Census TIGER/Line (TIGER) https://gdal.org/drivers/vector/tiger.html
* PostgreSQL Dump (PGDump) https://gdal.org/drivers/vector/pgdump.html
* Keyhole Markup Language (KML) https://gdal.org/drivers/vector/kml.html
* Geography Markup Language (GML) https://gdal.org/drivers/vector/gml.html
* GRASS - option for Linear Referencing Systems (LRS) https://gdal.org/drivers/vector/grass.html

For more information please refer to gdal documentation: https://gdal.org/drivers/vector/index.html
* `GeoJSON <https://gdal.org/drivers/vector/geojson.html>`_ (also `ESRIJSON <https://gdal.org/drivers/vector/esrijson.html>`_,
`TopoJSON <https://gdal.org/drivers/vector/topojson.html>`_)
* `FileGDB <https://gdal.org/drivers/vector/filegdb.html>`_ (ESRI File Geodatabase) and
`OpenFileGDB <https://gdal.org/drivers/vector/openfilegdb.html>`_ (ESRI File Geodatabase vector) -
Mosaic implements named reader :ref:`spark.read.format("geo_db")` (described in this doc).
* `ESRI Shapefile <https://gdal.org/drivers/vector/shapefile.html>`_ (ESRI Shapefile / DBF) -
Mosaic implements named reader :ref:`spark.read.format("shapefile")` (described in this doc).
* `netCDF <https://gdal.org/drivers/raster/netcdf.html>`_ (Network Common Data Form) -
Mosaic also supports the GDAL netCDF raster reader.
* `XLSX <https://gdal.org/drivers/vector/xlsx.html>`_, `XLS <https://gdal.org/drivers/vector/xls.html>`_,
`ODS <https://gdal.org/drivers/vector/ods.html>`_ spreadsheets
* `TIGER <https://gdal.org/drivers/vector/tiger.html>`_ (U.S. Census TIGER/Line)
* `PGDump <https://gdal.org/drivers/vector/pgdump.html>`_ (PostgreSQL Dump)
* `KML <https://gdal.org/drivers/vector/kml.html>`_ (Keyhole Markup Language)
* `GML <https://gdal.org/drivers/vector/gml.html>`_ (Geography Markup Language)
* `GRASS <https://gdal.org/drivers/vector/grass.html>`_ - option for Linear Referencing Systems (LRS)

For more information, please refer to the GDAL `vector driver <https://gdal.org/drivers/vector/index.html>`_ documentation.
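
A minimal sketch of the named readers called out above, assuming Mosaic's Python setup call and hypothetical paths and options:

.. code-block:: python

    import mosaic as mos

    # Assumed setup call from the Mosaic Python API (not shown in this diff).
    mos.enable_mosaic(spark, dbutils)

    # Named shapefile reader described in this doc; the path is hypothetical.
    shapefile_df = spark.read.format("shapefile").load("dbfs:/path/to/shapefiles/")

    # Named FileGDB reader described in this doc; "layerName" is an assumed option.
    gdb_df = (
        spark.read.format("geo_db")
        .option("layerName", "roads")
        .load("dbfs:/path/to/sample.gdb")
    )
    gdb_df.printSchema()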


Mosaic provides two flavors of the general readers:
42 changes: 30 additions & 12 deletions docs/source/index.rst
@@ -73,27 +73,38 @@ Version 0.4.x Series

We recommend using Databricks Runtime versions 13.3 LTS with Photon enabled.


Mosaic 0.4.x series only supports DBR 13.x. If running on a different DBR, it will throw an exception:

**DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13. You can specify `%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.**
**DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13. You can specify
`%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.**

Mosaic 0.4.x series issues the following ERROR on a standard, non-Photon cluster `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ | `AWS <https://docs.databricks.com/runtime/index.html/>`_ | `GCP <https://docs.gcp.databricks.com/runtime/index.html/>`_ :
Mosaic 0.4.x series issues an ERROR on standard, non-Photon clusters `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ |
`AWS <https://docs.databricks.com/runtime/index.html/>`_ |
`GCP <https://docs.gcp.databricks.com/runtime/index.html/>`_ :

**DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial AI benefits; Mosaic 0.4.x series restricts executing this cluster.**
**DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial
AI benefits; Mosaic 0.4.x series restricts executing this cluster.**

As of Mosaic 0.4.0 (subject to change in follow-on releases)

* `Assigned Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_ : Mosaic Python, SQL, R, and Scala APIs.
* `Shared Access Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_ : Mosaic Scala API (JVM) with Admin `allowlisting <https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html>`_ ; Python bindings to Mosaic Scala APIs are blocked by Py4J Security on Shared Access Clusters.
- Mosaic SQL expressions cannot yet be registered with `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_
due to API changes affecting DBRs >= 13, more `here <https://docs.databricks.com/en/udf/index.html>`_.
* `Shared Access Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_ : Mosaic Scala API (JVM) with
Admin `allowlisting <https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html>`_ ;
Python bindings to Mosaic Scala APIs are blocked by Py4J Security on Shared Access Clusters.

.. warning::
Mosaic SQL expressions cannot yet be registered with `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_
due to API changes affecting DBRs >= 13, more `here <https://docs.databricks.com/en/udf/index.html>`_.

.. note::
As of Mosaic 0.4.0 (subject to change in follow-on releases)

* `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ : Enforces process isolation which is difficult to accomplish with custom JVM libraries; as such only built-in (aka platform provided) JVM APIs can be invoked from other supported languages in Shared Access Clusters.
* `Volumes <https://docs.databricks.com/en/connect/unity-catalog/volumes.html>`_ : Along the same principle of isolation, clusters (both assigned and shared access) can read Volumes via relevant built-in readers and writers or via custom python calls which do not involve any custom JVM code.
* `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ : Enforces process isolation which is difficult to
accomplish with custom JVM libraries; as such only built-in (aka platform provided) JVM APIs can be invoked from other
supported languages in Shared Access Clusters.
* `Volumes <https://docs.databricks.com/en/connect/unity-catalog/volumes.html>`_ : Along the same principle of isolation,
clusters (both assigned and shared access) can read Volumes via relevant built-in readers and writers or via custom
python calls which do not involve any custom JVM code.
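
A minimal sketch of the Volumes point above, using only a built-in Spark reader (no custom JVM code) and a hypothetical Volume path:

.. code-block:: python

    # Hypothetical Unity Catalog Volume path; adjust catalog/schema/volume names.
    volume_path = "/Volumes/main/default/geodata/"

    # A built-in reader such as binaryFile involves no custom JVM code, so it can
    # read Volumes on both assigned and shared access clusters.
    files_df = spark.read.format("binaryFile").load(volume_path)
    files_df.select("path", "length").show(truncate=False)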


Version 0.3.x Series
@@ -105,11 +116,18 @@ For Mosaic versions < 0.4.0 please use the `0.3.x docs <https://databrickslabs.g
.. warning::
Mosaic 0.3.x series does not support DBR 13.x.

As of the 0.3.11 release, Mosaic issues the following WARNING when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ | `AWS <https://docs.databricks.com/runtime/index.html/>`_ | `GCP <https://docs.gcp.databricks.com/runtime/index.html/>`_ :
As of the 0.3.11 release, Mosaic issues the following WARNING when initialized on a cluster that is neither Photon Runtime
nor Databricks Runtime ML `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ |
`AWS <https://docs.databricks.com/runtime/index.html/>`_ |
`GCP <https://docs.gcp.databricks.com/runtime/index.html/>`_ :

**DEPRECATION WARNING: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial AI benefits; Mosaic will stop working on this cluster after v0.3.x.**
**DEPRECATION WARNING: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial
AI benefits; Mosaic will stop working on this cluster after v0.3.x.**

If you are receiving this warning in v0.3.11+, you will want to begin to plan for a supported runtime. The reason we are making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are powered by Photon. Along this direction of change, Mosaic has standardized to JTS as its default and supported Vector Geometry Provider.
If you are receiving this warning in v0.3.11+, you will want to begin to plan for a supported runtime. The reason we are
making this change is that we are streamlining Mosaic internals to be more aligned with future product APIs which are
powered by Photon. Along this direction of change, Mosaic has standardized to JTS as its default and supported Vector
Geometry Provider.


Documentation
28 changes: 20 additions & 8 deletions docs/source/usage/installation.rst
@@ -10,24 +10,36 @@ Supported platforms

Mosaic 0.4.x series only supports DBR 13.x. If running on a different DBR, it will throw an exception:

**DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13. You can specify `%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.**
**DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13. You can specify
`%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.**
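
A minimal sketch of the version pinning above, written as a Databricks notebook cell; the 0.4.x pin spec is an assumption, while the 0.3.x pin is the one quoted in the error message:

.. code-block:: python

    # Notebook cell on DBR 13.x (Photon): install the 0.4.x series (assumed pin spec).
    %pip install "databricks-mosaic>=0.4,<0.5"

    # On DBR < 13, pin to the 0.3.x series instead, as the error message above states:
    # %pip install 'databricks-mosaic<0.4,>=0.3'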

Mosaic 0.4.x series issues the following ERROR on a standard, non-Photon cluster `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ | `AWS <https://docs.databricks.com/runtime/index.html/>`_ | `GCP <https://docs.gcp.databricks.com/runtime/index.html/>`_ :
Mosaic 0.4.x series issues an ERROR on standard, non-Photon clusters `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ |
`AWS <https://docs.databricks.com/runtime/index.html/>`_ |
`GCP <https://docs.gcp.databricks.com/runtime/index.html/>`_ :

**DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial AI benefits; Mosaic 0.4.x series restricts executing this cluster.**
**DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial
AI benefits; Mosaic 0.4.x series restricts executing this cluster.**

As of Mosaic 0.4.0 (subject to change in follow-on releases)

* `Assigned Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_ : Mosaic Python, SQL, R, and Scala APIs.
* `Shared Access Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_ : Mosaic Scala API (JVM) with Admin `allowlisting <https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html>`_ ; Python bindings to Mosaic Scala APIs are blocked by Py4J Security on Shared Access Clusters.
- Mosaic SQL expressions cannot yet be registered with `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_
due to API changes affecting DBRs >= 13, more `here <https://docs.databricks.com/en/udf/index.html>`_.
* `Shared Access Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_ : Mosaic Scala API (JVM) with
Admin `allowlisting <https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html>`_ ;
Python bindings to Mosaic Scala APIs are blocked by Py4J Security on Shared Access Clusters.

.. warning::
Mosaic SQL expressions cannot yet be registered with `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_
due to API changes affecting DBRs >= 13, more `here <https://docs.databricks.com/en/udf/index.html>`_.

.. note::
As of Mosaic 0.4.0 (subject to change in follow-on releases)

* `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ : Enforces process isolation which is difficult to accomplish with custom JVM libraries; as such only built-in (aka platform provided) JVM APIs can be invoked from other supported languages in Shared Access Clusters.
* `Volumes <https://docs.databricks.com/en/connect/unity-catalog/volumes.html>`_ : Along the same principle of isolation, clusters (both assigned and shared access) can read Volumes via relevant built-in readers and writers or via custom python calls which do not involve any custom JVM code.
* `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ : Enforces process isolation which is difficult to
accomplish with custom JVM libraries; as such only built-in (aka platform provided) JVM APIs can be invoked from other
supported languages in Shared Access Clusters.
* `Volumes <https://docs.databricks.com/en/connect/unity-catalog/volumes.html>`_ : Along the same principle of isolation,
clusters (both assigned and shared access) can read Volumes via relevant built-in readers and writers or via custom
python calls which do not involve any custom JVM code.

If you have cluster creation permissions in your Databricks
workspace, you can create a cluster using the instructions
