From f44760df1c2c59ccc66607198b1e2c01341f161a Mon Sep 17 00:00:00 2001 From: "milos.colic" Date: Mon, 6 Nov 2023 14:59:56 +0000 Subject: [PATCH] Add new raster capabilities to the docs. Update old references. --- docs/source/api/raster-format-readers.rst | 24 +- docs/source/api/raster-functions.rst | 1969 ++++++++++++----- docs/source/usage/install-gdal.rst | 6 +- .../labs/mosaic/core/raster/api/GDAL.scala | 2 +- 4 files changed, 1492 insertions(+), 509 deletions(-) diff --git a/docs/source/api/raster-format-readers.rst b/docs/source/api/raster-format-readers.rst index dd077e6e3..dabcc821e 100644 --- a/docs/source/api/raster-format-readers.rst +++ b/docs/source/api/raster-format-readers.rst @@ -20,7 +20,7 @@ Mosaic provides spark readers for the following raster formats: * XPM using .xpm file extension - https://gdal.org/drivers/raster/xpm.html * GRIB using .grb file extension - https://gdal.org/drivers/raster/grib.html * Zarr using .zarr file extension - https://gdal.org/drivers/raster/zarr.html -Other formats supported by GDAL will be added in future releases. +Other formats are supported if a GDAL driver is available for them. Mosaic provides two flavors of the readers: * spark.read.format("gdal") for reading 1 file per spark task @@ -32,7 +32,7 @@ spark.read.format("gdal") A base Spark SQL data source for reading GDAL raster data sources. It reads metadata of the raster and exposes the direct paths for the raster files.
The output of the reader is a DataFrame with the following columns: - * path - path to the raster file on dbfs (StringType) + * tile - loaded raster tile (RasterTileType) * ySize - height of the raster in pixels (IntegerType) * xSize - width of the raster in pixels (IntegerType) * bandCount - number of bands in the raster (IntegerType) @@ -59,11 +59,11 @@ The output of the reader is a DataFrame with the following columns: .option("driverName", "GTiff")\ .load("dbfs:/path/to/raster.tif") df.show() - +--------------------+-----+-----+---------+--------------------+--------------------+----+--------------------+ - | path|ySize|xSize|bandCount| metadata| subdatasets|srid| proj4Str| - +--------------------+-----+-----+---------+--------------------+--------------------+----+--------------------+ - |dbfs:/path/to/ra...| 100| 100| 1|{AREA_OR_POINT=Po...| null| 4326|+proj=longlat +da...| - +--------------------+-----+-----+---------+--------------------+--------------------+----+--------------------+ + +---------------------------------------------------------------------------------------------------------------+------+------+----------+---------------------+--------------------+-----+----------------------+ + | tile| ySize| xSize| bandCount| metadata| subdatasets| srid| proj4Str| + +---------------------------------------------------------------------------------------------------------------+------+------+----------+---------------------+--------------------+-----+----------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | 100 | 100 | 1 | {AREA_OR_POINT=Po...| null| 4326| +proj=longlat +da...| + +---------------------------------------------------------------------------------------------------------------+------+------+----------+---------------------+--------------------+-----+----------------------+ .. 
code-tab:: scala @@ -71,11 +71,11 @@ The output of the reader is a DataFrame with the following columns: .option("driverName", "GTiff") .load("dbfs:/path/to/raster.tif") df.show() - +--------------------+-----+-----+---------+--------------------+--------------------+----+--------------------+ - | path|ySize|xSize|bandCount| metadata| subdatasets|srid| proj4Str| - +--------------------+-----+-----+---------+--------------------+--------------------+----+--------------------+ - |dbfs:/path/to/ra...| 100| 100| 1|{AREA_OR_POINT=Po...| null| 4326|+proj=longlat +da...| - +--------------------+-----+-----+---------+--------------------+--------------------+----+--------------------+ + +---------------------------------------------------------------------------------------------------------------+------+------+----------+---------------------+--------------------+-----+----------------------+ + | tile| ySize| xSize| bandCount| metadata| subdatasets| srid| proj4Str| + +---------------------------------------------------------------------------------------------------------------+------+------+----------+---------------------+--------------------+-----+----------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | 100 | 100 | 1 | {AREA_OR_POINT=Po...| null| 4326| +proj=longlat +da...| + +---------------------------------------------------------------------------------------------------------------+------+------+----------+---------------------+--------------------+-----+----------------------+ .. 
warning:: Issue 350: https://github.com/databrickslabs/mosaic/issues/350 diff --git a/docs/source/api/raster-functions.rst b/docs/source/api/raster-functions.rst index 131447c64..2cf0e90fa 100644 --- a/docs/source/api/raster-functions.rst +++ b/docs/source/api/raster-functions.rst @@ -11,17 +11,23 @@ Mainly raster to grid functions, which are useful for reprojecting the raster da This is useful for performing spatial joins between raster data and vector data. Mosaic also provides a scalable retiling function that can be used to retile raster data in case of bottlenecking due to large files. All raster functions respect the \"rst\_\" prefix naming convention. +In versions <= 0.3.11, Mosaic operated on either string paths or byte arrays. +In versions > 0.3.11, Mosaic operates on tile objects only. Tile objects are created using the rst_fromfile(path_to_raster) function. +If you use spark.read.format("gdal"), tiles are generated for you automatically. + +.. note:: For Mosaic versions > 0.3.11, do not call setup_gdal. Shared objects no longer need to be copied around. + Please use the updated init_script.sh script to install GDAL on your cluster. See :doc:`Install and Enable GDAL with Mosaic ` for more details. rst_bandmetadata **************** -.. function:: rst_bandmetadata(raster, band) +.. function:: rst_bandmetadata(tile, band) Extract the metadata describing the raster band. Metadata is return as a map of key value pairs. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) :param band: The band number to extract metadata for. :type band: Column (IntegerType) :rtype: Column: MapType(StringType, StringType) @@ -31,62 +37,404 @@ rst_bandmetadata .. tabs:: ..
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ - .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") - df.select(mos.rst_bandmetadata("path", F.lit(1))).limit(1).display() - +------------------------------------------------------------------------------------+ - |rst_bandmetadata(path, 1) | - +------------------------------------------------------------------------------------+ - |{"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert| - |area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": | - |"bleaching_alert_area", "coverage_content_type": "thematicClassification", | - |"standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral | - |bleaching heat stress levels: 0 - No Stress; 1 - Bleaching Watch; 2 - Bleaching | - |Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. Product | - |description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", | - |"valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} | - +------------------------------------------------------------------------------------+ + df = spark.read.format("gdal").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/gdal-netcdf-coral") + df.select(mos.rst_bandmetadata("tile", F.lit(1))).limit(1).display() + +--------------------------------------------------------------------------------------+ + | rst_bandmetadata(tile, 1) | + +--------------------------------------------------------------------------------------+ + | {"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert | + | area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": | + | "bleaching_alert_area", "coverage_content_type": "thematicClassification", | + | "standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral | + | bleaching heat 
stress levels: 0 - No Stress; 1 - Bleaching Watch; 2 - Bleaching | + | Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. Product | + | description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", | + | "valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} | + +--------------------------------------------------------------------------------------+ .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") - .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") - df.select(rst_bandmetadata(col("path"), lit(1)).limit(1).show(false) - +------------------------------------------------------------------------------------+ - |rst_bandmetadata(path, 1) | - +------------------------------------------------------------------------------------+ - |{"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert| - |area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": | - |"bleaching_alert_area", "coverage_content_type": "thematicClassification", | - |"standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral | - |bleaching heat stress levels: 0 - No Stress; 1 - Bleaching Watch; 2 - Bleaching | - |Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. 
Product | - |description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", | - |"valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} | - +------------------------------------------------------------------------------------+ + .format("gdal").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_bandmetadata(col("tile"), lit(1))).limit(1).show(false) + +--------------------------------------------------------------------------------------+ + | rst_bandmetadata(tile, 1) | + +--------------------------------------------------------------------------------------+ + | {"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert | + | area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": | + | "bleaching_alert_area", "coverage_content_type": "thematicClassification", | + | "standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral | + | bleaching heat stress levels: 0 - No Stress; 1 - Bleaching Watch; 2 - Bleaching | + | Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. Product | + | description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", | + | "valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} | + +--------------------------------------------------------------------------------------+ ..
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") - SELECT rst_bandmetadata(path, 1) FROM coral_netcdf LIMIT 1 - +------------------------------------------------------------------------------------+ - |rst_bandmetadata(path, 1) | - +------------------------------------------------------------------------------------+ - |{"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert| - |area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": | - |"bleaching_alert_area", "coverage_content_type": "thematicClassification", | - |"standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral | - |bleaching heat stress levels: 0 - No Stress; 1 - Bleaching Watch; 2 - Bleaching | - |Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. Product | - |description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", | - |"valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} | - +------------------------------------------------------------------------------------+ + USING gdal + OPTIONS (extension "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_bandmetadata(tile, 1) FROM coral_netcdf LIMIT 1 + +--------------------------------------------------------------------------------------+ + | rst_bandmetadata(tile, 1) | + +--------------------------------------------------------------------------------------+ + | {"_FillValue": "251", "NETCDF_DIM_time": "1294315200", "long_name": "bleaching alert | + | area 7-day maximum composite", "grid_mapping": "crs", "NETCDF_VARNAME": | + | "bleaching_alert_area", "coverage_content_type": "thematicClassification", | + | "standard_name": "N/A", "comment": "Bleaching Alert Area (BAA) values are coral | + | bleaching heat stress levels: 0 - No 
Stress; 1 - Bleaching Watch; 2 - Bleaching | + | Warning; 3 - Bleaching Alert Level 1; 4 - Bleaching Alert Level 2. Product | + | description is provided at https://coralreefwatch.noaa.gov/product/5km/index.php.", | + | "valid_min": "0", "units": "stress_level", "valid_max": "4", "scale_factor": "1"} | + +--------------------------------------------------------------------------------------+ + +rst_boundingbox +*************** + +.. function:: rst_boundingbox(raster) + + Returns the bounding box of the raster as a polygon geometry. + + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) + :rtype: Column: StructType(DoubleType, DoubleType, DoubleType, DoubleType) + + :example: + +.. tabs:: + .. code-tab:: py + + df = spark.read.format("gdal").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/gdal-netcdf-coral") + df.select(mos.rst_boundingbox("tile")).limit(1).display() + +------------------------------------------------------------------+ + | rst_boundingbox(tile) | + +------------------------------------------------------------------+ + | [00 00 ... 00] // WKB representation of the polygon bounding box | + +------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("gdal").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_boundingbox(col("tile"))).limit(1).show(false) + +------------------------------------------------------------------+ + | rst_boundingbox(tile) | + +------------------------------------------------------------------+ + | [00 00 ... 00] // WKB representation of the polygon bounding box | + +------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extension "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_boundingbox(tile) FROM coral_netcdf LIMIT 1 + +------------------------------------------------------------------+ + | rst_boundingbox(tile) | + +------------------------------------------------------------------+ + | [00 00 ... 00] // WKB representation of the polygon bounding box | + +------------------------------------------------------------------+ + +rst_clip +******** + +.. function:: rst_clip(raster, geometry) + + Clips the raster to the geometry. + The geometry is expected to be in the same coordinate reference system as the raster. + The geometry is expected to be a polygon or a multipolygon. + The output raster will have the same extent as the input geometry. + The output raster will have the same number of bands as the input raster. + The output raster will have the same pixel type as the input raster. + The output raster will have the same pixel size as the input raster. + The output raster will have the same coordinate reference system as the input raster. + + :param tile: A column containing the raster tile. + :type col: Column (RasterTileType) + :param geometry: A column containing the geometry to clip the raster to. + :type col: Column (GeometryType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. 
code-tab:: py + + df = spark.read.format("gdal").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/gdal-netcdf-coral") + df.select(mos.rst_clip("tile", F.lit("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"))).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("gdal").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_clip(col("tile"), lit("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | + +-----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +-----------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extension "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_clip(tile, "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))") FROM coral_netcdf LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_clip(tile, POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + +rst_combineavg +************** + +.. function:: rst_combineavg(rasters) + + Combines a collection of rasters by averaging the pixel values. + The rasters must have the same extent, number of bands, and pixel type. + The rasters must have the same pixel size and coordinate reference system. + The output raster will have the same extent as the input rasters. + The output raster will have the same number of bands as the input rasters. + The output raster will have the same pixel type as the input rasters. + The output raster will have the same pixel size as the input rasters. + The output raster will have the same coordinate reference system as the input rasters. + + :param tile: A column containing an array of raster tiles. + :type col: Column (ArrayType(RasterTileType)) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. 
code-tab:: py + + df = spark.read.format("gdal").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/gdal-netcdf-coral")\ + .groupBy().agg(F.collect_list("tile").alias("tile")) + df.select(mos.rst_combineavg("tile")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("gdal").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + .groupBy().agg(collect_list(col("tile")).as("tile")) + df.select(rst_combineavg(col("tile"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extension "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + WITH grouped as ( + SELECT collect_list(tile) as tile FROM coral_netcdf + ) + SELECT rst_combineavg(tile) FROM grouped LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + +rst_combineavgagg +***************** + +.. function:: rst_combineavgagg(rasters) + + Combines a group by statement over rasters by averaging the pixel values. + The rasters must have the same extent, number of bands, and pixel type. + The rasters must have the same pixel size and coordinate reference system. + The output raster will have the same extent as the input rasters. + The output raster will have the same number of bands as the input rasters. + The output raster will have the same pixel type as the input rasters. + The output raster will have the same pixel size as the input rasters. + The output raster will have the same coordinate reference system as the input rasters. + + :param tile: A column containing raster tiles. + :type col: Column (ArrayType(RasterTileType)) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. 
code-tab:: py + + df = spark.read.format("gdal").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/gdal-netcdf-coral") + df.groupBy().agg(mos.rst_combineavgagg("tile")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavgagg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("gdal").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.groupBy().agg(rst_combineavgagg(col("tile"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavgagg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + ..
code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extension "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_combineavgagg(tile) + FROM coral_netcdf + GROUP BY 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavgagg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + +rst_frombands +************** + +.. function:: rst_frombands(rasters) + + Combines a collection of rasters into a single raster. + The rasters must have the same extent. + The rasters must have the same pixel coordinate reference system. + The output raster will have the same extent as the input rasters. + The output raster will have the same number of bands as all the input raster bands. + The output raster will have the same pixel type as the input raster bands. + The output raster will have the same pixel size as the highest resolution input rasters. + The output raster will have the same coordinate reference system as the input rasters. + + :param tile: A column containing an array of raster tiles. + :type col: Column (ArrayType(RasterTileType)) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. 
code-tab:: py + + df = spark.read.format("gdal").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/gdal-netcdf-coral")\ + .groupBy().agg(F.collect_list("tile").alias("tile")) + df.select(mos.rst_frombands("tile")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_frombands(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("gdal").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + .groupBy().agg(collect_list(col("tile")).as("tile")) + df.select(rst_frombands(col("tile"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_frombands(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extension "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + WITH grouped as ( + SELECT collect_list(tile) as tile FROM coral_netcdf + ) + SELECT rst_frombands(tile) FROM grouped LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_frombands(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + +rst_fromfile +************ + +.. function:: rst_fromfile(path, ) + + Returns a raster tile from a file path. + The file path must be a string. + The file path must be a valid path to a raster file. + The file path must be a path to a file that GDAL can read. + If the size_in_MB parameter is specified, the raster will be split into tiles of the specified size. + If the size_in_MB parameter is not specified, the raster will not be split into tiles. + If size_in_MB < 0, the raster won't be split into tiles. + + :param path: A column containing the path to a raster file. + :type col: Column (StringType) + :param size_in_MB: Optional parameter to specify the size of the raster tile in MB. Default is not to split the input. + :type col: Column (IntegerType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + ..
code-tab:: py + + df = spark.read.format("binaryFile")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(mos.rst_fromfile("path")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_fromfile(path) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("binaryFile") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_fromfile(col("path"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_fromfile(path) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING binaryFile + OPTIONS (path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_fromfile(path) FROM coral_netcdf LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_fromfile(path) | + +----------------------------------------------------------------------------------------------------------------+ rst_georeference -*************** +**************** .. 
function:: rst_georeference(raster) @@ -98,8 +446,8 @@ rst_georeference GT(4) column rotation (typically zero). GT(5) n-s pixel resolution / pixel height (negative value for a north-up image). - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) :rtype: Column: MapType(StringType, DoubleType) :example: @@ -107,41 +455,144 @@ rst_georeference .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_georeference("path")).limit(1).display() - +-------------------------------------------------------------------------------------------+ - |rst_georeference(path) | - +-------------------------------------------------------------------------------------------+ - |{"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, | - |"upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} | - +-------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------+ + | rst_georeference(path) | + +--------------------------------------------------------------------------------------------+ + | {"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, | + | "upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} | + +--------------------------------------------------------------------------------------------+ .. 
code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_georeference(col("path"))).limit(1).show() - +-------------------------------------------------------------------------------------------+ - |rst_georeference(path) | - +-------------------------------------------------------------------------------------------+ - |{"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, | - |"upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} | - +-------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------+ + | rst_georeference(path) | + +--------------------------------------------------------------------------------------------+ + | {"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, | + | "upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} | + +--------------------------------------------------------------------------------------------+ .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_georeference(path) FROM coral_netcdf LIMIT 1 - +-------------------------------------------------------------------------------------------+ - |rst_georeference(path) | - +-------------------------------------------------------------------------------------------+ - |{"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, | - |"upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} | - +-------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------+ + | rst_georeference(path) | + +--------------------------------------------------------------------------------------------+ + | {"scaleY": -0.049999999152053956, "skewX": 0, "skewY": 0, "upperLeftY": 89.99999847369712, | + | "upperLeftX": -180.00000610436345, "scaleX": 0.050000001695656514} | + +--------------------------------------------------------------------------------------------+ + +rst_getnodata +************* + +.. function:: rst_getnodata(raster) + + Returns the nodata value of the raster bands. + + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) + :rtype: Column: ArrayType(DoubleType) + + :example: + +.. tabs:: + .. 
code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(mos.rst_getnodata("path")).limit(1).display() + +---------------------+ + | rst_getnodata(path) | + +---------------------+ + | [0.0, -9999.0, ...] | + +---------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_getnodata(col("path"))).limit(1).show() + +---------------------+ + | rst_getnodata(path) | + +---------------------+ + | [0.0, -9999.0, ...] | + +---------------------+ + + .. code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_getnodata(path) FROM coral_netcdf LIMIT 1 + +---------------------+ + | rst_getnodata(path) | + +---------------------+ + | [0.0, -9999.0, ...] | + +---------------------+ + +rst_getsubdataset +***************** + +.. function:: rst_getsubdataset(raster, name) + + Returns the subdataset of the raster with a given name. + The subdataset name must be a string. The name is not a full path. + The name is the last identifier in the subdataset path (FORMAT:PATH:NAME). + The subdataset name must be a valid subdataset name for the raster. + + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) + :param name: A column containing the name of the subdataset to return. + :type col: Column (StringType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. 
code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(mos.rst_getsubdataset("path", "sst")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_getsubdataset(path, sst) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_getsubdataset(col("path"), lit("sst"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_getsubdataset(path, sst) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_getsubdataset(path, "sst") FROM coral_netcdf LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_getsubdataset(path, sst) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ rst_height ********** @@ -150,8 +601,8 @@ rst_height Returns the height of the raster in pixels. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: IntegerType :example: @@ -159,20 +610,20 @@ rst_height .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_height('path')).show() +--------------------+ | rst_height(path) | +--------------------+ - |3600 | - |3600 | + | 3600 | + | 3600 | +--------------------+ .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_height(col("path"))).show() +--------------------+ @@ -185,8 +636,8 @@ rst_height .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_height(path) FROM coral_netcdf +--------------------+ | rst_height(path) | @@ -195,6 +646,64 @@ rst_height |3600 | +--------------------+ +rst_initnodata +************** + +.. function:: rst_initnodata(raster) + + Initializes the nodata value of the raster bands. + The nodata value will be set to default values for the pixel type of the raster bands. + The output raster will have the same extent as the input raster. + The default nodata value for ByteType is 0. + The default nodata value for UnsignedShortType is UShort.MaxValue (65535). + The default nodata value for ShortType is Short.MinValue (-32768). + The default nodata value for UnsignedIntegerType is Int.MaxValue (4.294967294E9). + The default nodata value for IntegerType is Int.MinValue (-2147483648). + The default nodata value for FloatType is Float.MinValue (-3.4028234663852886E38). + The default nodata value for DoubleType is Double.MinValue (-1.7976931348623157E308). + + :param tile: A column containing the raster tile. + :type col: Column (RasterTileType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(mos.rst_initnodata("path")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_initnodata(path) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_initnodata(col("path"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_initnodata(path) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_initnodata(path) FROM coral_netcdf LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_initnodata(path) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + rst_isempty ************* @@ -202,8 +711,8 @@ rst_isempty Returns true if the raster is empty. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. 
+ :type col: Column (RasterTileType) :rtype: Column: BooleanType :example: @@ -211,7 +720,7 @@ rst_isempty .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_isempty('path')).show() +--------------------+ @@ -224,7 +733,7 @@ rst_isempty .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_isempty(col("path"))).show() +--------------------+ @@ -237,8 +746,8 @@ rst_isempty .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_isempty(path) FROM coral_netcdf +--------------------+ | rst_height(path) | @@ -254,8 +763,8 @@ rst_memsize Returns size of the raster in bytes. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: LongType :example: @@ -263,7 +772,7 @@ rst_memsize .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_memsize('path')).show() +--------------------+ @@ -276,7 +785,7 @@ rst_memsize .. 
code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_memsize(col("path"))).show() +--------------------+ @@ -289,8 +798,8 @@ rst_memsize .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_memsize(path) FROM coral_netcdf +--------------------+ | rst_height(path) | @@ -299,6 +808,132 @@ rst_memsize |730260 | +--------------------+ +rst_merge +********* + +.. function:: rst_merge(rasters) + + Combines a collection of rasters into a single raster. + The rasters do not need to have the same extent. + The rasters must have the same coordinate reference system. + The rasters are combined using gdalwarp. + The noData value needs to be initialised; if not, the non valid pixels may introduce artifacts in the output raster. + The rasters are stacked in the order they are provided. + The output raster will have the extent covering all input rasters. + The output raster will have the same number of bands as the input rasters. + The output raster will have the same pixel type as the input rasters. + The output raster will have the same pixel size as the highest resolution input rasters. + The output raster will have the same coordinate reference system as the input rasters. + + :param tile: A column containing an array of raster tiles. + :type col: Column (ArrayType(RasterTileType)) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. 
code-tab:: py + + df = spark.read.format("gdal").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/gdal-netcdf-coral")\ + .groupBy().agg(F.collect_list("tile").alias("tile")) + df.select(mos.rst_merge("tile")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_merge(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("gdal").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + .groupBy().agg(collect_list(col("tile")).as("tile")) + df.select(rst_merge(col("tile"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_merge(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + WITH grouped as ( + SELECT collect_list(tile) as tile FROM coral_netcdf + ) + SELECT rst_merge(tile) FROM grouped LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_merge(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + +rst_mergeagg +************ + +.. function:: rst_mergeagg(rasters) + + Combines a collection of rasters into a single raster. + The rasters do not need to have the same extent. + The rasters must have the same coordinate reference system. + The rasters are combined using gdalwarp. + The noData value needs to be initialised; if not, the non valid pixels may introduce artifacts in the output raster. + The rasters are stacked in the order they are provided. + This order is randomized since this is an aggregation function. + If the order of rasters is important please first collect rasters and sort them by metadata information and then use + rst_merge function. + The output raster will have the extent covering all input rasters. + The output raster will have the same number of bands as the input rasters. + The output raster will have the same pixel type as the input rasters. + The output raster will have the same pixel size as the highest resolution input rasters. + The output raster will have the same coordinate reference system as the input rasters. + + :param tile: A column containing raster tiles. 
+ :type col: Column (RasterTileType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. code-tab:: py + + df = spark.read.format("gdal").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/gdal-netcdf-coral") + df.select(mos.rst_mergeagg("tile")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_mergeagg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("gdal").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_mergeagg(col("tile"))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_mergeagg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS TABLE coral_netcdf + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_mergeagg(tile) FROM coral_netcdf LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_mergeagg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + rst_metadata ************* @@ -307,8 +942,8 @@ rst_metadata Extract the metadata describing the raster. Metadata is return as a map of key value pairs. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: MapType(StringType, StringType) :example: @@ -316,65 +951,123 @@ rst_metadata .. tabs:: .. 
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_metadata('path')).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_metadata(path) | - +------------------------------------------------------------------------------------------------------------------+ - |{"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", | - |"NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", | - |"NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 | - |180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface | - |Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal| - |Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", | - |.... (truncated).... 
"NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global | - |5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 | - |(v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", | - |"NC_GLOBAL#cdm_data_type": "Grid"} | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_metadata(path) | + +--------------------------------------------------------------------------------------------------------------------+ + | {"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", | + | "NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", | + | "NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 | + | 180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface | + | Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal | + | Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", | + | .... (truncated).... "NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global | + | 5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 | + | (v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", | + | "NC_GLOBAL#cdm_data_type": "Grid"} | + +--------------------------------------------------------------------------------------------------------------------+ .. 
code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_metadata(col("path"))).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_metadata(path) | - +------------------------------------------------------------------------------------------------------------------+ - |{"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", | - |"NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", | - |"NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 | - |180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface | - |Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal| - |Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", | - |.... (truncated).... 
"NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global | - |5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 | - |(v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", | - |"NC_GLOBAL#cdm_data_type": "Grid"} | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_metadata(path) | + +--------------------------------------------------------------------------------------------------------------------+ + | {"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", | + | "NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", | + | "NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 | + | 180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface | + | Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal | + | Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", | + | .... (truncated).... "NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global | + | 5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 | + | (v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", | + | "NC_GLOBAL#cdm_data_type": "Grid"} | + +--------------------------------------------------------------------------------------------------------------------+ .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_metadata(path) FROM coral_netcdf LIMIT 1 - +------------------------------------------------------------------------------------------------------------------+ - | rst_metadata(path) | - +------------------------------------------------------------------------------------------------------------------+ - |{"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", | - |"NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", | - |"NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 | - |180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface | - |Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal| - |Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", | - |.... (truncated).... 
"NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global | - |5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 | - |(v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", | - |"NC_GLOBAL#cdm_data_type": "Grid"} | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_metadata(path) | + +--------------------------------------------------------------------------------------------------------------------+ + | {"NC_GLOBAL#publisher_url": "https://coralreefwatch.noaa.gov", "NC_GLOBAL#geospatial_lat_units": "degrees_north", | + | "NC_GLOBAL#platform_vocabulary": "NOAA NODC Ocean Archive System Platforms", "NC_GLOBAL#creator_type": "group", | + | "NC_GLOBAL#geospatial_lon_units": "degrees_east", "NC_GLOBAL#geospatial_bounds": "POLYGON((-90.0 180.0, 90.0 | + | 180.0, 90.0 -180.0, -90.0 -180.0, -90.0 180.0))", "NC_GLOBAL#keywords": "Oceans > Ocean Temperature > Sea Surface | + | Temperature, Oceans > Ocean Temperature > Water Temperature, Spectral/Engineering > Infrared Wavelengths > Thermal | + | Infrared, Oceans > Ocean Temperature > Bleaching Alert Area", "NC_GLOBAL#geospatial_lat_max": "89.974998", | + | .... (truncated).... "NC_GLOBAL#history": "This is a product data file of the NOAA Coral Reef Watch Daily Global | + | 5km Satellite Coral Bleaching Heat Stress Monitoring Product Suite Version 3.1 (v3.1) in its NetCDF Version 1.0 | + | (v1.0).", "NC_GLOBAL#publisher_institution": "NOAA/NESDIS/STAR Coral Reef Watch Program", | + | "NC_GLOBAL#cdm_data_type": "Grid"} | + +--------------------------------------------------------------------------------------------------------------------+ + +rst_ndvi +******** + +.. 
function:: rst_ndvi(raster, red_band, nir_band) + + Calculates the Normalized Difference Vegetation Index (NDVI) for a raster. + The NDVI is calculated using the formula: (NIR - RED) / (NIR + RED). + The output raster will have the same extent as the input raster. + The output raster will have a single band. + The output raster will have a pixel type of float64. + The output raster will have the same coordinate reference system as the input raster. + + :param tile: A column containing the raster tile. + :type col: Column (RasterTileType) + :param red_band: A column containing the band number of the red band. + :type col: Column (IntegerType) + :param nir_band: A column containing the band number of the near infrared band. + :type col: Column (IntegerType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "nc")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(mos.rst_ndvi("path", 1, 2)).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_ndvi(path, 1, 2) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "nc") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + df.select(rst_ndvi(col("path"), lit(1), lit(2))).limit(1).show(false) + +----------------------------------------------------------------------------------------------------------------+ + | rst_ndvi(path, 1, 2) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + CREATE TABLE IF NOT EXISTS coral_netcdf + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + SELECT rst_ndvi(path, 1, 2) FROM coral_netcdf LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_ndvi(path, 1, 2) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ rst_numbands ************* @@ -383,8 +1076,8 @@ rst_numbands Returns number of bands in the raster. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions earlier than 0.3.11, a string path to a raster file or a byte array. + :type col: Column (RasterTileType) :rtype: Column: IntegerType :example: @@ -392,7 +1085,7 @@ rst_numbands .. tabs:: ..
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_numbands('path')).show() +---------------------+ @@ -405,7 +1098,7 @@ rst_numbands .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_metadata(col("path"))).show() +---------------------+ @@ -418,8 +1111,8 @@ rst_numbands .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_metadata(path) +---------------------+ | rst_numbands(path) | @@ -435,8 +1128,8 @@ rst_pixelheight Returns the height of the pixel in the raster derived via GeoTransform. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions earlier than 0.3.11, a string path to a raster file or a byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -444,41 +1137,41 @@ rst_pixelheight .. tabs:: ..
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_pixelheight('path')).show() - +---------------------+ - |rst_pixelheight(path)| - +---------------------+ - | 1 | - | 1 | - +---------------------+ + +-----------------------+ + | rst_pixelheight(path) | + +-----------------------+ + | 1 | + | 1 | + +-----------------------+ .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_pixelheight(col("path"))).show() - +---------------------+ - |rst_pixelheight(path)| - +---------------------+ - | 1 | - | 1 | - +---------------------+ + +-----------------------+ + | rst_pixelheight(path) | + +-----------------------+ + | 1 | + | 1 | + +-----------------------+ .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_pixelheight(path) - +---------------------+ - |rst_pixelheight(path)| - +---------------------+ - | 1 | - | 1 | - +---------------------+ + +-----------------------+ + | rst_pixelheight(path) | + +-----------------------+ + | 1 | + | 1 | + +-----------------------+ rst_pixelwidth ************** @@ -487,8 +1180,8 @@ rst_pixelwidth Returns the width of the pixel in the raster derived via GeoTransform. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. 
For versions earlier than 0.3.11, a string path to a raster file or a byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -496,7 +1189,7 @@ rst_pixelwidth .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_pixelwidth('path')).show() +---------------------+ @@ -509,7 +1202,7 @@ rst_pixelwidth .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_pixelwidth(col("path"))).show() +---------------------+ @@ -522,8 +1215,8 @@ rst_pixelwidth .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_pixelwidth(path) +---------------------+ | rst_pixelwidth(path)| @@ -542,8 +1235,8 @@ rst_rastertogridavg CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the average of the pixel values in the cell. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions earlier than 0.3.11, a string path to a raster file or a byte array. + :type col: Column (RasterTileType) :param raster: A resolution of the grid index system. :type col: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) @@ -553,56 +1246,56 @@ rst_rastertogridavg .. tabs:: ..
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rastertogridavg('path', F.lit(3)).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridavg(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridavg(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | 
{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rastertogridavg(col("path"), lit(3)).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridavg(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridavg(path, 3) | + 
+--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rastertogridavg(path, 3) - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridavg(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 
0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridavg(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. figure:: ../images/rst_rastertogridavg/h3.png :figclass: doc-figure @@ -619,8 +1312,8 @@ rst_rastertogridcount CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the average of the pixel values in the cell. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions earlier than 0.3.11, a string path to a raster file or a byte array.
+ :type col: Column (RasterTileType) :param raster: A resolution of the grid index system. :type col: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) @@ -630,55 +1323,55 @@ rst_rastertogridcount .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rastertogridcount('path', F.lit(3)).show() +------------------------------------------------------------------------------------------------------------------+ | rst_rastertogridcount(path, 3) | +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, | - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 3}, | - |{"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 3}, | + | {"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", 
"measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | +------------------------------------------------------------------------------------------------------------------+ .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rastertogridcount(col("path"), lit(3)).show() +------------------------------------------------------------------------------------------------------------------+ | rst_rastertogridcount(path, 3) | +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, | - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 3}, | - |{"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 3}, | + | {"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] 
| +------------------------------------------------------------------------------------------------------------------+ .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rastertogridcount(path, 3) +------------------------------------------------------------------------------------------------------------------+ | rst_rastertogridcount(path, 3) | +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, | - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 3}, | - |{"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 3}, | + | {"cellID": "593785619583336447", "measure": 3}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | 
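+------------------------------------------------------------------------------------------------------------------+ .. figure:: ../images/rst_rastertogridavg/h3.png @@ -696,8 +1389,8 @@ rst_rastertogridmax CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the maximum pixel value. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions earlier than 0.3.11, a string path to a raster file or a byte array. + :type col: Column (RasterTileType) :param raster: A resolution of the grid index system. :type col: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) @@ -707,56 +1400,56 @@ rst_rastertogridmax .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rastertogridmax('path', F.lit(3)).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmax(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmax(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID":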
"591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmax(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. 
code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rastertogridmax(col("path"), lit(3)).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmax(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmax(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": 
"593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rastertogridmax(path, 3) - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmax(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + 
+--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmax(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. figure:: ../images/rst_rastertogridavg/h3.png :figclass: doc-figure @@ -773,8 +1466,8 @@ rst_rastertogridmedian CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the median pixel value. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions earlier than 0.3.11, a string path to a raster file or a byte array. + :type col: Column (RasterTileType) :param raster: A resolution of the grid index system. :type col: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) @@ -784,56 +1477,56 @@ rst_rastertogridmedian .. tabs:: ..
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rastertogridmedian('path', F.lit(3)).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmedian(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmedian(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, 
| + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rastertogridmedian(col("path"), lit(3)).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmedian(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | 
rst_rastertogridmedian(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rastertogridmax(path, 3) - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmedian(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, 
{"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmedian(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. figure:: ../images/rst_rastertogridavg/h3.png :figclass: doc-figure @@ -850,8 +1543,8 @@ rst_rastertogridmin CellID can be LongType or StringType depending on the configuration of MosaicContext. The value/measure for each cell is the median pixel value. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. 
+ :type col: Column (RasterTileType) :param raster: A resolution of the grid index system. :type col: Column (IntegerType) :rtype: Column: ArrayType(ArrayType(StructType(LongType|StringType, DoubleType))) @@ -861,56 +1554,56 @@ rst_rastertogridmin .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rastertogridmin('path', F.lit(3)).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmin(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmin(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", 
"measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rastertogridmin(col("path"), lit(3)).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmin(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - 
+------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmin(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rastertogridmin(path, 3) - +------------------------------------------------------------------------------------------------------------------+ - | rst_rastertogridmin(path, 3) | - +------------------------------------------------------------------------------------------------------------------+ - |[[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603},| - |{"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | - |{"cellID": "593163914477305855", "measure": 2}, {"cellID": "592998781574709247", "measure": 1.1283185840707965}, | - |{"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | - |{"cellID": "593472602366803967", "measure": 0.3963963963963964}, | - |{"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | - |{"cellID": "592336738135834623", "measure": 1}, ....]] | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_rastertogridmin(path, 3) | + +--------------------------------------------------------------------------------------------------------------------+ + | [[{"cellID": "593176490141548543", "measure": 0}, {"cellID": "593386771740360703", "measure": 1.2037735849056603}, | + | {"cellID": "593308294097928191", "measure": 0}, {"cellID": "593825202001936383", "measure": 0}, | + | {"cellID": "593163914477305855", "measure": 2}, {"cellID": 
"592998781574709247", "measure": 1.1283185840707965}, | + | {"cellID": "593262526926422015", "measure": 2}, {"cellID": "592370479398911999", "measure": 0}, | + | {"cellID": "593472602366803967", "measure": 0.3963963963963964}, | + | {"cellID": "593785619583336447", "measure": 0.6590909090909091}, {"cellID": "591988330388783103", "measure": 1}, | + | {"cellID": "592336738135834623", "measure": 1}, ....]] | + +--------------------------------------------------------------------------------------------------------------------+ .. figure:: ../images/rst_rastertogridavg/h3.png :figclass: doc-figure @@ -926,8 +1619,8 @@ rst_rastertoworldcoord The result is a WKT point geometry. The coordinates are computed using the GeoTransform of the raster to respect the projection. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) :param x: x coordinate of the pixel. :type col: Column (IntegerType) :param y: y coordinate of the pixel. @@ -939,7 +1632,7 @@ rst_rastertoworldcoord .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rastertoworldcoord('path', F.lit(3), F.lit(3)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -951,7 +1644,7 @@ rst_rastertoworldcoord .. 
code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rastertoworldcoord(col("path"), lit(3), lit(3)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -963,8 +1656,8 @@ rst_rastertoworldcoord .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rastertoworldcoord(path, 3, 3) +------------------------------------------------------------------------------------------------------------------+ | rst_rastertoworldcoord(path, 3, 3) | @@ -980,8 +1673,8 @@ rst_rastertoworldcoordx Computes the world coordinates of the raster pixel at the given x and y coordinates. The result is the X coordinate of the point after applying the GeoTransform of the raster. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) :param x: x coordinate of the pixel. :type col: Column (IntegerType) :param y: y coordinate of the pixel. @@ -993,7 +1686,7 @@ rst_rastertoworldcoordx .. tabs:: .. 
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rastertoworldcoordx('path', F.lit(3), F.lit(3)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1005,7 +1698,7 @@ rst_rastertoworldcoordx .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rastertoworldcoordx(col("path"), lit(3), lit(3)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1017,8 +1710,8 @@ rst_rastertoworldcoordx .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rastertoworldcoordx(path, 3, 3) +------------------------------------------------------------------------------------------------------------------+ | rst_rastertoworldcoordx(path, 3, 3) | @@ -1034,8 +1727,8 @@ rst_rastertoworldcoordy Computes the world coordinates of the raster pixel at the given x and y coordinates. The result is the X coordinate of the point after applying the GeoTransform of the raster. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) :param x: x coordinate of the pixel. 
:type col: Column (IntegerType) :param y: y coordinate of the pixel. @@ -1047,7 +1740,7 @@ rst_rastertoworldcoordy .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rastertoworldcoordy('path', F.lit(3), F.lit(3)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1059,7 +1752,7 @@ rst_rastertoworldcoordy .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rastertoworldcoordy(col("path"), lit(3), lit(3)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1071,8 +1764,8 @@ rst_rastertoworldcoordy .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rastertoworldcoordy(path, 3, 3) +------------------------------------------------------------------------------------------------------------------+ | rst_rastertoworldcoordy(path, 3, 3) | @@ -1090,8 +1783,8 @@ rst_retile The results are the paths to the new rasters. The result set is automatically exploded. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. 
+ :type col: Column (RasterTileType) :param width: The width of the tiles. :type col: Column (IntegerType) :param height: The height of the tiles. @@ -1103,40 +1796,40 @@ rst_retile .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_retile('path', F.lit(300), F.lit(300)).show() +------------------------------------------------------------------------------------------------------------------+ | rst_retile(path, 300, 300) | +------------------------------------------------------------------------------------------------------------------+ - | /dbfs/tmp/mosaic/raster/checkpoint/raster_1095576780709022500.tif | - | /dbfs/tmp/mosaic/raster/checkpoint/raster_-1042125519107460588.tif | + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +------------------------------------------------------------------------------------------------------------------+ .. 
code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_retile(col("path"), lit(300), lit(300)).show() +------------------------------------------------------------------------------------------------------------------+ | rst_retile(path, 300, 300) | +------------------------------------------------------------------------------------------------------------------+ - | /dbfs/tmp/mosaic/raster/checkpoint/raster_1095576780709022500.tif | - | /dbfs/tmp/mosaic/raster/checkpoint/raster_-1042125519107460588.tif | + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +------------------------------------------------------------------------------------------------------------------+ .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_retile(path, 300, 300) +------------------------------------------------------------------------------------------------------------------+ | rst_retile(path, 300, 300) | +------------------------------------------------------------------------------------------------------------------+ - | /dbfs/tmp/mosaic/raster/checkpoint/raster_1095576780709022500.tif | - | /dbfs/tmp/mosaic/raster/checkpoint/raster_-1042125519107460588.tif | + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +------------------------------------------------------------------------------------------------------------------+ rst_rotation @@ -1148,8 +1841,8 @@ rst_rotation The rotation is the angle between the X axis and the North axis. The rotation is computed using the GeoTransform of the raster. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -1157,7 +1850,7 @@ rst_rotation .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_rotation('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1170,7 +1863,7 @@ rst_rotation .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_rotation(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1183,8 +1876,8 @@ rst_rotation .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_rotation(path) +------------------------------------------------------------------------------------------------------------------+ | rst_rotation(path) | @@ -1200,8 +1893,8 @@ rst_scalex Computes the scale of the raster in the X direction. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -1209,7 +1902,7 @@ rst_scalex .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_scalex('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1221,7 +1914,7 @@ rst_scalex .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_scalex(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1233,8 +1926,8 @@ rst_scalex .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_scalex(path) +------------------------------------------------------------------------------------------------------------------+ | rst_scalex(path) | @@ -1249,8 +1942,8 @@ rst_scaley Computes the scale of the raster in the Y direction. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -1258,7 +1951,7 @@ rst_scaley .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_scaley('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1270,7 +1963,7 @@ rst_scaley .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_scaley(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1282,8 +1975,8 @@ rst_scaley .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_scaley(path) +------------------------------------------------------------------------------------------------------------------+ | rst_scaley(path) | @@ -1291,6 +1984,64 @@ rst_scaley | 1.2 | +------------------------------------------------------------------------------------------------------------------+ +rst_setnodata +********************** + +.. function:: rst_setnodata(raster, nodata) + + Sets the nodata value of the raster. + The result is a new raster with the nodata value set. + The same nodata value is set for all bands of the raster if a single value is passed. + If an array of values is passed, the nodata value is set for each band of the raster. + + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array. + :type col: Column (RasterTileType) + :param nodata: The nodata value to set. + :type col: Column (DoubleType) / ArrayType(DoubleType) + :rtype: Column: StringType + + :example: + +.. tabs:: + + .. code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "tif")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(mos.rst_setnodata('path', F.lit(0))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_setnodata(path, 0) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ...
00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "tif") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(rst_setnodata(col("path"), lit(0))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_setnodata(path, 0) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + CREATE TABLE IF NOT EXISTS coral_tif + USING gdal + OPTIONS (extensions "tif", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + SELECT rst_setnodata(path, 0) + +------------------------------------------------------------------------------------------------------------------+ + | rst_setnodata(path, 0) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ...
00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + rst_skewx ********************** @@ -1298,8 +2049,8 @@ rst_skewx Computes the skew of the raster in the X direction. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -1307,7 +2058,7 @@ rst_skewx .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_skewx('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1319,7 +2070,7 @@ rst_skewx .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_skewx(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1331,8 +2082,8 @@ rst_skewx .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_skewx(path) +------------------------------------------------------------------------------------------------------------------+ | rst_skewx(path) | @@ -1347,8 +2098,8 @@ rst_skewy Computes the skew of the raster in the Y direction. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -1356,7 +2107,7 @@ rst_skewy .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_skewy('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1368,7 +2119,7 @@ rst_skewy .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_skewy(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1380,8 +2131,8 @@ rst_skewy .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_skewy(path) +------------------------------------------------------------------------------------------------------------------+ | rst_skewy(path) | @@ -1397,8 +2148,10 @@ rst_srid Computes the SRID of the raster. The SRID is the EPSG code of the raster. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + .. note:: For complex CRS definitions the EPSG code may default to 0. + + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -1406,7 +2159,7 @@ rst_srid .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_srid('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1418,7 +2171,7 @@ rst_srid .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_srid(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1430,8 +2183,8 @@ rst_srid .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_srid(path) +------------------------------------------------------------------------------------------------------------------+ | rst_srid(path) | @@ -1448,8 +2201,8 @@ rst_subdatasets The subdatasets are the paths to the subdatasets of the raster. The result is a map of the subdataset path to the subdatasets and the description of the subdatasets. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: MapType(StringType, StringType) :example: @@ -1457,46 +2210,105 @@ rst_subdatasets .. tabs:: .. 
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_subdatasets('path').show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_subdatasets(path) | - +------------------------------------------------------------------------------------------------------------------+ - | {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010| - |6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial| - |/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8| - |-bit unsigned integer)"} | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_subdatasets(path) | + +--------------------------------------------------------------------------------------------------------------------+ + | {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010 | + | 6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial | + | /mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8 | + | -bit unsigned integer)"} | + +--------------------------------------------------------------------------------------------------------------------+ .. 
code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_subdatasets(col("path")).show() - +------------------------------------------------------------------------------------------------------------------+ - | rst_subdatasets(path) | - +------------------------------------------------------------------------------------------------------------------+ - | {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010| - |6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial| - |/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8| - |-bit unsigned integer)"} | - +------------------------------------------------------------------------------------------------------------------+ + +--------------------------------------------------------------------------------------------------------------------+ + | rst_subdatasets(path) | + +--------------------------------------------------------------------------------------------------------------------+ + | {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010 | + | 6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial | + | /mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8 | + | -bit unsigned integer)"} | + +--------------------------------------------------------------------------------------------------------------------+ .. 
code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_subdatasets(path) + +--------------------------------------------------------------------------------------------------------------------+ + | rst_subdatasets(path) | + +--------------------------------------------------------------------------------------------------------------------+ + | {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010 | + | 6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial | + | /mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8 | + | -bit unsigned integer)"} | + +--------------------------------------------------------------------------------------------------------------------+ + +rst_subdivide +********************** + +.. function:: rst_subdivide(raster, sizeInMB) + + Subdivides the raster to the given tile size in MB. The result is a collection of new raster files. + The tiles are split until the expected size of a tile is < sizeInMB. + A tile is always split into 4 tiles. This ensures that the tiles are always split in the same way. + The aspect ratio of the tiles is preserved. + The result set is automatically exploded. + + .. note:: The size of the tiles is approximate. Due to compression and other effects, the size of the tiles in MB cannot be guaranteed. + + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) + :param sizeInMB: The size of the tiles in MB. 
+ + :example: + +.. tabs:: + + .. code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "tif")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(mos.rst_subdivide('path', F.lit(10))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_subdivide(path, 10) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "tif") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(rst_subdivide(col("path"), lit(10))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_subdivide(path, 10) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS coral_tif + USING gdal + OPTIONS (extensions "tif", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + SELECT rst_subdivide(path, 10) +------------------------------------------------------------------------------------------------------------------+ - | rst_subdatasets(path) | + | rst_subdivide(path, 10) | +------------------------------------------------------------------------------------------------------------------+ - | {"NETCDF:\"/dbfs/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_2022010| - |6-1.nc\":bleaching_alert_area": "[1x3600x7200] N/A (8-bit unsigned integer)", "NETCDF:\"/dbfs/FileStore/geospatial| - |/mosaic/sample_raster_data/binary/netcdf-coral/ct5km_baa_max_7d_v3_1_20220106-1.nc\":mask": "[1x3600x7200] mask (8| - |-bit unsigned integer)"} | + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | +------------------------------------------------------------------------------------------------------------------+ rst_summary @@ -1509,8 +2321,8 @@ rst_summary The logic is produced by gdalinfo procedure. The result is stored as JSON. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: MapType(StringType, StringType) :example: @@ -1518,7 +2330,7 @@ rst_summary .. tabs:: .. 
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_summary('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1534,7 +2346,7 @@ rst_summary .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_summary(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1550,8 +2362,8 @@ rst_summary .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_summary(path) +------------------------------------------------------------------------------------------------------------------+ | rst_summary(path) | @@ -1563,6 +2375,173 @@ rst_summary |Watch Program", "NC_GLOBAL#cdm_data_type":"Gr... | +------------------------------------------------------------------------------------------------------------------+ +rst_tessellate +********************** + +.. function:: rst_tessellate(raster, resolution) + + Tessellates the raster to the given resolution of the supported grid (H3, BNG, Custom). The result is a collection of new raster files. + Each tile in the tile set corresponds to a cell that is a part of the tessellation of the bounding box of the raster. + The result set is automatically exploded. 
+ If rst_merge is called on the tile set the original raster will be reconstructed. + The output tiles have the same number of bands as the input raster. + + :param tile: A column containing the raster tile. + :type col: Column (RasterTileType) + :param resolution: The resolution of the supported grid. + + :example: + +.. tabs:: + .. code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "tif")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(mos.rst_tessellate('path', F.lit(10))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_tessellate(path, 10) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "tif") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(rst_tessellate(col("path"), lit(10))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_tessellate(path, 10) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + CREATE TABLE IF NOT EXISTS coral_tif + USING gdal + OPTIONS (extensions "tif", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + SELECT rst_tessellate(path, 10) + +------------------------------------------------------------------------------------------------------------------+ + | rst_tessellate(path, 10) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + +rst_tooverlappingtiles +********************** + +.. function:: rst_tooverlappingtiles(raster, width, height, overlap) + + Splits the raster into overlapping tiles of the given width and height. + The overlap is the percentage of the tile size by which the tiles overlap. + The result is a collection of new raster files. + The result set is automatically exploded. + If rst_merge is called on the tile set the original raster will be reconstructed. + The output tiles have the same number of bands as the input raster. + + :param tile: A column containing the raster tile. + :type col: Column (RasterTileType) + :param width: The width of the tiles in pixels. + :type col: Column (IntegerType) + :param height: The height of the tiles in pixels. + :type col: Column (IntegerType) + :param overlap: The overlap of the tiles in percentage. + :type col: Column (IntegerType) + + :example: + +.. tabs:: + .. 
code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "tif")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(mos.rst_tooverlappingtiles('path', F.lit(10), F.lit(10), F.lit(10))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_tooverlappingtiles(path, 10, 10, 10) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "tif") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(rst_tooverlappingtiles(col("path"), lit(10), lit(10), lit(10))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_tooverlappingtiles(path, 10, 10, 10) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + CREATE TABLE IF NOT EXISTS coral_tif + USING gdal + OPTIONS (extensions "tif", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + SELECT rst_tooverlappingtiles(path, 10, 10, 10) + +------------------------------------------------------------------------------------------------------------------+ + | rst_tooverlappingtiles(path, 10, 10, 10) | + +------------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + | {index_id: 593308294097928192, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "GTiff" } | + +------------------------------------------------------------------------------------------------------------------+ + +rst_tryopen +********************** + +.. function:: rst_tryopen(raster) + + Tries to open the raster. The result is true if the raster can be opened and false otherwise. + + :param tile: A column containing the raster tile. + :type col: Column (RasterTileType) + :rtype: Column: BooleanType + + :example: + +.. tabs:: + .. code-tab:: py + + df = spark.read.format("binaryFile").option("extensions", "tif")\ + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(mos.rst_tryopen('path')).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_tryopen(path) | + +------------------------------------------------------------------------------------------------------------------+ + | true | + +------------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: scala + + val df = spark.read + .format("binaryFile").option("extensions", "tif") + .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + df.select(rst_tryopen(col("path"))).show() + +------------------------------------------------------------------------------------------------------------------+ + | rst_tryopen(path) | + +------------------------------------------------------------------------------------------------------------------+ + | true | + +------------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + CREATE TABLE IF NOT EXISTS coral_tif + USING gdal + OPTIONS (extensions "tif", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/tif") + SELECT rst_tryopen(path) + +------------------------------------------------------------------------------------------------------------------+ + | rst_tryopen(path) | + +------------------------------------------------------------------------------------------------------------------+ + | true | + +------------------------------------------------------------------------------------------------------------------+ + rst_upperleftx ********************** @@ -1571,8 +2550,8 @@ rst_upperleftx Computes the upper left X coordinate of the raster. The value is computed based on GeoTransform. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -1580,7 +2559,7 @@ rst_upperleftx .. tabs:: .. 
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_upperleftx('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1592,7 +2571,7 @@ rst_upperleftx .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_upperleftx(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1604,8 +2583,8 @@ rst_upperleftx .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_upperleftx(path) +------------------------------------------------------------------------------------------------------------------+ | rst_upperleftx(path) | @@ -1621,8 +2600,8 @@ rst_upperlefty Computes the upper left Y coordinate of the raster. The value is computed based on GeoTransform. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: DoubleType :example: @@ -1630,7 +2609,7 @@ rst_upperlefty .. tabs:: .. 
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_upperlefty('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1642,7 +2621,7 @@ rst_upperlefty .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_upperlefty(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1654,8 +2633,8 @@ rst_upperlefty .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_upperlefty(path) +------------------------------------------------------------------------------------------------------------------+ | rst_upperlefty(path) | @@ -1671,8 +2650,8 @@ rst_width Computes the width of the raster in pixels. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :rtype: Column: IntegerType :example: @@ -1680,7 +2659,7 @@ rst_width .. tabs:: .. 
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_width('path').show() +------------------------------------------------------------------------------------------------------------------+ @@ -1692,7 +2671,7 @@ rst_width .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_width(col("path")).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1704,8 +2683,8 @@ rst_width .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_width(path) +------------------------------------------------------------------------------------------------------------------+ | rst_width(path) | @@ -1723,8 +2702,8 @@ rst_worldtorastercoord The world coordinates are the coordinates in the CRS of the raster. The coordinates are resolved using GeoTransform. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For versions < 0.3.11, a string representing the path to a raster file, or a byte array. + :type col: Column (RasterTileType) :param x: X world coordinate. :type col: Column (StringType) :param y: Y world coordinate. @@ -1736,7 +2715,7 @@ rst_worldtorastercoord .. tabs:: .. 
code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_worldtorastercoord('path', F.lit(-160.1), F.lit(40.0)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1748,7 +2727,7 @@ rst_worldtorastercoord .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_worldtorastercoord(col("path"), lit(-160.1), lit(40.0)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1760,8 +2739,8 @@ rst_worldtorastercoord .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_worldtorastercoord(path, -160.1, 40.0) +------------------------------------------------------------------------------------------------------------------+ | rst_worldtorastercoord(path) | @@ -1781,8 +2760,8 @@ rst_worldtorastercoordx This method returns the X coordinate. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :type col: Column (RasterTileType) :param x: X world coordinate. :type col: Column (StringType) :param y: Y world coordinate. @@ -1794,7 +2773,7 @@ rst_worldtorastercoordx .. 
tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_worldtorastercoord('path', F.lit(-160.1), F.lit(40.0)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1806,7 +2785,7 @@ rst_worldtorastercoordx .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_worldtorastercoordx(col("path"), lit(-160.1), lit(40.0)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1818,8 +2797,8 @@ rst_worldtorastercoordx .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_worldtorastercoordx(path, -160.1, 40.0) +------------------------------------------------------------------------------------------------------------------+ | rst_worldtorastercoordx(path, -160.1, 40.0) | @@ -1839,8 +2818,8 @@ rst_worldtorastercoordy This method returns the Y coordinate. - :param raster: A column containing the path to a raster file. - :type col: Column (StringType) + :param tile: A column containing the raster tile. For < 0.3.11 string representing the path to a raster file or byte array.A column containing the path to a raster file. + :type col: Column (RasterTileType) :param x: X world coordinate. :type col: Column (StringType) :param y: Y world coordinate. 
@@ -1852,7 +2831,7 @@ rst_worldtorastercoordy .. tabs:: .. code-tab:: py - df = spark.read.format("binaryFile").option("pathGlobFilter", "*.nc")\ + df = spark.read.format("binaryFile").option("extensions", "nc")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(mos.rst_worldtorastercoordy('path', F.lit(-160.1), F.lit(40.0)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1864,7 +2843,7 @@ rst_worldtorastercoordy .. code-tab:: scala val df = spark.read - .format("binaryFile").option("pathGlobFilter", "*.nc") + .format("binaryFile").option("extensions", "nc") .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") df.select(rst_worldtorastercoordy(col("path"), lit(-160.1), lit(40.0)).show() +------------------------------------------------------------------------------------------------------------------+ @@ -1876,8 +2855,8 @@ rst_worldtorastercoordy .. code-tab:: sql CREATE TABLE IF NOT EXISTS TABLE coral_netcdf - USING binaryFile - OPTIONS (pathGlobFilter "*.nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") + USING gdal + OPTIONS (extensions "nc", path "dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral") SELECT rst_worldtorastercoordy(path, -160.1, 40.0) +------------------------------------------------------------------------------------------------------------------+ | rst_worldtorastercoordy(path, -160.1, 40.0) | diff --git a/docs/source/usage/install-gdal.rst b/docs/source/usage/install-gdal.rst index f1735d3b0..b3a583051 100644 --- a/docs/source/usage/install-gdal.rst +++ b/docs/source/usage/install-gdal.rst @@ -24,6 +24,8 @@ Setup GDAL files and scripts Mosaic requires GDAL to be installed on the cluster. The easiest way to do this is to use the the mos.setup_gdal() function. 
This function will extract the GDAL files and scripts from the mosaic library and place them in the /dbfs/FileStore/geospatial/mosaic/gdal/ directory. +This call is no longer needed in versions >= 0.3.12. The shared objects are now included in the +databricks-mosaic-gdal pip installable bundle. .. code-block:: py @@ -39,7 +41,9 @@ mosaic library and place them in the /dbfs/FileStore/geospatial/mosaic/gdal/ dir Configure the init script ************************** After the mos.setup_gdal() function has been run, you will need to configure the cluster to use the -init script. This can be done by clicking on the "Edit" button on the cluster page and adding +init script. For versions >= 0.3.12, we are required to use the following init script: +`here `__. +The init script can be set by clicking on the "Edit" button on the cluster page and adding the following to the "Advanced Options" section: .. figure:: ../images/init_script.png diff --git a/src/main/scala/com/databricks/labs/mosaic/core/raster/api/GDAL.scala b/src/main/scala/com/databricks/labs/mosaic/core/raster/api/GDAL.scala index cdfaa76d2..cfb3e4fd0 100644 --- a/src/main/scala/com/databricks/labs/mosaic/core/raster/api/GDAL.scala +++ b/src/main/scala/com/databricks/labs/mosaic/core/raster/api/GDAL.scala @@ -34,7 +34,7 @@ object GDAL { // https://www.tutorialspoint.com/scala/scala_data_types.htm case GDT_UInt16 => Char.MaxValue.toDouble case GDT_Int16 => Short.MinValue.toDouble - case GDT_UInt32 => 2 * Int.MinValue.toDouble + case GDT_UInt32 => 2 * Int.MaxValue.toDouble case GDT_Int32 => Int.MinValue.toDouble case GDT_Float32 => Float.MinValue.toDouble case GDT_Float64 => Double.MinValue
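
Note on the GDAL.scala hunk: the change for GDT_UInt32 swaps a negative constant for a positive one. The sketch below reproduces only the arithmetic, in Python rather than Scala, to show why the old value could not be stored in an unsigned 32-bit band and where the new value sits relative to the type's maximum. The helper name `fits_uint32` and the constants are illustrative assumptions, not part of Mosaic or GDAL.

```python
# Illustrative arithmetic only; these names are hypothetical and do not exist in Mosaic.
INT32_MIN = -(2**31)      # Scala Int.MinValue
INT32_MAX = 2**31 - 1     # Scala Int.MaxValue
UINT32_MAX = 2**32 - 1    # largest value an unsigned 32-bit band can hold


def fits_uint32(value: float) -> bool:
    """Return True if value lies inside the unsigned 32-bit range."""
    return 0 <= value <= UINT32_MAX


# Scala evaluates `2 * Int.MinValue.toDouble` as 2 * (-2147483648.0).
old_value = 2 * float(INT32_MIN)
# Scala evaluates `2 * Int.MaxValue.toDouble` as 2 * 2147483647.0.
new_value = 2 * float(INT32_MAX)

print(old_value, fits_uint32(old_value))
print(new_value, fits_uint32(new_value))
print(UINT32_MAX - new_value)
```

The old expression is negative and falls outside the unsigned range, while the new one is representable. It is still one less than the UInt32 maximum, so if the intent is the exact upper bound, that may be worth confirming with the author.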