changelog, docker, and pyspark version changes.
mjohns-databricks committed May 24, 2024
1 parent ef3ee9a commit 2e3f3bd
Showing 21 changed files with 221 additions and 79 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_main.yml
@@ -19,7 +19,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
spark: [ 3.4.0 ]
spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
2 changes: 1 addition & 1 deletion .github/workflows/build_python.yml
@@ -15,7 +15,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
spark: [ 3.4.0 ]
spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
2 changes: 1 addition & 1 deletion .github/workflows/build_r.yml
@@ -16,7 +16,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
spark: [ 3.4.0 ]
spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
2 changes: 1 addition & 1 deletion .github/workflows/build_scala.yml
@@ -14,7 +14,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
spark: [ 3.4.0 ]
spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
2 changes: 1 addition & 1 deletion .github/workflows/pypi-release.yml
@@ -12,7 +12,7 @@ jobs:
python: [ 3.10.12 ]
numpy: [ 1.22.4 ]
gdal: [ 3.4.1 ]
spark: [ 3.4.0 ]
spark: [ 3.4.1 ]
R: [ 4.2.2 ]
steps:
- name: checkout code
1 change: 1 addition & 0 deletions .gitignore
@@ -164,3 +164,4 @@ docker/.m2/
/python/mosaic_test/
/python/checkpoint/
/python/checkpoint-new/
/scripts/docker/docker-build/ubuntu-22-spark-3.4/Dockerfile
5 changes: 2 additions & 3 deletions CHANGELOG.md
@@ -1,8 +1,7 @@
## v0.4.3 [DBR 13.3 LTS]
- Pyspark requirement removed from python setup.cfg as it is supplied by DBR
- iPython dependency limited to "<8.11,>=7.4.2" for both DBR and keplergl-jupyter
- Python version limited to "<3.11,>=3.10" for DBR
- Fixes 'raster_to_grid' reader tessellation issue affecting some NetCDFs; also adding repartitioning for better performance.
- Pyspark version limited to 3.4.1 for DBR
- iPython dependency limited to "<8.11,>=7.4.2" for both DBR and keplergl-jupyter
- Expanded support for fuse-based checkpointing (persisted raster storage), managed through:
- spark config 'spark.databricks.labs.mosaic.raster.use.checkpoint' in addition to 'spark.databricks.labs.mosaic.raster.checkpoint'.
- python: `mos.enable_gdal(spark, with_checkpoint_path=path)`.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -84,7 +84,7 @@ The repository is structured as follows:
## Test & build Mosaic

Given that DBR 13.3 is Ubuntu 22.04, we recommend using docker,
see [mosaic-docker.sh](https://github.com/databrickslabs/mosaic/blob/main/scripts/mosaic-docker.sh).
see [mosaic-docker.sh](https://github.com/databrickslabs/mosaic/blob/main/scripts/docker/mosaic-docker.sh).

### Scala JAR

1 change: 0 additions & 1 deletion docs/source/api/rasterio-udfs.rst
@@ -248,7 +248,6 @@ depending on your needs.
def write_raster(raster, driver, file_id, fuse_dir):
from io import BytesIO
from pathlib import Path
from pyspark.sql.functions import udf
from rasterio.io import MemoryFile
import numpy as np
import rasterio
4 changes: 2 additions & 2 deletions pom.xml
@@ -256,7 +256,7 @@
<properties>
<scala.version>2.12.10</scala.version>
<scala.compat.version>2.12</scala.compat.version>
<spark.version>3.4.0</spark.version>
<spark.version>3.4.1</spark.version>
<mosaic.version>0.4.3</mosaic.version>
</properties>
<build>
@@ -291,7 +291,7 @@
<properties>
<scala.version>2.12.10</scala.version>
<scala.compat.version>2.12</scala.compat.version>
<spark.version>3.4.0</spark.version>
<spark.version>3.4.1</spark.version>
<mosaic.version>0.4.3</mosaic.version>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
1 change: 1 addition & 0 deletions python/setup.cfg
@@ -22,6 +22,7 @@ install_requires =
h3<4.0,>=3.7
ipython<8.11,>=7.4.2
keplergl==0.3.2
pyspark==3.4.1

[options.package_data]
mosaic =
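This commit bumps Spark to 3.4.1 in the CI workflows and in both `pom.xml` profiles, and pins `pyspark==3.4.1` in `setup.cfg`. A minimal sketch of a consistency check between the two files, using only the standard library; the function names are hypothetical (not part of the Mosaic repo), and it assumes the pin appears as its own `pyspark==X` line under `[options] install_requires`.

```python
import configparser
import re
import xml.etree.ElementTree as ET


def pom_spark_versions(pom_xml: str) -> set:
    """Collect every <spark.version> value in a pom.xml string (all profiles)."""
    root = ET.fromstring(pom_xml)
    versions = set()
    for elem in root.iter():
        # Drop a namespace prefix such as '{http://maven.apache.org/POM/4.0.0}'.
        if elem.tag.split("}", 1)[-1] == "spark.version" and elem.text:
            versions.add(elem.text.strip())
    return versions


def setup_cfg_pyspark_pin(setup_cfg: str):
    """Return X from a 'pyspark==X' line under [options] install_requires, else None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg)
    requires = parser.get("options", "install_requires", fallback="")
    for line in requires.splitlines():
        match = re.fullmatch(r"\s*pyspark==(\S+)\s*", line)
        if match:
            return match.group(1)
    return None


def versions_aligned(pom_xml: str, setup_cfg: str) -> bool:
    """True when setup.cfg pins pyspark and every pom <spark.version> equals that pin."""
    pin = setup_cfg_pyspark_pin(setup_cfg)
    return pin is not None and pom_spark_versions(pom_xml) == {pin}


# Abbreviated stand-ins for the real files (version values taken from this commit).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile><properties><spark.version>3.4.1</spark.version></properties></profile>
    <profile><properties><spark.version>3.4.1</spark.version></properties></profile>
  </profiles>
</project>"""

SETUP_CFG = """[options]
install_requires =
    h3<4.0,>=3.7
    ipython<8.11,>=7.4.2
    keplergl==0.3.2
    pyspark==3.4.1
"""

print(versions_aligned(POM, SETUP_CFG))
```

Interpolation is disabled so requirement specifiers are read verbatim; a check like this could run in CI to catch a workflow or profile left on an older Spark version.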
28 changes: 28 additions & 0 deletions scripts/docker/README.md
@@ -0,0 +1,28 @@
# Docker Build

> This is adapted from the [Mosaic-Docker](https://github.com/r3stl355/mosaic-docker) repo, focused on DBR 13.3 LTS, which runs Ubuntu 22.04.
> It is needed to build and run tests on machines other than Ubuntu 22.04 (Jammy), e.g. macOS.
## Steps

1. From `scripts/docker/docker-build/ubuntu-22-spark-3.4`, run `GDAL_VERSION=3.4.1 LIBPROJ_VERSION=7.1.0 SPARK_VERSION=3.4.1 CORES=4 ./build`
to build the Docker image for DBR 13.3 LTS. The image is tagged 'mosaic-dev:ubuntu22-gdal3.4.1-spark3.4.1'.
2. From the repo root, run `sh scripts/docker/mosaic-docker.sh`. The script launches the container, configures it via `docker_init.sh`, and opens a shell.

## Additional Notes

* The image uses JDK 8 and Python 3.10 to match DBR 13.3
* Supports IDE-driven or Jupyter notebook testing in addition to a plain shell,
see more at [Mosaic-Docker](https://github.com/r3stl355/mosaic-docker). We recommend placing any test notebooks
in '<project_root>/python/notebooks', which is already in .gitignore
* If you want to run tests within a container shell:
- `unset JAVA_TOOL_OPTIONS` is needed to execute JVM tests
- then can test e.g. `mvn -X test -DskipTests=false -Dsuites=com.databricks.labs.mosaic.core.raster.TestRasterGDAL`
and `python3 -m unittest mosaic test/test_fuse_install.py` from ./python dir
- you may need to run `mvn clean` occasionally, especially around initial setup, as IntelliJ uses JDK 11 (pom.xml)
and Docker uses JDK 8
- you don't need to specify -PskipScoverage (see 'm2/settings.xml' and pom.xml)
* Get a shell with `docker exec -it mosaic-dev /bin/bash -c "unset JAVA_TOOL_OPTIONS && cd /root/mosaic && /bin/bash"`;
you can have multiple shells open, and `sh scripts/docker/exec-shell.sh` does the same
* `docker stop mosaic-dev` whenever done to terminate the container
* NOTE: Ignore 'ERRO[0000] error waiting for container: context canceled' if you see it on macOS
115 changes: 115 additions & 0 deletions scripts/docker/docker-build/ubuntu-22-spark-3.4/Dockerfile.template
@@ -0,0 +1,115 @@
FROM --platform=linux/amd64 ubuntu:22.04

# refresh package info
RUN apt-get update -y

# Install OpenJDK 8
RUN apt-get install -y openjdk-8-jdk --no-install-recommends

# Install native dependencies
RUN apt-get install -y python3-numpy unixodbc libcurl3-gnutls libsnappy-dev libopenjp2-7

ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64

# Install dependencies
RUN set -ex \
&& deps=" \
python3-dev \
python3-numpy \
python3-pip \
python3-venv \
bash-completion \
libspatialite-dev \
libpq-dev \
libcurl4-gnutls-dev \
libxml2-dev \
libgeos-dev \
libnetcdf-dev \
libpoppler-dev \
libhdf4-alt-dev \
libhdf5-serial-dev \
libpoppler-private-dev \
sqlite3 \
libsqlite3-dev \
libtiff-dev \
wget \
curl \
" \
&& buildDeps=" \
build-essential \
cmake \
swig \
ant \
pkg-config \
"\
&& apt-get update -y && apt-get install -y $buildDeps $deps --no-install-recommends

# Install the remaining components
ENV ROOTDIR /usr/local
ENV LD_LIBRARY_PATH /usr/local/lib
ENV SPARK_VERSION %%SPARK_VERSION%%
ENV GDAL_VERSION %%GDAL_VERSION%%
ENV LIBPROJ_VERSION %%LIBPROJ_VERSION%%
ENV CORES %%CORES%%

WORKDIR $ROOTDIR/
RUN mkdir -p $ROOTDIR/src

# Install PROJ
RUN wget -qO- https://download.osgeo.org/proj/proj-${LIBPROJ_VERSION}.tar.gz | \
tar -xzC $ROOTDIR/src/

RUN cd src/proj-${LIBPROJ_VERSION} && ./configure && make -j${CORES} && make install \
&& cd $ROOTDIR && rm -Rf src/proj*

# Install GDAL
RUN wget -qO- https://download.osgeo.org/gdal/${GDAL_VERSION}/gdal-${GDAL_VERSION}.tar.gz | \
tar -xzC $ROOTDIR/src/

RUN cd src/gdal-${GDAL_VERSION} \
&& ./configure --with-java=$JAVA_HOME \
&& make -j${CORES} && make -j${CORES} install && ldconfig

# Install Java bindings for GDAL
RUN cd $ROOTDIR/src/gdal-${GDAL_VERSION}/swig/java && make -j${CORES} && make -j${CORES} install

# Symlink binaries to the locations expected by Mosaic
RUN ln -s $ROOTDIR/lib/libgdal.so /usr/lib/libgdal.so
RUN ln -s $ROOTDIR/lib/libgdal.so.30 /usr/lib/libgdal.so.30
RUN ln -s $ROOTDIR/lib/libgdal.so.30.0.3 /usr/lib/libgdal.so.30.0.3
RUN mkdir -p /usr/lib/jni && ln -s $ROOTDIR/lib/libgdalalljni.so /usr/lib/jni/libgdalalljni.so.30
RUN mkdir -p /usr/lib/ogdi && ln -s $ROOTDIR/lib/libgdal.so /usr/lib/ogdi/libgdal.so

# Add Maven
ARG MAVEN_VERSION=3.9.6
ARG USER_HOME_DIR="/root"
ARG BASE_URL=https://dlcdn.apache.org/maven/maven-3/${MAVEN_VERSION}/binaries
ARG SHA=706f01b20dec0305a822ab614d51f32b07ee11d0218175e55450242e49d2156386483b506b3a4e8a03ac8611bae96395fd5eec15f50d3013d5deed6d1ee18224

RUN mkdir -p $ROOTDIR/share/maven $ROOTDIR/share/maven/ref \
&& echo "Downloading maven" \
&& curl -fsSL -o /tmp/apache-maven.tar.gz ${BASE_URL}/apache-maven-${MAVEN_VERSION}-bin.tar.gz \
\
&& echo "Checking download hash" \
&& echo "${SHA} /tmp/apache-maven.tar.gz" | sha512sum -c - \
\
&& echo "Unzipping maven" \
&& tar -xzf /tmp/apache-maven.tar.gz -C $ROOTDIR/share/maven --strip-components=1 \
\
&& echo "Cleaning and setting links" \
&& rm -f /tmp/apache-maven.tar.gz \
&& ln -s $ROOTDIR/share/maven/bin/mvn $ROOTDIR/bin/mvn

ENV MAVEN_HOME $ROOTDIR/share/maven
ENV MAVEN_CONFIG "$USER_HOME_DIR/.m2"

# Python packages
# - Adds additional needed packages
RUN pip3 install pip --upgrade
RUN pip3 install build wheel keplergl ipython pyspark==$SPARK_VERSION
RUN pip3 install black build isort py4j requests
RUN pip3 install gdal==$GDAL_VERSION

# Clean up
RUN apt-get purge -y --auto-remove $buildDeps \
&& rm -rf /var/lib/apt/lists/*
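The Maven step in the Dockerfile above downloads the archive and pipes `${SHA}` plus the file path into `sha512sum -c -`, so the build fails if the download was corrupted or tampered with. A minimal Python sketch of the same check with `hashlib`; `sha512_of` and `verify_download` are hypothetical helpers, not part of the repository, and the demo file is a stand-in rather than a real Maven archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-512 and return the lowercase hex digest."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's SHA-512 differs from expected_hex (like `sha512sum -c`)."""
    actual = sha512_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"SHA-512 mismatch for {path}: got {actual}")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "apache-maven.tar.gz"  # stand-in content, not a real archive
    sample.write_bytes(b"example payload")
    expected = hashlib.sha512(b"example payload").hexdigest()
    verify_download(sample, expected)  # matching digest: returns silently
    try:
        verify_download(sample, "0" * 128)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```

Reading in chunks keeps memory flat for large archives, which matches how `sha512sum` streams its input.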
12 changes: 12 additions & 0 deletions scripts/docker/docker-build/ubuntu-22-spark-3.4/build
@@ -0,0 +1,12 @@
#!/bin/bash

set -e

sed -e "s/%%GDAL_VERSION%%/$GDAL_VERSION/" \
-e "s/%%LIBPROJ_VERSION%%/$LIBPROJ_VERSION/" \
-e "s/%%SPARK_VERSION%%/$SPARK_VERSION/" \
-e "s/%%CORES%%/$CORES/" "Dockerfile.template" > Dockerfile

# use --no-cache to force clean build
#docker build --no-cache -t "mosaic-dev:ubuntu22-gdal$GDAL_VERSION-spark$SPARK_VERSION" .
docker build -t "mosaic-dev:ubuntu22-gdal$GDAL_VERSION-spark$SPARK_VERSION" .
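The `build` script fills the `%%NAME%%` placeholders in `Dockerfile.template` with `sed` and derives the image tag from the same variables. A rough Python equivalent, shown only to make the substitution explicit; `render_template` and `image_tag` are illustrative names, not repository code.

```python
import re

# Placeholders used by Dockerfile.template in this commit.
PLACEHOLDERS = ("GDAL_VERSION", "LIBPROJ_VERSION", "SPARK_VERSION", "CORES")


def render_template(template: str, values: dict) -> str:
    """Replace each %%NAME%% token with values[NAME]; fail on missing values or leftovers."""
    rendered = template
    for name in PLACEHOLDERS:
        if values.get(name, "") == "":
            raise KeyError(f"missing value for {name}")
        rendered = rendered.replace(f"%%{name}%%", str(values[name]))
    leftover = re.findall(r"%%[A-Z_]+%%", rendered)
    if leftover:
        raise ValueError(f"unreplaced placeholders: {leftover}")
    return rendered


def image_tag(values: dict) -> str:
    """Mirror the build script's tag: mosaic-dev:ubuntu22-gdal<GDAL>-spark<SPARK>."""
    return f"mosaic-dev:ubuntu22-gdal{values['GDAL_VERSION']}-spark{values['SPARK_VERSION']}"


values = {"GDAL_VERSION": "3.4.1", "LIBPROJ_VERSION": "7.1.0", "SPARK_VERSION": "3.4.1", "CORES": "4"}
snippet = "ENV SPARK_VERSION %%SPARK_VERSION%%\nENV CORES %%CORES%%\n"
print(render_template(snippet, values))
print(image_tag(values))
```

One design difference worth noting: the shell script runs with `set -e`, which does not catch unset variables, so a missing `SPARK_VERSION` would silently become an empty string in the Dockerfile; the sketch raises instead.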
27 changes: 27 additions & 0 deletions scripts/docker/docker_init.sh
@@ -0,0 +1,27 @@
#!/bin/bash

# [1] unset variable for this script
echo "\n::: [1] ... unsetting JAVA_TOOL_OPTIONS (probably need to do in container as well) :::"
unset JAVA_TOOL_OPTIONS

# [2] copy custom settings.xml
# - defaults to new skipScoverage profile
# - complements the pom config (profile sCoverage also added there)
# - sets .m2 folder to be in project
echo "\n::: [2] ... setting up new .m2 (in project) + new skipScoverage profile (as default) :::"
mv /usr/local/share/maven/conf/settings.xml /usr/local/share/maven/conf/settings.xml.BAK
cp /root/mosaic/scripts/docker/m2/settings.xml /usr/local/share/maven/conf
echo " ... mvn active profile(s)\n"
cd /root/mosaic && mvn help:active-profiles

# [3] build JVM code
# this is building for container JDK
# see settings.xml for overrides
echo "\n::: [3] ... maven package - JVM code version? :::\n"
echo " $(javac -version)"
cd /root/mosaic && mvn package -DskipTests

# [4] build python
# - refer to dockerfile for what is already built
echo "\n::: [4] ... build python :::\n"
cd /root/mosaic/python && pip install .
3 changes: 3 additions & 0 deletions scripts/docker/exec-shell.sh
@@ -0,0 +1,3 @@
#!/bin/bash

docker exec -it mosaic-dev /bin/bash -c "unset JAVA_TOOL_OPTIONS && cd /root/mosaic && /bin/bash"
2 changes: 1 addition & 1 deletion scripts/m2/settings.xml → scripts/docker/m2/settings.xml
@@ -5,7 +5,7 @@
|
| Default: ${user.home}/.m2/repository
-->
<localRepository>/root/mosaic/scripts/m2</localRepository>
<localRepository>/root/mosaic/scripts/docker/m2</localRepository>
<activeProfiles>
<activeProfile>skipScoverage</activeProfile>
</activeProfiles>
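After the move to `scripts/docker/m2`, `docker_init.sh` copies this `settings.xml` into the container's Maven `conf/` directory and runs `mvn help:active-profiles`, which remains the authoritative check. As a lightweight alternative, a sketch that reads the local repository path and default profiles with `xml.etree.ElementTree`; the helper names are hypothetical, and the sample is an abbreviated stand-in because the file's root element is not shown in this hunk.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.split("}", 1)[-1]


def summarize_settings(settings_xml: str) -> dict:
    """Return the localRepository path and active profile ids from a settings.xml string."""
    root = ET.fromstring(settings_xml)
    local_repo = None
    active = []
    for elem in root.iter():
        name = _local(elem.tag)
        if name == "localRepository" and elem.text:
            local_repo = elem.text.strip()
        elif name == "activeProfile" and elem.text:
            active.append(elem.text.strip())
    return {"localRepository": local_repo, "activeProfiles": active}


# Abbreviated stand-in; the real file has more settings and its own root attributes.
SAMPLE = """<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <localRepository>/root/mosaic/scripts/docker/m2</localRepository>
  <activeProfiles>
    <activeProfile>skipScoverage</activeProfile>
  </activeProfiles>
</settings>"""

summary = summarize_settings(SAMPLE)
print(summary)
```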
23 changes: 23 additions & 0 deletions scripts/docker/mosaic-docker.sh
@@ -0,0 +1,23 @@
#!/bin/bash

# [1] Build the image under 'docker-build':
# `GDAL_VERSION=3.4.1 LIBPROJ_VERSION=7.1.0 SPARK_VERSION=3.4.1 CORES=4 ./build`
# - produces image 'mosaic-dev:ubuntu22-gdal3.4.1-spark3.4.1' [default is JDK 8]
# [2] run this from the root of the mosaic repo, e.g. `sh scripts/docker/mosaic-docker.sh`
# - for IDE driven or Jupyter notebook testing
# [3] if you want to run tests within the container shell
# - [a] `unset JAVA_TOOL_OPTIONS` is needed to execute JVM tests
# - [b] then can test e.g. `mvn -X test -DskipTests=false -Dsuites=com.databricks.labs.mosaic.core.raster.TestRasterGDAL`
# and `python3 -m unittest mosaic test/test_fuse_install.py` from ./python dir
# - [c] you may need to run `mvn clean` occasionally, especially around initial setup, as IntelliJ uses JDK 11
# and Docker uses JDK 8.
# ... you don't need to specify -PskipScoverage (see settings.xml)
# [4] get shell with `docker exec -it mosaic-dev /bin/bash -c "unset JAVA_TOOL_OPTIONS && cd /root/mosaic && /bin/bash"`,
# - you can have multiple shells open; `sh scripts/docker/exec-shell.sh` does the same
# [5] `docker stop mosaic-dev` whenever done to terminate the container
# NOTE: Ignore 'ERRO[0000] error waiting for container: context canceled'
docker run -q --privileged --platform linux/amd64 --name mosaic-dev -p 5005:5005 -p 8888:8888 \
-v $PWD:/root/mosaic -e JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,address=5005,server=y,suspend=n" \
-itd --rm mosaic-dev:ubuntu22-gdal3.4.1-spark3.4.1 /bin/bash
docker exec -it mosaic-dev /bin/bash -c "sh /root/mosaic/scripts/docker/docker_init.sh"
docker exec -it mosaic-dev /bin/bash -c "unset JAVA_TOOL_OPTIONS && cd /root/mosaic && /bin/bash"
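`mosaic-docker.sh` starts the container detached with a JDWP debug agent listening on port 5005 and a second published port, 8888 (presumably for Jupyter), then runs `docker_init.sh` and opens a shell. The same invocation written as explicit argument lists, e.g. for driving it from Python's `subprocess`; this is a sketch, not a script in the repository, and the function names and checkout path are hypothetical.

```python
import shlex

IMAGE = "mosaic-dev:ubuntu22-gdal3.4.1-spark3.4.1"
CONTAINER = "mosaic-dev"
JDWP = "-agentlib:jdwp=transport=dt_socket,address=5005,server=y,suspend=n"


def docker_run_args(repo_dir: str, image: str = IMAGE) -> list:
    """Build the same `docker run` argv as mosaic-docker.sh, one token per list entry."""
    return [
        "docker", "run", "-q", "--privileged",
        "--platform", "linux/amd64",
        "--name", CONTAINER,
        "-p", "5005:5005",  # JDWP port the JVM debug agent listens on
        "-p", "8888:8888",  # second published port (presumably Jupyter)
        "-v", f"{repo_dir}:/root/mosaic",  # mount the checkout where the scripts expect it
        "-e", f"JAVA_TOOL_OPTIONS={JDWP}",
        "-itd", "--rm", image, "/bin/bash",
    ]


def docker_exec_args(command: str) -> list:
    """Build a `docker exec -it mosaic-dev /bin/bash -c <command>` argv."""
    return ["docker", "exec", "-it", CONTAINER, "/bin/bash", "-c", command]


run = docker_run_args("/path/to/mosaic")  # hypothetical checkout path
init = docker_exec_args("sh /root/mosaic/scripts/docker/docker_init.sh")
print(shlex.join(run))
print(shlex.join(init))
```

The sketch only builds and prints the commands; passing a list to `subprocess.run(run, check=True)` would actually start the container without an extra round of shell quoting.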
48 changes: 0 additions & 48 deletions scripts/m2/mvn_init.sh

This file was deleted.

17 changes: 0 additions & 17 deletions scripts/mosaic-docker.sh

This file was deleted.
