We happily welcome contributions to Mosaic. We use GitHub Issues to track community-reported issues and GitHub Pull Requests for accepting changes.
The repository is structured as follows:
- `pom.xml`: Mosaic project definition and dependencies
- `src/`: Scala source code and tests for Mosaic
- `python/`: Source code for Python bindings
- `docs/`: Source code for documentation
- `.github/workflows/`: CI definitions for GitHub Actions
We use the Maven build tool to manage and build the Mosaic Scala project. The Mosaic JAR, including all dependencies, can be generated by running `mvn clean package`. By default, this will also run the tests in `src/test/`. The packaged JAR should be available in `target/`.
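For day-to-day development, the build loop might look like the sketch below; `-DskipTests` is a standard Maven flag, not anything Mosaic-specific.

```bash
# Build the Mosaic JAR and run the Scala tests in src/test/
mvn clean package

# Rebuild the artifact without re-running the tests (standard Maven flag)
mvn clean package -DskipTests

# The packaged JAR is written to target/
ls target/*.jar
```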
The Python bindings can be tested using `unittest` (a consolidated sketch follows this list):

1. Build the Scala project and copy the packaged JAR to the `python/mosaic/lib/` directory.
2. Move to the `python/` directory and install the project and its dependencies: `pip install . && pip install pyspark==<project_spark_version>` (where `project_spark_version` corresponds to the version of Spark used for the target Databricks Runtime, e.g. `3.2.1`).
3. Run the tests using `unittest`: `python -m unittest`
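Put together, a test run might look like the following. The JAR filename pattern and the PySpark version are placeholders: use the actual artifact produced in `target/` and the Spark version matching your target Databricks Runtime.

```bash
# Copy the packaged JAR into the Python package
# (filename pattern is a placeholder; use the JAR actually produced in target/)
cp target/mosaic-*.jar python/mosaic/lib/

# Install the bindings plus a matching PySpark, then run the test suite
cd python/
pip install . && pip install pyspark==3.2.1   # version is an example
python -m unittest
```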
The project wheel file can be built with `build` (see the sketch after this list):

1. Install the build requirements: `pip install build wheel`.
2. Build the wheel using `python -m build`.
3. Collect the `.whl` file from `python/dist/`.
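As a sketch, the wheel build reduces to three commands run from the `python/` directory:

```bash
cd python/
pip install build wheel   # install the build requirements
python -m build           # build the wheel (and sdist) under dist/
ls dist/*.whl             # collect the wheel file from here
```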
The documentation has been produced using Sphinx. To build the docs (a condensed sketch follows this list):

1. Install the pandoc library (follow the instructions for your platform here).
2. Install the Python requirements from `docs/docs-requirements.txt`.
3. Build the HTML documentation by running `make html` from `docs/`.
4. You can locally host the docs by running the `reload.py` script in the `docs/source/` directory.
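A condensed sketch of the docs workflow, assuming pandoc is already installed for your platform (the plain `python reload.py` invocation is an assumption; check the script for any arguments it expects):

```bash
cd docs/
pip install -r docs-requirements.txt   # Sphinx and the other doc requirements
make html                              # build the HTML documentation
cd source/ && python reload.py         # locally host the docs
```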
Tools we use for code formatting and checking:

- `scalafmt` and `scalastyle` in the main Scala project.
- `black` and `isort` for the Python bindings.
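The Python tools can be invoked directly from the repository root. For the Scala side, the goal shown is the standard `scalastyle-maven-plugin` goal, but check `pom.xml` for the exact plugin configuration (the scalafmt invocation in particular varies by plugin), so treat this as a sketch:

```bash
# Python bindings: format code and sort imports in place
black python/
isort python/

# Scala project: run the style checks via Maven
# (standard scalastyle-maven-plugin goal; verify against pom.xml)
mvn scalastyle:check
```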