The project uses Java 11, with Zulu OpenJDK as the preferred JDK. Users can switch to this JDK whenever they work with the repository by installing SDKMAN and running the following command in the project root directory:
sdk env
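`sdk env` reads the `.sdkmanrc` file in the project root. A minimal sketch of the workflow, assuming a Zulu 11 build is pinned there (the exact version identifier below is an assumption; list the available ones with `sdk list java`):

```sh
# .sdkmanrc in the project root pins the JDK, e.g. (hypothetical identifier):
#   java=11.0.17-zulu
sdk env install   # one-time: install the pinned JDK if it is not present yet
sdk env           # switch the current shell session to that JDK
```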
The project uses sbt v1.7.+ for building and running tests. Users can download sbt by following the instructions here.
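As a point of reference, sbt projects normally pin the launcher version in `project/build.properties`; a quick way to check which version this build expects (the 1.7.x patch number shown is illustrative, not taken from the repository):

```sh
cat project/build.properties
#   sbt.version=1.7.2
sbt --version   # the launcher downloads the pinned version on first use
```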
The plugin is built and tested against Spark v3.3.0 and Hadoop v3.3.+. Instructions for setting up a Hadoop and Spark installation on a machine with VEs attached can be found on the project website, [here](https://sparkcyclone.io/docs/spark-sql/getting-started/hadoop-and-spark-installation-guide) and here.
In addition, instructions for configuring a local (custom) installation of Spark with an established Hadoop cluster can be found here.
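A quick sanity check that a local installation matches the tested versions, assuming `spark-submit` and `hadoop` are on the `PATH`:

```sh
spark-submit --version   # should report Spark 3.3.0
hadoop version           # should report Hadoop 3.3.x
```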
For Windows, make sure you configure Hadoop as per Hadoop on Windows and set the appropriate HADOOP_HOME (use winutils as needed). The files should look like this:
C:/hadoop-3.2.1/bin/hadoop.dll
...
Also add the bin directory to the PATH.
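As a rough sketch of that step (the install path mirrors the example above but should be adjusted to the local setup):

```cmd
:: Hypothetical install location; adjust to where Hadoop (with winutils) is unpacked.
setx HADOOP_HOME "C:\hadoop-3.2.1"
:: %HADOOP_HOME%\bin must also be on PATH; adding it via
:: System Properties > Environment Variables avoids setx's 1024-character limit.
```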
For cluster-mode/detection tests that run in the VectorEngine scope, make sure that $SPARK_HOME/work is writable:
$ mkdir -p /opt/spark/work && chmod -R 777 /opt/spark/work
Instructions for lowering the latency of SSH connections can be found here; this is likely needed when developing software against VEs on a remote server (in general, a decrease in latency of around 40% can be observed).
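The linked instructions are not reproduced here; as one common approach, OpenSSH connection multiplexing in `~/.ssh/config` reuses an existing connection instead of performing a full handshake each time (the host alias and paths below are placeholders):

```sh
mkdir -p ~/.ssh/sockets
cat >> ~/.ssh/config <<'EOF'
Host ve-dev
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 10m
EOF
```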
The sbt console should be launched with a large amount of heap memory available:
SBT_OPTS="-Xmx16g" sbt
To build the plugin, simply run in the sbt console:
show assembly
The location of the assembled fat JAR will be displayed.
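The same task can also be run non-interactively, which is convenient for scripting (the reported JAR path depends on the project version):

```sh
SBT_OPTS="-Xmx16g" sbt "show assembly"
```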
A shortcut is provided in the sbt console to copy the built plugin JAR to a pre-determined directory in the filesystem:
// Copy the JAR to /opt/cyclone/${USER}/
deploy
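Assuming `deploy` is defined as a regular sbt task, it can likewise be invoked directly from the shell:

```sh
SBT_OPTS="-Xmx16g" sbt deploy
```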
See Testing and CI for more information on how to run Spark Cyclone tests at different levels.