Set up Apache Spark

Prerequisites

Add entries in hosts file (master and slaves)

Edit the hosts file with the following command:

$ sudo vim /etc/hosts

Now add entries for the master and slaves to the hosts file:

<MASTER-IP> master
<SLAVE01-IP> slave01
<SLAVE02-IP> slave02
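
For illustration only, with hypothetical addresses (substitute the real IPs of your machines), the added lines might look like this:

192.168.1.10 master
192.168.1.11 slave01
192.168.1.12 slave02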

Java 8 must be installed (master and slaves)
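
To check whether Java 8 is already present, and to install it if not (this sketch assumes a Debian/Ubuntu system with the openjdk-8-jdk package; adjust for your distribution), you can run:

$ java -version
$ sudo apt-get install openjdk-8-jdk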

Configure SSH (only master)

See this link to configure passwordless SSH from the master to the slaves.
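
As a minimal sketch (assuming the same username, shown here as the placeholder <user>, exists on all machines and that the hostnames resolve via the hosts entries above), passwordless SSH from the master can be set up roughly as follows:

$ ssh-keygen -t rsa -P ""
$ ssh-copy-id <user>@master
$ ssh-copy-id <user>@slave01
$ ssh-copy-id <user>@slave02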

Install Spark

Note: The whole Spark installation procedure must be done on the master as well as on all slaves.

Download latest version of Spark

You can download Spark from this link.
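
For example, the 2.3.0 build used in the commands below can be fetched from the Apache archive with wget (the exact URL is an assumption; choose the version and mirror you need):

$ wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz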

Extract Spark tar

Use the following command to extract the Spark tar file.

$ tar xvf spark-2.3.0-bin-hadoop2.7.tgz

Move Spark software files

Use the following command to move the Spark software files to their own directory (/usr/local/spark).

$ sudo mv spark-2.3.0-bin-hadoop2.7 /usr/local/spark

Set up the environment for Spark

Edit the ~/.bashrc file.

$ sudo vim ~/.bashrc

Add the following line to the ~/.bashrc file. This adds the directory containing the Spark binaries to the PATH variable.

export PATH=$PATH:/usr/local/spark/bin

Use the following command to source the ~/.bashrc file.

$ source ~/.bashrc
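
To confirm that the PATH change took effect, you can ask Spark for its version (this assumes Spark was moved to /usr/local/spark as above):

$ spark-submit --version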

Spark Master Configuration

Note: Do the following procedures only on the master.

Edit spark-env.sh

Move to the Spark conf folder and create a renamed copy of the spark-env.sh template.

$ cd /usr/local/spark/conf
$ cp spark-env.sh.template spark-env.sh

Now edit the configuration file spark-env.sh.

$ sudo vim spark-env.sh

And set the following parameters.

export SPARK_MASTER_HOST='<MASTER-IP>'
export JAVA_HOME=<Path_of_JAVA_installation>
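
For instance, with hypothetical values (use your master's real IP and your actual Java installation path):

export SPARK_MASTER_HOST='192.168.1.10'
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64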

Add Workers

Edit the configuration file slaves in /usr/local/spark/conf.

$ sudo vim slaves

And add the following entries.

master
slave01
slave02

Start/Stop Spark Cluster

To start the Spark cluster, run the following commands on the master.

$ cd /usr/local/spark
$ ./sbin/start-all.sh
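
Once the cluster is up, you can verify it end to end by submitting the bundled SparkPi example to the master (the jar path below matches the 2.3.0 binary distribution used in this guide; adjust it if your version differs):

$ cd /usr/local/spark
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<MASTER-IP>:7077 examples/jars/spark-examples_2.11-2.3.0.jar 100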

To stop the Spark cluster, run the following commands on the master.

$ cd /usr/local/spark
$ ./sbin/stop-all.sh

Check whether services have been started

To check the daemons on the master and the slaves, use the following command.

$ jps
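
As a rough guide (process IDs will differ), the master should list both a Master and a Worker daemon, since master is also included in the slaves file, while each slave should list only a Worker. For example:

On master:
1250 Master
1346 Worker
1523 Jps

On a slave:
1122 Worker
1298 Jps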

Spark Web UI

Browse the Spark web UI to see the worker nodes, running applications, and cluster resources.

Spark Master UI

http://<MASTER-IP>:8080/

Spark Application UI

http://<MASTER-IP>:4040/ (available only while an application is running)