Set up Apache Spark
Edit the hosts file with the following command:
$ sudo vim /etc/hosts
Now add entries for the master and slaves to the hosts file:
<MASTER-IP> master
<SLAVE01-IP> slave01
<SLAVE02-IP> slave02
Configure passwordless SSH from the master to all slaves (and to the master itself), so that the cluster start scripts can launch workers remotely.
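A common way to set this up, assuming the same user account exists on every node and the hostnames from /etc/hosts resolve, is to generate a key pair on the master and copy the public key to each node:

```shell
# On the master: generate an RSA key pair with no passphrase
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to every node listed in /etc/hosts
ssh-copy-id master
ssh-copy-id slave01
ssh-copy-id slave02

# Verify: these should print the remote hostname without a password prompt
ssh slave01 hostname
ssh slave02 hostname
```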
Note: The whole Spark installation procedure must be carried out on the master as well as on all slaves.
Download a prebuilt Spark package from the Apache Spark downloads page; this guide uses spark-2.3.0-bin-hadoop2.7.
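For example, the release used here can be fetched directly (URL assumed from the standard Apache archive layout):

```shell
# Download the Spark 2.3.0 binary package built against Hadoop 2.7
wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
```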
Use the following command to extract the Spark tar file.
$ tar xvf spark-2.3.0-bin-hadoop2.7.tgz
Use the following command to move the extracted Spark files to /usr/local/spark.
$ sudo mv spark-2.3.0-bin-hadoop2.7 /usr/local/spark
Edit the ~/.bashrc file.
$ sudo vim ~/.bashrc
Add the following line to the ~/.bashrc file. It appends the directory containing the Spark binaries to the PATH variable.
export PATH=$PATH:/usr/local/spark/bin
Use the following command to source the ~/.bashrc file.
$ source ~/.bashrc
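To confirm the PATH change took effect, you can ask Spark to print its version:

```shell
# Should report Spark version 2.3.0 if the PATH is set correctly
spark-submit --version
```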
Note: Do the following steps only on the master.
Move to the Spark conf directory and create spark-env.sh from its template.
$ cd /usr/local/spark/conf
$ cp spark-env.sh.template spark-env.sh
Now edit the configuration file spark-env.sh.
$ sudo vim spark-env.sh
And set the following parameters.
export SPARK_MASTER_HOST='<MASTER-IP>'
export JAVA_HOME=<Path_of_JAVA_installation>
Edit the slaves configuration file in /usr/local/spark/conf.
$ sudo vim slaves
And add the following entries.
master
slave01
slave02
To start the Spark cluster, run the following commands on the master.
$ cd /usr/local/spark
$ ./sbin/start-all.sh
To stop the Spark cluster, run the following commands on the master.
$ cd /usr/local/spark
$ ./sbin/stop-all.sh
To check daemons on master and slaves, use the following command.
$ jps
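On the master, jps should list a Master daemon, plus a Worker, since master also appears in the slaves file; each slave should show only a Worker. The output will look something like this (process IDs will differ):

```shell
$ jps        # on the master
2864 Master
2981 Worker
3120 Jps

$ jps        # on a slave
1523 Worker
1641 Jps
```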
Browse the Spark web UIs to inspect worker nodes, running applications, and cluster resources. The master UI is served on port 8080; the application UI on port 4040 is available only while an application is running.
http://<MASTER-IP>:8080/
http://<MASTER-IP>:4040/
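To verify that jobs actually run on the cluster, you can submit the SparkPi example that ships with the distribution (the jar path below assumes the stock 2.3.0 layout; substitute your master's address):

```shell
cd /usr/local/spark
./bin/spark-submit \
  --master spark://<MASTER-IP>:7077 \
  --class org.apache.spark.examples.SparkPi \
  ./examples/jars/spark-examples_2.11-2.3.0.jar 100
```

While it runs, the application should appear on the master UI at port 8080, and its own UI becomes reachable on port 4040.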