In this article we will see how to set up an Apache Spark standalone cluster on Ubuntu machines, from installing the prerequisites and building Spark from source to starting a master, attaching workers, and connecting to the cluster.
1) Install Java
Spark processes run in the JVM, so Java must be pre-installed on every machine that will run a Spark job. Make sure each machine has Java 8+ installed; if not, Java 8 can be installed with the following commands:

$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

To verify the installation, use the following command:

$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
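If you script the setup across several machines, it helps to fail fast when the JVM is too old. A minimal sketch of such a check (the parsing is our own, assuming the `java version "1.x.y"` output format shown above; in a real script you would feed it `java -version 2>&1 | head -n 1`):

```shell
# Extract the major version from a `java -version`-style line and
# verify it is at least Java 8 (the "1.x" scheme used through Java 8).
version_line='java version "1.8.0_151"'
major=$(echo "$version_line" | sed -n 's/.*"1\.\([0-9]*\).*/\1/p')
echo "major=$major"
if [ "$major" -ge 8 ]; then
  echo "Java 8+ OK"
else
  echo "Java older than 8; install it first" >&2
fi
```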
2) Install sbt
Apache Spark is written in Scala; to build it from source we need sbt, the Scala build tool (which fetches the required Scala version itself). If it is not installed already, run the following commands:

$ echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
$ sudo apt-get update
$ sudo apt-get install sbt
3) Install Git
The Spark build requires Git; if it is not installed already, run the following command:

$ sudo apt-get install git
4) Download Spark
Apache Spark can be downloaded from the official Spark website, either as a prebuilt bundle or in source-code form. For this article we will set up the cluster from source:

$ sudo wget http://d3kbcqa49mib13.cloudfront.net/spark-2.2.0.tgz
$ sudo tar xvf spark-2.2.0.tgz
5) Build Spark
Spark can be built using sbt (Simple Build Tool), a launcher for which is bundled with the source. Run the following command from the extracted spark-2.2.0 directory to build the code. Building Spark will take some time, so be patient!

$ sudo build/sbt assembly
Using /usr/lib/jvm/java-8-oracle as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Attempting to fetch sbt
Launching sbt from build/sbt-launch-0.13.13.jar
Getting org.scala-sbt sbt 0.13.13 ...
6) Starting the master
Assuming steps 1 to 5 above are done, we can now start a Spark master with the following command:

$ sudo ./sbin/start-master.sh -h 127.0.0.1 -p 7077 --webui-port 8080

This starts a master node on local IP 127.0.0.1 and port 7077. Spark ships with a web UI; the master's details can be checked at: http://localhost:8080/
7) Starting worker nodes
Follow steps 1 to 5 above on each machine where you want to start a worker attached to the master started in step 6. A Spark master can be served by any number of workers; to start one, use the following command:

$ sudo ./sbin/start-slave.sh spark://127.0.0.1:7077 -h 127.0.0.1 -p 7079 --webui-port 8082 -c 2 -m 1024m

This starts a worker node on local IP 127.0.0.1 and port 7079; "spark://127.0.0.1:7077" is the URL of the master node. The -c and -m parameters set the number of CPU cores and the amount of RAM assigned to the worker. A web UI is also available to monitor the Spark worker at: http://localhost:8082/
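Since the worker flags are easy to mistype when repeated across machines, one option is a small sketch that assembles the start-slave.sh invocation from named variables (the variable names are our own), keeping host, ports, cores, and memory in one place:

```shell
# Build the worker launch command from named settings so the flags
# stay consistent across machines; adjust per worker as needed.
MASTER_URL="spark://127.0.0.1:7077"
WORKER_HOST="127.0.0.1"
WORKER_PORT=7079
WEBUI_PORT=8082
CORES=2          # -c: CPU cores given to this worker
MEMORY="1024m"   # -m: RAM given to this worker
CMD="./sbin/start-slave.sh $MASTER_URL -h $WORKER_HOST -p $WORKER_PORT --webui-port $WEBUI_PORT -c $CORES -m $MEMORY"
echo "$CMD"      # inspect the command, then run it with: sudo $CMD
```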
8) Check cluster status
The attached workers and the applications submitted to the cluster can now be viewed on the master web UI: http://localhost:8080/
9) Connecting to the Spark cluster
We can connect to the Spark master from an application, or open an interactive shell against the cluster with the following command:

$ ./bin/spark-shell --master spark://127.0.0.1:7077

A Spark shell connected to the cluster opens, as shown below:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/11 16:10:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/11 16:10:32 WARN Utils: Your hostname, techie-Satellite-Pro-R50-B resolves to a loopback address: 127.0.1.1; using 192.168.0.103 instead (on interface wlan0)
17/11/11 16:10:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context Web UI available at http://192.168.0.103:4040
Spark context available as 'sc' (master = spark://127.0.0.1:7077, app id = app-20171111161034-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
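To confirm the cluster actually executes work, a small Scala job can be piped into spark-shell non-interactively. As a sketch, the snippet below only assembles and prints the command rather than running it (the job sums 1 to 100 across the cluster, so it should report 5050.0):

```shell
# Assemble a non-interactive smoke test: pipe a tiny Scala job into
# spark-shell against the standalone master started earlier.
MASTER="spark://127.0.0.1:7077"
JOB='println(sc.parallelize(1 to 100).sum())'
SMOKE="echo '$JOB' | ./bin/spark-shell --master $MASTER"
echo "$SMOKE"
```

Run the printed command from the spark-2.2.0 directory while the master and at least one worker are up; the job then appears under completed applications on the master web UI.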
In this article we have seen how to set up an Apache Spark cluster on Ubuntu machines using the simple standalone cluster manager. In coming articles we will see how to set up a Spark cluster with Hadoop YARN and Apache Mesos, and how to submit a Spark application to the cluster.