In this article we will see how to set up Apache Spark on Ubuntu machines by building it from source code.
1) Install Java
Spark processes run in the JVM, so Java must be installed on every machine that will run Spark jobs.
Make sure the machine has Java 8 or later installed; if not, Java 8 can be installed easily using the commands below:
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
To verify the installation, run the following command:
$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
2) Install sbt
Apache Spark is written in Scala, so building it requires sbt, the Scala build tool, which will fetch the required Scala compiler for you. If sbt is not already installed, run the following commands:
$ echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
$ sudo apt-get update
$ sudo apt-get install sbt
3) Install Git
Building Apache Spark from source requires Git. If it is not already installed, run the following command:
$ sudo apt-get install git
4) Download Spark
Apache Spark can be downloaded from the official Spark website, either as a pre-built bundle or as source code. For this article we will build Spark from source:
$ sudo wget http://d3kbcqa49mib13.cloudfront.net/spark-2.2.0.tgz
$ sudo tar xvf spark-2.2.0.tgz
$ sudo mv spark-2.2.0 /opt/
The last command moves the extracted tree to /opt/spark-2.2.0, the location the remaining steps assume.
5) Build Spark
Spark can be built using sbt (Simple Build Tool), which is bundled with the source tree. Run the following commands from the Spark directory to build the code:
$ cd /opt/spark-2.2.0
$ sudo build/sbt assembly
Using /usr/lib/jvm/java-8-oracle as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Attempting to fetch sbt
Launching sbt from build/sbt-launch-0.13.13.jar
Getting org.scala-sbt sbt 0.13.13 ...
Building Spark will take some time, so be patient!
6) Running Spark Shell
If everything goes well, you can start the Spark shell with the following commands:
$ cd /opt/spark-2.2.0
$ sudo ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/11 05:48:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/11 05:48:38 WARN Utils: Your hostname, techie-Satellite-Pro-R50-B resolves to a loopback address: 127.0.1.1; using 192.168.0.103 instead (on interface wlan0)
17/11/11 05:48:38 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context Web UI available at http://192.168.0.103:4040
Spark context available as 'sc' (master = local[*], app id = local-1510359518648).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
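To confirm the shell works end to end, you can paste a quick word count at the scala> prompt. The sketch below uses a plain Scala collection so it runs anywhere; against real data you would start from sc.textFile("somefile.txt") and apply the same flatMap/map pattern on the resulting RDD (the file name here is just a placeholder, not something created in this article).

```scala
// Quick sanity check to paste at the scala> prompt.
// Plain Scala collections here; with Spark you would start from
// sc.textFile(...) and use the same flatMap/groupBy/map chain.
val lines = Seq("spark runs on the jvm", "spark builds with sbt")
val counts = lines
  .flatMap(_.split(" "))                       // split each line into words
  .groupBy(identity)                           // group identical words
  .map { case (word, ws) => (word, ws.size) }  // count each group
println(counts("spark"))                       // "spark" appears twice
```

If the shell evaluates this and prints a count, the build and the REPL are both working.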
In this article we have seen how to set up Apache Spark on Ubuntu machines by building it from source code.
In upcoming articles we will cover Apache Spark cluster setup, job submission, and more.