Spark is an open source, in-memory cluster computing system that allows for fast iterative and interactive analytics. Spark uses Scala, a type-safe object-oriented language with functional properties that is fully interoperable with Java. For more information about Spark, please refer to http://spark-project.org. To test out Spark, you can install the standalone version on Mac OS X.
This is a follow-up to my previous blog post on the topic, Installing Spark 0.6.1 Standalone on OSX Mountain Lion (10.8). Since that blog post, Spark has added some interesting features including:
– Spark Streaming as part of Spark 0.7
– An associated in-memory file system called Tachyon, more info at: AmpLab Tachyon and Shark Update.
– An associated graph analytics system built on top of Spark called GraphX, more info at: AmpLab GraphX: Graph Analytics on Spark.
Install Scala 2.9.3
The first thing you will need to do is install Scala 2.9.3, as Spark 0.7.2 depends on it. At the time of this post, the current version of Scala is 2.10, but there are some issues with Spark 0.7.2 and Scala 2.10.
1) A handy way to install Scala is to use Homebrew; please reference Installing Hadoop on OSX Lion (10.7) for more information on how to use Homebrew as well as installing Hadoop on Mac OS X. It may be handy to install Hadoop so that you can use Spark against HDFS as well.
2) The current Homebrew scala formula installs Scala 2.10, but you will need to use Scala 2.9.3. A quick way to do this is to modify the scala.rb formula (/usr/local/Library/Formula/scala.rb) to install Scala 2.9.3.
require 'formula'

class ScalaDocs < Formula
  homepage 'http://www.scala-lang.org/'
  url 'http://www.scala-lang.org/downloads/distrib/files/scala-docs-2.9.3.zip'
  sha1 '5bf44bd04b2b37976bde5d4a4c9bb6bcdeb10eb2'
end

class ScalaCompletion < Formula
  homepage 'http://www.scala-lang.org/'
  url 'https://raw.github.com/scala/scala-dist/27bc0c25145a83691e3678c7dda602e765e13413/completion.d/2.9.1/scala'
  version '2.9.1'
  sha1 'e2fd99fe31a9fb687a2deaf049265c605692c997'
end

class Scala < Formula
  homepage 'http://www.scala-lang.org/'
  url 'http://www.scala-lang.org/downloads/distrib/files/scala-2.9.3.tgz'
  #sha1 '87f605a186aa0e4435b302fb9af575513d29249a'
  sha1 '01bf9e2c854e2385b2bcef319840415867a00388'

  option 'with-docs', 'Also install library documentation'

  def install
    rm_f Dir["bin/*.bat"]
    doc.install Dir['doc/*']
    man1.install Dir['man/man1/*']
    libexec.install Dir['*']
    bin.install_symlink Dir["#{libexec}/bin/*"]
    ScalaCompletion.new.brew { (prefix/'etc/bash_completion.d').install 'scala' }
    ScalaDocs.new.brew { doc.install Dir['*'] } if build.include? 'with-docs'
  end
end
3) Install Scala via Homebrew by typing the following command in a bash terminal:
brew install scala
Upon running this command, Scala will be located in /usr/local/Cellar/scala.
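To confirm that the 2.9.x build is the one that actually got installed, a small check along these lines may help. This is only a sketch: the helper name is_scala_29 is made up for this example, and it assumes the banner printed by scala -version contains the text "version 2.9.".

```shell
# Succeed only if the given version banner reports a Scala 2.9.x release.
# The banner is normally whatever `scala -version` prints.
is_scala_29() {
  case "$1" in
    *"version 2.9."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (left commented out so sourcing this file has no side effects).
# 2>&1 is there because older Scala releases may print the banner on stderr.
# is_scala_29 "$(scala -version 2>&1)" && echo "Scala 2.9.x found"
```

If the check fails right after the install, the old 2.10 formula was most likely still the one in use.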
Ensure you have set the JAVA_HOME and SCALA_HOME variables
In my case, I have configured my .profile with the following:
# Java
export JAVA_HOME=/Library/Java/JavaVirtualMachines/1.6.0_32-b05-417.jdk/Contents/Home
# Scala
export SCALA_HOME=/usr/local/Cellar/scala/2.9.3/libexec
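After reloading your profile, it can be worth checking that both variables are set and point at real directories. The following is a minimal sketch; check_home_var is a hypothetical helper written for this post, not something that ships with Java, Scala or Spark.

```shell
# Check that the named variable is set and points at an existing directory.
# $1: the variable NAME (for example JAVA_HOME), not its value
check_home_var() {
  name=$1
  # Indirect lookup of the variable whose name is in $name (POSIX-compatible)
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value" >&2
    return 1
  fi
  echo "$name=$value"
}
```

Running check_home_var JAVA_HOME and then check_home_var SCALA_HOME should print each variable with its path, or an error on stderr if something is off.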
Installing Spark 0.7.2
1) Obtain the pre-built Spark 0.7.2 package at http://spark-project.org/downloads/. The direct link for the prebuilt package is Spark 0.7.2.
2) Extract the tgz file into the folder where you will install Spark. For example, I placed mine in the Homebrew Cellar location, i.e.
/usr/local/Cellar/spark-0.7.2
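If you prefer the terminal to a GUI archive tool, the extraction step could look roughly like this. It is a sketch under assumptions: unpack_spark is a name invented here, and the archive file name depends on what you actually downloaded, so it is passed in as an argument.

```shell
# Unpack a Spark tarball into a destination directory.
# $1: path to the downloaded .tgz (file name depends on your download)
# $2: directory to unpack into, e.g. /usr/local/Cellar
unpack_spark() {
  archive=$1
  dest=$2
  if [ ! -f "$archive" ]; then
    echo "no such archive: $archive" >&2
    return 1
  fi
  # -x extract, -z gunzip, -f archive file, -C change to the target directory first
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}
```

Note that the top-level folder inside the archive may not be named spark-0.7.2, in which case you would rename it after unpacking to match the path above.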
Configure and Build Spark 0.7.2
Follow the instructions as per the README.MD in /usr/local/Cellar/spark-0.7.2 or at http://spark-project.org/docs/latest/.
1) Run the Simple Build Tool (SBT) package from /usr/local/Cellar/spark-0.7.2
sbt/sbt package
2) Modify conf/spark-env.sh
Ensure that the SCALA_HOME variable has been set:
export SCALA_HOME=/usr/local/Cellar/scala/2.9.3/libexec
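If you would rather script this edit than open the file by hand, something like the following sketch could work. The function name ensure_scala_home is hypothetical, and it simply appends the export line when no such line exists yet.

```shell
# Append an export for SCALA_HOME to a spark-env.sh file unless one is already there.
# $1: path to conf/spark-env.sh
# $2: Scala home directory
ensure_scala_home() {
  conf=$1
  scala_home=$2
  if [ -f "$conf" ] && grep -q '^export SCALA_HOME=' "$conf"; then
    return 0   # already configured; leave the file untouched
  fi
  printf 'export SCALA_HOME=%s\n' "$scala_home" >> "$conf"
}
```

For the layout used in this post, the call would be ensure_scala_home conf/spark-env.sh /usr/local/Cellar/scala/2.9.3/libexec, run from /usr/local/Cellar/spark-0.7.2. Calling it a second time leaves the file unchanged.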
Running Spark 0.7.2
From here, you can now run Spark examples as noted in the Spark Quick Start. Note that if you are running the standalone job samples (e.g. A Standalone Job in Scala), make sure you have installed sbt first (via Homebrew, the command is ‘brew install sbt’).
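As a rough illustration of running one of the bundled examples, here is a small wrapper. Treat it as a sketch: run_spark_example is a made-up helper, and it assumes the Spark folder contains an executable script named run that takes an example class and a master as arguments, which is my understanding of the 0.7.x layout. Please check the README in your copy for the exact command.

```shell
# Run one of the bundled Spark examples from a Spark directory.
# $1: Spark install directory
# $2: example class name
# $3: master (e.g. local)
run_spark_example() {
  spark_dir=$1
  example_class=$2
  master=$3
  if [ ! -x "$spark_dir/run" ]; then
    echo "no executable run script in $spark_dir" >&2
    return 1
  fi
  # Subshell so the caller's working directory is not changed
  ( cd "$spark_dir" && ./run "$example_class" "$master" )
}
```

With the paths from this post, a call might look like run_spark_example /usr/local/Cellar/spark-0.7.2 spark.examples.SparkPi local, where the class name spark.examples.SparkPi is likewise an assumption about what ships with this release.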
Enjoy!
I had to update the link for Scala 2.9.3 to “http://www.scala-lang.org/files/archive/scala-2.9.3.tgz” in the Homebrew Formula. Thanks for the post!
Instead of modifying the scala.rb formula, just use brew to install scala29.
It is a popular release, so it has been maintained.
To learn how to do this in brew see
http://stackoverflow.com/questions/3987683/homebrew-install-specific-version-of-formula
At the moment the following should give you what you want.
brew search scala
brew install homebrew/versions/scala29
If you already have another Scala version installed and symlinked,
you may also need to run this afterwards:
brew link --overwrite scala29
Thanks for the call out Brandon – yes, it’s a much better idea to use the different brew versions, eh?!