Denny Lee

Installing Spark 0.6.1 Standalone on OSX Mountain Lion (10.8)

Apache Spark™ is an in-memory, open source cluster computing system that enables fast iterative and interactive analytics. Spark is built with Scala, a type-safe, object-oriented language with functional features that is fully interoperable with Java. For more information, please refer to http://spark-project.org. To test it out, you can install the standalone version on Mac OSX.

This post was published February 4th, 2013, so the content may be obsolete and most of these links are no longer active. I am keeping this post for posterity.

Install Scala 2.9.2

The first thing you need to do is install Scala 2.9.2, since Spark 0.6.1 depends on it. As of this posting, the current version of Scala is 2.10, but there are some issues with Spark 0.6.1 and Scala 2.10, as noted in this thread.

1) A handy way to install Scala is to use Homebrew; please refer to Installing Hadoop on OSX Lion (10.7) for more information on using Homebrew as well as installing Hadoop on Mac OSX. It may also be worth installing Hadoop so that you can use Spark against HDFS.

2) The current Homebrew scala formula installs Scala 2.10, but you will need Scala 2.9.2. A quick way to do this is to modify the scala.rb formula (/usr/local/Library/Formula/scala.rb) so that it installs Scala 2.9.2:

require 'formula'

# Optional Scala 2.9.2 API docs (installed only with the with-docs option)
class ScalaDocs < Formula
  homepage 'http://www.scala-lang.org/'
  url 'http://www.scala-lang.org/downloads/distrib/files/scala-docs-2.9.2.zip'
  sha1 '5bf44bd04b2b37976bde5d4a4c9bb6bcdeb10eb2'
end

# Bash completion script, pinned to the 2.9.1 completion file
class ScalaCompletion < Formula
  homepage 'http://www.scala-lang.org/'
  url 'https://raw.github.com/scala/scala-dist/27bc0c25145a83691e3678c7dda602e765e13413/completion.d/2.9.1/scala'
  version '2.9.1'
  sha1 'e2fd99fe31a9fb687a2deaf049265c605692c997'
end

# Main formula: url and sha1 now point at the Scala 2.9.2 tarball instead of 2.10
class Scala < Formula
  homepage 'http://www.scala-lang.org/'
  url 'http://www.scala-lang.org/downloads/distrib/files/scala-2.9.2.tgz'
  #sha1 '87f605a186aa0e4435b302fb9af575513d29249a'
  sha1 '806fc1d91bda82d6a584172d7742531386ae68fb'

  option 'with-docs', 'Also install library documentation'

  def install
    rm_f Dir["bin/*.bat"]                        # drop the Windows batch scripts
    doc.install Dir['doc/*']
    man1.install Dir['man/man1/*']
    libexec.install Dir['*']                     # rest of the distribution; this libexec is SCALA_HOME
    bin.install_symlink Dir["#{libexec}/bin/*"]  # put scala, scalac, etc. on the PATH
    ScalaCompletion.new.brew { (prefix/'etc/bash_completion.d').install 'scala' }
    ScalaDocs.new.brew { doc.install Dir['*'] } if build.include? 'with-docs'
  end
end

3) Install Scala via Homebrew by typing the following command in a bash terminal:

brew install scala

Upon running this command, Scala will be installed under /usr/local/Cellar/scala/2.9.2, and Homebrew links the scala, scalac, and related commands onto your PATH.
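To confirm that Homebrew really installed 2.9.2 rather than 2.10, you can check what `scala -version` reports. The Scala runner of this era prints a banner such as `Scala code runner version 2.9.2 -- Copyright ...`, possibly on stderr, hence the `2>&1`. The `scala_is_29` function is a hypothetical helper and a sketch only: it just pattern-matches that banner text.

```bash
# Hypothetical helper: succeed only if the given `scala -version` output reports 2.9.x.
scala_is_29() {
  case "$1" in
    *"version 2.9."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: capture both streams, since the version banner may be written to stderr.
# if scala_is_29 "$(scala -version 2>&1)"; then echo "Scala 2.9 found"; else echo "Wrong Scala version"; fi
```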

Install Git Command Line for Mac

Ensure you have Git for Mac installed. Even if you already have GitHub for Mac, you still need to install Git so that you can run it from the command line.

http://git-scm.com/download/mac
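A quick way to confirm that the command-line tool is on your PATH is the shell builtin `command -v`. The `require_cmd` function below is a hypothetical convenience wrapper, not part of Git or Spark.

```bash
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found; install it before continuing" >&2
    return 1
  fi
}

# Usage:
# require_cmd git && git --version
```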

Ensure you have set the JAVA_HOME and SCALA_HOME variables

In my case, I have configured my .profile with the following:

# Java
export JAVA_HOME=/Library/Java/JavaVirtualMachines/1.6.0_32-b05-417.jdk/Contents/Home

# Scala
export SCALA_HOME=/usr/local/Cellar/scala/2.9.2/libexec
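Before moving on, it is worth confirming that both variables point at real installations. This sketch uses a hypothetical `check_home` helper that only tests whether `bin/<executable>` exists and is executable under the given directory; with the Homebrew layout above, `$SCALA_HOME/bin/scala` should exist because the formula installs the whole distribution into libexec. On OS X, `/usr/libexec/java_home` also prints the path of the installed JDK, which can help when filling in JAVA_HOME.

```bash
# Hypothetical helper: verify that a *_HOME directory contains the expected executable.
check_home() {
  local name="$1" dir="$2" exe="$3"
  if [ -x "$dir/bin/$exe" ]; then
    echo "$name OK: $dir"
  else
    echo "$name looks wrong: $dir/bin/$exe not found" >&2
    return 1
  fi
}

# Usage, after reloading your profile with `source ~/.profile`:
# check_home JAVA_HOME "$JAVA_HOME" java
# check_home SCALA_HOME "$SCALA_HOME" scala
```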

Installing Spark 0.6.1

1) Obtain the pre-built Spark 0.6.1 package at http://spark-project.org/downloads/.  The direct link for the prebuilt package is: http://github.com/downloads/mesos/spark/spark-0.6.1-prebuilt.tgz


2) Extract the tgz file into the folder where you will install Spark. For example, I placed mine in the Homebrew Cellar location:

/usr/local/Cellar/spark-0.6.1
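From a terminal, the download and extraction can be done with `curl` and `tar`. This is a sketch with a hypothetical `extract_spark` helper: as noted at the top, the GitHub download link may no longer resolve, and the name of the top-level directory inside the archive is something to check after extracting rather than assume, so the helper prints it.

```bash
# Hypothetical helper: unpack a Spark tarball into a parent directory.
extract_spark() {
  local tarball="$1" parent="$2"
  mkdir -p "$parent" || return 1
  tar -xzf "$tarball" -C "$parent" || return 1
  # List the archive's top-level entries so you can confirm the resulting directory name.
  tar -tzf "$tarball" | cut -d/ -f1 | sort -u
}

# Usage (the URL is the one from this post and may be dead):
# curl -L -O http://github.com/downloads/mesos/spark/spark-0.6.1-prebuilt.tgz
# extract_spark spark-0.6.1-prebuilt.tgz /usr/local/Cellar
```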

Configure and Build Spark 0.6.1

Follow the instructions in the README.md in /usr/local/Cellar/spark-0.6.1.

1) Run the Simple Build Tool (SBT) package command from /usr/local/Cellar/spark-0.6.1:

sbt/sbt package

2) Modify conf/spark-env.sh

Ensure that the SCALA_HOME variable has been set:

export SCALA_HOME=/usr/local/Cellar/scala/2.9.2/libexec
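Both configuration steps can be scripted. This is a minimal sketch with hypothetical helper names (`build_spark`, `ensure_export`), assuming the Spark root is /usr/local/Cellar/spark-0.6.1 as above. `build_spark` runs sbt/sbt from the Spark root inside a subshell, since sbt/sbt is a relative path that only resolves there, without changing your terminal's directory. `ensure_export` appends an export line only if that variable is not already exported in the file. If conf/spark-env.sh does not exist yet and your download ships a conf/spark-env.sh.template, copying that template first is a reasonable starting point.

```bash
# Hypothetical default: the install location used in this post.
SPARK_DIR_DEFAULT=/usr/local/Cellar/spark-0.6.1

# Step 1: run `sbt/sbt package` from the Spark root without changing the caller's directory.
build_spark() {
  local spark_dir="${1:-$SPARK_DIR_DEFAULT}"
  if [ ! -x "$spark_dir/sbt/sbt" ]; then
    echo "No executable sbt/sbt under $spark_dir" >&2
    return 1
  fi
  # The parentheses start a subshell, so the cd does not leak into your session.
  (cd "$spark_dir" && sbt/sbt package)
}

# Step 2: append `export VAR=VALUE` to a file unless VAR is already exported there.
ensure_export() {
  local file="$1" var="$2" value="$3"
  touch "$file" || return 1
  if grep -q "^export $var=" "$file"; then
    echo "$var already set in $file"
  else
    echo "export $var=$value" >> "$file"
    echo "added $var to $file"
  fi
}

# Usage:
# build_spark
# ensure_export "$SPARK_DIR_DEFAULT/conf/spark-env.sh" SCALA_HOME /usr/local/Cellar/scala/2.9.2/libexec
```

Checking for an existing `export SCALA_HOME=` line keeps the edit idempotent, so re-running the setup does not pile up duplicate exports.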

Running Spark 0.6.1

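From here, you can now run the Spark examples. Just in case, source conf/spark-env.sh first so that the Scala environment variables are set in your current shell; executing the file directly would only set them in a short-lived child process.

source conf/spark-env.sh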
./run spark.examples.SparkLR local[2]
./run spark.examples.SparkPi local[4]
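Why sourcing matters: executing a file such as conf/spark-env.sh runs it in a child shell, so any export disappears when that shell exits, whereas sourcing it with `.` (or bash's `source`) runs it in the current shell. The self-contained demo below uses a throwaway temporary file and a made-up variable, DEMO_SCALA_HOME, rather than the real spark-env.sh.

```bash
# Demo: executing vs. sourcing a file that exports a variable.
demo_env=$(mktemp)
echo 'export DEMO_SCALA_HOME=/usr/local/Cellar/scala/2.9.2/libexec' > "$demo_env"

unset DEMO_SCALA_HOME
sh "$demo_env"                                # runs in a child shell
echo "after executing: '${DEMO_SCALA_HOME}'"  # empty: the child's export is gone

. "$demo_env"                                 # runs in the current shell
echo "after sourcing: '${DEMO_SCALA_HOME}'"   # now set

rm -f "$demo_env"
```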

To run the Spark shell:

./spark-shell

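In the ./run commands, local means Spark runs on this machine only (rather than on EC2, a cluster, Mesos, etc.), and the number in brackets is the number of worker threads to use, typically matched to the number of cores.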
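To match the thread count to your Mac's cores, you can build the master string from `sysctl -n hw.ncpu`, which reports the CPU count on OS X. `local_master` is a hypothetical helper, and falling back to a single thread when the count is unavailable is an assumption of this sketch.

```bash
# Hypothetical helper: print a local[N] master string, defaulting N to the CPU count.
local_master() {
  local n="${1:-$(sysctl -n hw.ncpu 2>/dev/null)}"
  case "$n" in
    ''|*[!0-9]*|0) n=1 ;;  # fall back to one thread if the count is missing or not a positive integer
  esac
  echo "local[$n]"
}

# Usage, from the Spark root:
# ./run spark.examples.SparkPi "$(local_master)"
```

Quoting the result keeps the shell from treating `[N]` as a glob pattern. Spark shells of this era are understood to read their master from the MASTER environment variable (for example `MASTER="$(local_master)" ./spark-shell`); check the README of your download if that does not take effect.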

Enjoy!
