Installing Hadoop on OSX Lion (10.7)

For starters, this isn’t a production setup; it’s just so I can do some quick Hadoop demos on my MacBook Air (2011). My configuration is OSX Lion, 4GB RAM, and a 256GB SSD. Also, a serious shout-out to the authors below, whose posts I referenced to create this one.

References:
Installing Hadoop on Mac OSX Lion by Ritesh Agrawal
Flume and Hadoop on OSX by Arbo v. Monkiewitsch (@arbovm)

In fact, I was originally just going to link to their blog posts, but I ran into some hiccups along the way. To make things easier to follow, I’m going to call out all of the steps (hopefully!).

Install the OSX Lion Prerequisites

Ensure that you have installed the following (current as of this post):
• Xcode 4.3.2
• Java Developer for 10.7

You can find both updates at https://developer.apple.com/downloads/index.action

Make sure to update Xcode (Preferences > Downloads) so that the Command Line Tools are also installed; Homebrew requires them.

Install Homebrew

If you are not familiar with Homebrew, you are definitely missing “the missing package manager for OS X” (that’s their tagline, but it’s also very true). Homebrew installs each package into its own folder and symlinks it back into /usr/local. This makes UNIX packages such as Hadoop much easier to isolate and remove. Homebrew also does not require sudo to perform a brew install.

Install Homebrew by following the Homebrew installation link and running the provided Ruby script. I’m deliberately pointing you to the link rather than copying the command here, so these instructions stay more or less current if the script changes.

After installing Homebrew, run the following from Terminal.app:
• brew doctor: validates that the installation is okay. It sometimes catches problems, e.g. Xcode command line paths that are not where Homebrew expects them.
• brew update: ensures that the latest formulas are available in the /usr/local/Library/Formula folder. Formulas are Ruby scripts that define how a package is installed; you can browse them in that folder or online at Homebrew Library Formula, and to write your own, check out the Formula Cookbook.
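
Before moving on, it can help to confirm that the tools used in the rest of this post are actually on your PATH. This is a minimal sketch assuming a POSIX shell; missing_cmds is a hypothetical helper name (not part of Homebrew) that prints each command it cannot find.

```shell
# Hypothetical preflight helper: print every command name given as an
# argument that is NOT found on the PATH. Prints nothing if all are present.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}
```

For example, missing_cmds brew java ssh should print nothing once the prerequisites above are installed.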

Installing Hadoop 1.0.1

If all is well, the only thing you need to type into Terminal.app to install Hadoop is:

brew install hadoop

Error: Cannot allocate memory – connect(2)

One of the reasons I’m writing this post, though, is that I ran into the following error (hopefully you won’t):

Error: Cannot allocate memory – connect(2)

[Screenshot: the “Cannot allocate memory” error]

Looking at the URL in that output, http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz, it actually points to Apache’s mirror-selection page rather than to the tarball itself:

[Screenshot: the page returned by the closer.cgi URL]

In other words, it is in fact a redirection error, not a memory allocation error.

To fix it right away, update your local copy of the Hadoop formula at /usr/local/Library/Formula/hadoop.rb. Change the url so it points to a mirror site instead of the closer.cgi path:

#url 'http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
url 'http://apache.mirrorcatalogs.com/hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
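
If you would rather script the change than edit hadoop.rb by hand, sed can do the same rewrite. This is a sketch that assumes the formula contains the closer.cgi URL exactly as shown above and that the apache.mirrorcatalogs.com mirror is still serving the tarball (mirrors come and go, so pass your own as the second argument if it is not); patch_formula_url is a hypothetical helper name. Unlike the manual edit, it rewrites the url line in place instead of keeping the old line as a comment, but the original file is saved as hadoop.rb.bak.

```shell
# Hypothetical helper: replace Apache's closer.cgi redirect prefix in a
# Homebrew formula with a direct mirror URL. The mirror must end in "/" and
# must not contain "|" or "&" (both are special in the sed expression).
patch_formula_url() {
  formula="$1"
  mirror="${2:-http://apache.mirrorcatalogs.com/}"
  # -i.bak works with both BSD (OS X) sed and GNU sed and keeps a backup copy
  sed -i.bak "s|http://www\.apache\.org/dyn/closer\.cgi?path=|${mirror}|" "$formula"
}
```

Usage: patch_formula_url /usr/local/Library/Formula/hadoop.rb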

[Screenshot: the updated hadoop.rb]

Once you update the hadoop.rb file, you can then go back and re-run

brew install hadoop

and the installation will proceed.

[Screenshot: the install proceeding from the fixed download link]

Configure Hadoop

Now you will need to update the Hadoop configuration files to get everything up and running.  The Hadoop configuration files can all be found in the /usr/local/Cellar/hadoop/1.0.1/libexec/conf folder.  The files to be updated are:

• hadoop-env.sh: the Hadoop environment variables; a quick change here suppresses an error message.
• core-site.xml
• hdfs-site.xml
• mapred-site.xml

And thanks again to @arbovm for the details in Flume and Hadoop on OS X.
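
A number of readers (see the comments below) hit parse errors because copying from a web page turns straight quotes into typographic ones and the "--" of XML comment markers into en dashes. As a minimal sketch, assuming your files are UTF-8 and a standard sed, the hypothetical fix_smart_quotes helper below swaps those characters back to plain ASCII in place, keeping a .bak copy of each file.

```shell
# Hypothetical helper: repair "smart" punctuation picked up when copy-pasting
# config snippets from a blog. Files are edited in place; originals are kept
# with a .bak suffix.
fix_smart_quotes() {
  for f in "$@"; do
    # One expression per character: these characters are multi-byte in UTF-8,
    # so they are matched as literal strings rather than in a bracket list.
    sed -i.bak \
      -e 's/“/"/g' -e 's/”/"/g' -e 's/″/"/g' \
      -e "s/‘/'/g" -e "s/’/'/g" \
      -e 's/<!–/<!--/g' -e 's/–>/-->/g' \
      "$f"
  done
}
```

Usage, from /usr/local/Cellar/hadoop/1.0.1/libexec/conf:
fix_smart_quotes hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml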

hadoop-env.sh

Suppress the “Unable to load realm info from SCDynamicStore” error. As noted in https://issues.apache.org/jira/browse/HADOOP-7489, for a single-box deployment you can do this by adding the line below to hadoop-env.sh.

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="

It should look something like this:

# export HADOOP_OPTS=-server
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
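
To make the hadoop-env.sh change without any risk of picking up typographic quotes (one commenter below hit exactly that error), you can append the line from the shell instead. This is a sketch; add_krb5_workaround is a hypothetical helper name, and it only appends the line if that exact line is not already in the file, so it is safe to run more than once.

```shell
# Hypothetical helper: append the HADOOP-7489 Kerberos workaround, with plain
# ASCII double quotes, to a hadoop-env.sh file unless it is already present.
add_krb5_workaround() {
  env_file="$1"
  line='export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="'
  # -q quiet, -x whole-line match, -F treat the line as a fixed string
  if ! grep -qxF "$line" "$env_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$env_file"
  fi
}
```

Usage: add_krb5_workaround /usr/local/Cellar/hadoop/1.0.1/libexec/conf/hadoop-env.sh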

core-site.xml

The core-site.xml file should look like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

In this case, I have created the Hadoop temporary folder under /usr/local as well, so you do not need sudo to create the HDFS folders. To match the configuration above, run the following commands:

mkdir /usr/local/Cellar/hadoop/hdfs
mkdir /usr/local/Cellar/hadoop/hdfs/tmp
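
If you prefer, core-site.xml and its temp folder can be created in one go with a heredoc, which also guarantees plain ASCII quotes. A minimal sketch, with write_core_site as a hypothetical helper name taking the conf folder and the HDFS temp folder as arguments; note that it overwrites any existing core-site.xml, and mkdir -p stands in for the two mkdir commands above.

```shell
# Hypothetical helper: write core-site.xml (same settings as above) and create
# the hadoop.tmp.dir folder it points to.
write_core_site() {
  conf_dir="$1"
  hdfs_tmp="$2"
  mkdir -p "$hdfs_tmp"   # -p also creates the parent hdfs folder
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$hdfs_tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

Usage: write_core_site /usr/local/Cellar/hadoop/1.0.1/libexec/conf /usr/local/Cellar/hadoop/hdfs/tmp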

hdfs-site.xml

Below is a quick suggested configuration for the hdfs-site.xml file. It sets the Hadoop Distributed File System (HDFS) replication factor to 1, since this is a single box for demos and functional testing.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
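
Since hdfs-site.xml (and mapred-site.xml below) each set just one property, a small generic writer covers both. This is a sketch; write_single_property_site is a hypothetical helper name, and it does not XML-escape its arguments, so it assumes names and values without <, > or &.

```shell
# Hypothetical helper: write a Hadoop *-site.xml file that sets exactly one
# property. Arguments: output file, property name, property value.
write_single_property_site() {
  out="$1"
  name="$2"
  value="$3"
  cat > "$out" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}
```

Usage: write_single_property_site /usr/local/Cellar/hadoop/1.0.1/libexec/conf/hdfs-site.xml dfs.replication 1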

mapred-site.xml

The configuration below sets the JobTracker host and connection port.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
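
When a daemon will not start, it is worth double-checking which host and port your files actually configure; one commenter below hit an “Address already in use” error on the JobTracker port. The hypothetical get_property helper below is a sketch that scans a *-site.xml file as text (not a real XML parser) and assumes the layout used in this post, with <value> on a line after <name>.

```shell
# Hypothetical helper: print the <value> of a named property from a Hadoop
# *-site.xml file laid out one element per line. Prints nothing if not found.
get_property() {
  awk -v name="$2" '
    # remember that we just saw the requested <name> element
    index($0, "<name>" name "</name>") { found = 1; next }
    # the first <value> line after it holds the setting
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

Usage: get_property /usr/local/Cellar/hadoop/1.0.1/libexec/conf/mapred-site.xml mapred.job.tracker
With the configuration above, this prints localhost:9010.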


Enable SSH to localhost

Please note that Hadoop connects to localhost using ssh. So that you can connect from and to localhost without a password, you will need to add your public key to your authorized keys. First, ensure that “Remote Login” is enabled (System Preferences > Sharing; make sure “Remote Login” is checked).

If you haven’t already done so, create your own public keys following the instructions at:
http://stackoverflow.com/questions/7134535/setup-passphraseless-ssh-to-localhost-on-os-x

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Test it by running the command:

ssh localhost
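
Running the cat >> command more than once appends duplicate lines, and with its default StrictModes setting sshd ignores an authorized_keys file that other users can write to. The sketch below, using a hypothetical authorize_key helper, appends the key only once and tightens the permissions. The usage line uses an RSA key rather than DSA because OpenSSH 7.0 and later disable DSA keys by default; the DSA commands above are what worked on Lion at the time.

```shell
# Hypothetical helper: add a public key to authorized_keys exactly once and
# set the permissions sshd expects (700 on the folder, 600 on the file).
authorize_key() {
  pub="$1"
  auth="${2:-$HOME/.ssh/authorized_keys}"
  ssh_dir=$(dirname "$auth")
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth"
  # -x whole-line, -F fixed string: skip the append if the key is already there
  if ! grep -qxF "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

Usage:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
authorize_key ~/.ssh/id_rsa.pub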

Running Hadoop

To get yourself off the ground, go ahead and run the following commands:

Format the Hadoop Namenode using:
hadoop namenode -format

Start Hadoop by running the script:
/usr/local/Cellar/hadoop/1.0.1/libexec/bin/start-all.sh

Run some quick tests:
hadoop dfs -ls /

cd /usr/local/Cellar/hadoop/1.0.1/libexec
hadoop jar hadoop-examples-1.0.1.jar pi 10 100
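
As a commenter suggests below, running jps right after start-all.sh is a quick way to see which daemons actually started. The sketch below, with missing_daemons as a hypothetical helper name, reads jps output and prints any of the five Hadoop 1.x daemons that are not running, so you can go straight to that daemon’s log. It assumes Hadoop 1.x; Hadoop 2.x (YARN) runs ResourceManager and NodeManager instead of JobTracker and TaskTracker.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of each
# Hadoop 1.x daemon that is NOT running. No output means all five are up.
missing_daemons() {
  jps_out=$(cat)
  for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # jps prints "<pid> <MainClass>"; compare the second column exactly
    printf '%s\n' "$jps_out" |
      awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
      echo "$daemon"
  done
}
```

Usage: jps | missing_daemons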

Don’t forget that you can also access the MapReduce and HDFS admin pages through a web browser:
http://localhost:50030 – Map Reduce Administrator
http://localhost:50070 – HDFS Administrator

To Stop Hadoop
/usr/local/Cellar/hadoop/1.0.1/libexec/bin/stop-all.sh

Enjoy!

31 Comments

  1. Thank you Denny – very helpful. Note, I think you have a typo…

    http://locahost:50030 – Map Reduce Administrator should be
    http://localhost:50030 – Map Reduce Administrator (with the second ‘l’)

  2. […] osx similar to yum / aptitude) brew install hadoop this blog post might help you to explore further ..https://dennyglee.com/2012/05/08/… […]

  3. Denny this worked great !

    On Mountain Lion with latest homebrew, there is no redirection error and so the first step in your directions to update the link to point to the mirror site doesn’t seem to be required.
    Otherwise it all worked just fine.

    Thanks a lot for the comprehensive directions.

    1. Thanks Kamal – glad you enjoyed it, eh?! 🙂

  4. […] on dennyglee.com […]

  5. peterlorent

    Great stuff! Got Hadoop up and running in 20 minutes thanks to this post! Thanks for that.

    1. Thanks! Glad to hear it was helpful!

  6. Mark Bullock

    Really decent directions – thank you very much.

    1. Glad it was helpful!

  7. I’m still getting
    2012-12-19 16:48:34.917 java[17902:1b03] Unable to load realm info from SCDynamicStore
    and also for port 9000 on my mac
    Failed to read data from “hdfs://localhost:9000/user/kiranbilliyawala/excite.log.bz2”
    Please guide me to solve these..

  8. It’s a little tricky to debug this based on the above errors. When you get the “unable to load realm info from SCDynamicStore” error, are you getting it during startup, loading, or reading?

    1. Yes I’m getting this only during these three times..I.e. startup, loading and reading files. What should I do?

  9. anurgaojha

    Copying and Pasting the XML config provided on your site gives errors when I run the command:
    hadoop namenode -format

    [Fatal Error] core-site.xml:1:15: The value following “version” in the XML declaration must be a quoted string.
    13/01/22 00:13:36 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: The value following “version” in the XML declaration must be a quoted string.
    13/01/22 00:13:36 ERROR namenode.NameNode: java.lang.RuntimeException: org.xml.sax.SAXParseException: The value following “version” in the XML declaration must be a quoted string.

    This was because of the special characters. Replacing the “s and removing the line:

    fixed the issue.

  10. […] A handy way to installing Scala is to use Home Brew; please reference Installing Hadoop on OSX Lion (10.7) for more information on how to use Home Brew as well installing Hadoop on Mac OSX.  It may be […]

  11. This is a very helpful post, Denny. I kept coming back to it in Autumn 2012 and again this weekend.

    I installed Hadoop 1.0.4 on OS X 10.6.8 in autumn 2012. Above instructions were very helpful. If using Homebrew, as Denny says, you’ll want to make sure Brew itself, Xcode, and the command line tools are installed and current. Only after you’ve cleaned up with brew doctor, attempt to install Hadoop.

    I migrated my apps and files from a 10.6.8 box to a new, 10.8 box this past weekend. Some notes to help out:

    * I didn’t run into any problems that I felt were related to the version of OS X, the version of Hadoop, or the version of Java. 10.8, Hadoop 1.0.4, and java version 1.6.0_37 seem to work ok together.

    * A big problem for me was that I found that I had multiple copies of Hadoop resource files on my box. Ugh. It’s from Autumn when I didn’t know what I was doing. I floundered for a while and then decided to focus on just one set of resource files. I focused on the install that I brought down from Apache in autumn.

    * If you end up like me with a mess and don’t know what *.sh and *.xml you’re executing, create a variable. I created a HADOOP_HOME=/Applications/hadoop-1.0.4/ in my ~/.bash_profile. I used $HADOOP_HOME/bin/*.sh to completely take the guess work out of what shell I was running. This is not a good practice. I need to clean up my filesystem.

    * I had trouble setting JAVA_HOME. I put $JAVA_HOME in my ~/.bash_profile. This is what’s working for me:

    export JAVA_HOME=`/usr/libexec/java_home -v 1.6`

    This is the Java environment I have on my box:
    my_host_name$ java -version
    java version “1.6.0_37”
    Java(TM) SE Runtime Environment (build 1.6.0_37-b06-434-11M3909)
    Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode)

    * I was having trouble with one or the other of job tracker 50030 or DFS 50070 not coming up on HTTP. For the problem with job tracker, i found this error in my job tracker log file:

    2013-02-10 11:55:50,535 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to localhost/127.0.0.1:8021 : Address already in use.

    I restarted. This didn’t fix it. I changed mapred.job.tracker to use 9001, and fs.default.name to use 9000. This worked, but I admit I’m not sure if there is any causality.

    * Another fix was to change System Preferences-> Sharing -> Remote Login from two users using the “Only these users” option to “Allow access for All users.”

    * A helpful command if just getting Hadoop up and testing with start-all.sh is jps. Use it right after executing start-all.sh to see what processes are running, or not! There should be five Hadoop, named processes (don’t include jps). Save time…go straight to the log file associated with the process that is missing. The command ps ax | grep hadoop | wc -l should return 6.

    Great post, Denny, thanks!

    1. Glad it worked and thanks for the additional info, eh?!

  12. Jon Milan

    Thanks for the post. Just installed 1.1.1 fresh and aside from the versions in the pathnames, the instructions are fine.

    Port 50030: isn’t operating. Haven’t looked into this yet.

    BTW, my install is a single-node on Lion Server 10.7.5; Xcode Version 4.6 (4H127).

    1. Got it. Don’t forget 1.1.1 has a dependency on the Snappy codec so you’ll have to compile the Hadoop-snappy code for it to work, eh?!

  13. Sawan Gupta

    Thanks! Worked like a charm on 10.8.2

    1. Excellent – glad to hear it!

      1. Sawan Gupta

        However after running the Pi job, I get

        Job Finished in 27.106 seconds
        Estimated value of Pi is 3.14800000000000000000

        This looks bad eh ?

    2. Awesome – glad to hear it!

  14. […] 0.8.1, Pig 1.0, and Hive 0.9.0.  The first three are from my initial Hadoop installation – Installing Hadoop on OSX Lion (10.7) – while the latter was from my more recent Spark installation – Installing Spark 0.6.1 […]

  15. JerryK

    My installation is failing at the “hadoop namenode -format”. The error is as follows:

    hadoop namenode -format
    /usr/local/Cellar/hadoop/1.1.2/libexec/bin/../conf/hadoop-env.sh: line 19: export: `-Djava.security.krb5.kdc=”’: not a valid identifier
    Error: Could not find or load main class ”-Djava.security.krb5.realm=

    1. Sorry for the delay! The reason you’re getting this is due to the single quotes being translated incorrectly on the web page. If you could change them to proper single quotes / double quotes you should be good to go. HTH!

  16. MacTommy

    Denny, many thanks! You posted this a while back, but it’s still very helpful indeed.
    I just installed Hadoop on Mac OS X 10.8.4 and it went like a charm.

    Some very small remarks who are doing the same thing:
    – On the Apple Developer website, the latest version of Java Developer is for Mac OS X 10.7.
    So, I searched for like another 15 minutes, but I couldn’t find anything newer. Also, if you try to install it, it will say that the certificate is outdated. I installed it all the same, and Hadoop seems to run fine, so no problems.

    – Denny, the quotes on your web page are translated into UTF-8 encoded double quotes. So that looks nice, but if I copy-paste this into the Hadoop XML files, or on the command line, that causes errors.

    – Oh yes, you still have that ‘locahost’ (without the ‘l’) typo

    Thanks again!

    Cheers,

    Tom

    1. Thanks for the great comments – I’ll update the page accordingly. Thanks!

  17. […] A handy way to installing Scala is to use Home Brew; please reference Installing Hadoop on OSX Lion (10.7) for more information on how to use Home Brew as well installing Hadoop on Mac OSX.  It may be […]

  18. Rajeev Singla

    Thanks for the article. Finally i was able to deploy the hadoop on mac. I have tried like 5 times before everytime it use to fail to start.

    brew install hadoop was the easiest way to setup. Thanks for the article.

  19. Saurav Nanda

    Hello,

    I just installed on my Mac Maverick and it went all smooth like a cream.

    Only issue is that the http://localhost:50030/ & http://localhost:50060/ is not working. Any help??

    1. If you had done the installation recently using HomeBrew, perhaps you had installed Hadoop 2.5.0? If so, then you potentially have the YARN version of Hadoop meaning the Apps view is at http://localhost:8088 and HDFS is http://localhost:50070. HTH!
