This post was published May 8th, 2012, and its content may be obsolete. Most of these links are no longer active, but I am keeping this post for posterity.
For starters, this isn’t a production setup; this is just so that I can do some quick Hadoop demos on my MacBook Air (2011). In this case, my configuration is OSX Lion, 4GB RAM, and a 256GB SSD. As well, a serious shout-out to the authors below, whose posts I referenced to create this one.
References:
• Installing Hadoop on Mac OSX Lion by Ritesh Agrawal
• Flume and Hadoop on OSX by Arbo v. Monkiewitsch (@arbovm)
Originally I was just going to link to their blog posts, but I ran into some hiccups along the way, so to make things easier to follow I’m going to call out all the steps here (hopefully!).
Install the OSX Lion Prerequisites
Ensure that you have installed (current as of this post):
• XCode 4.3.2
• Java Developer for 10.7
You can find both updates at https://developer.apple.com/downloads/index.action
Make sure to update XCode (Preferences > Downloads) so that the Command Line Tools are also installed (a requirement for Homebrew).
Install Homebrew
If you are not familiar with Homebrew, you are definitely missing “the missing package manager for OSX” (it’s their tag line, but it’s also very true). Homebrew installs packages into their own folders under /usr/local/Cellar and symlinks them back into /usr/local. This allows for much easier removal and isolation of UNIX packages – such as Hadoop. As well, Homebrew does not require you to use sudo to perform a brew install.
Install Homebrew by going to the Homebrew installation link and running the provided ruby script. Please use the link itself so that, if they change the script, these instructions remain more or less up to date.
After installing Homebrew, from the Terminal.app, ensure you run the following:
• brew doctor – This validates that the installation is okay. Sometimes it will catch errors (e.g. XCode command line paths are not as expected)
• brew update – To ensure that the latest Formulas are available and installed in the /usr/local/Library/Formula folder. Formulas are Ruby scripts that define the installation of a package; you can find the list in that folder or online at Homebrew Library Formula, and to make your own, check out the Formula Cookbook (you can also peek at one directly, as shown below).
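For example, assuming the default /usr/local prefix, you can browse the Formula that brew install hadoop will use right from Terminal.app:
ls /usr/local/Library/Formula | head
cat /usr/local/Library/Formula/hadoop.rb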
Installing Hadoop 1.0.1
If all is well, the only thing you have to type into Terminal.app to install Hadoop is:
brew install hadoop
Error: Cannot allocate memory – connect(2)
One of the reasons I’m writing this post is that I ran into the following error (hopefully you won’t):
Error: Cannot allocate memory - connect(2)

Looking at the URL from the error message, http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz, you can see it resolves to Apache’s mirror-selection page rather than the tarball itself; i.e. it’s in fact a redirection error, not a memory allocation error.
To fix it immediately, you will need to update the Hadoop Formula locally, which can be found at /usr/local/Library/Formula/hadoop.rb. Update the URL so it points directly to a mirror site instead of the closer.cgi path:
#url 'http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
url 'http://apache.mirrorcatalogs.com/hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
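Alternatively, rather than hunting for the file by hand, you can have Homebrew open the Formula in your default editor:
brew edit hadoop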

Once you update the hadoop.rb file, you can then go back and re-run
brew install hadoop
and the installation will proceed.

Configure Hadoop
Now you will need to update the Hadoop configuration files to get everything up and running. The Hadoop configuration files can all be found in the /usr/local/Cellar/hadoop/1.0.1/libexec/conf folder. The files to be updated are:
• hadoop-env.sh: These are the Hadoop environment variables, a quick change is suggested to suppress an error message.
• core-site.xml
• hdfs-site.xml
• mapred-site.xml
And thanks to @arbovm for his details at: Flume and Hadoop on OS X.
hadoop-env.sh
Suppress the “Unable to load realm info from SCDynamicStore” error. As noted in https://issues.apache.org/jira/browse/HADOOP-7489, for a single-box deployment, update the file to include the line below.
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
It should look something like this:
# export HADOOP_OPTS=-server
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
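To double-check that the edit took, a quick grep against the file works (the path assumes the Homebrew Cellar layout used throughout this post):
grep HADOOP_OPTS /usr/local/Cellar/hadoop/1.0.1/libexec/conf/hadoop-env.sh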
core-site.xml
The core-site.xml file should look like this:
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
In this particular case, I have created the Hadoop temporary folder under /usr/local as well (this way you do not need sudo to create the HDFS folders). To match the above configuration, please execute the following command (the -p flag creates the intermediate hdfs folder too):
mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp
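You can verify that the folder exists and is owned by your user, so no sudo will be needed later:
ls -ld /usr/local/Cellar/hadoop/hdfs/tmp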
hdfs-site.xml
Below is a quick suggested configuration for the hdfs-site.xml file. It sets the Hadoop Distributed File System (HDFS) replication factor to 1, as this is a single box for demos and functional tests.
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
The configuration below sets the JobTracker connection port.
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9010</value>
</property>
</configuration>
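Before moving on, you can sanity-check all three XML files for well-formedness with xmllint, which ships with OSX; it prints nothing if the files parse cleanly:
cd /usr/local/Cellar/hadoop/1.0.1/libexec/conf
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml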
Enable SSH to localhost
Please note that Hadoop will connect to localhost using ssh. To allow connections from-and-to localhost without a password, you will need to append your public key to your authorized_keys file. First, ensure that “Remote Login” is enabled (System Preferences > Sharing; check “Remote Login”).
If you haven’t already done so, create your own key pair by following the instructions at:
http://stackoverflow.com/questions/7134535/setup-passphraseless-ssh-to-localhost-on-os-x
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Test it by running the command
ssh localhost
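As an aside, if you prefer to stay in the Terminal, Remote Login can also be toggled via OSX’s systemsetup utility (requires sudo):
sudo systemsetup -setremotelogin on
sudo systemsetup -getremotelogin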
Running Hadoop
To get yourself off the ground, go ahead and run the following commands:
Format the Hadoop Namenode using:
hadoop namenode -format
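If the format succeeds, the NameNode metadata will land under the hadoop.tmp.dir configured earlier (dfs/name is the default subfolder in Hadoop 1.x), which you can confirm with:
ls /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name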
Start Hadoop by running the script:
/usr/local/Cellar/hadoop/1.0.1/libexec/bin/start-all.sh
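A quick way to confirm that all five daemons came up is jps, which ships with the JDK; you should see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker listed (PIDs will vary):
jps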
Run some quick tests:
hadoop dfs -ls /
cd /usr/local/Cellar/hadoop/1.0.1/libexec
hadoop jar hadoop-examples-1.0.1.jar pi 10 100
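As a quick HDFS smoke test, you can also round-trip a small file (hello.txt here is just a throwaway example):
echo "hello hadoop" > /tmp/hello.txt
hadoop dfs -put /tmp/hello.txt /hello.txt
hadoop dfs -cat /hello.txt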
Don’t forget you can access the Map Reduce Admin and HDFS Admin through the web browser:
• http://localhost:50030 – Map Reduce Administrator
• http://localhost:50070 – HDFS Administrator
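If you would rather check them from the shell, curl can confirm each UI is responding (you should get an HTTP status line back once the daemons are up):
curl -sI http://localhost:50030 | head -1
curl -sI http://localhost:50070 | head -1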
To Stop Hadoop
/usr/local/Cellar/hadoop/1.0.1/libexec/bin/stop-all.sh
Enjoy!