This post was published May 8th, 2012, and its content may be obsolete. Most of these links are no longer active, but I am keeping this post for posterity.
For starters, this isn’t a production setup; this is just so that I can do some quick Hadoop demos on my MacBook Air (2011). In this case, my configuration is OSX Lion, 4GB RAM, and a 256GB SSD. As well, a serious shout-out to the authors below, whose posts I referenced to create this one.
References:
• Installing Hadoop on Mac OSX Lion by Ritesh Agrawal
• Flume and Hadoop on OSX by Arbo v. Monkiewitsch (@arbovm)
Originally I was just going to link to their blog posts, but I ran into some hiccups along the way, so to make things easier to follow I’m going to call out all the steps here (hopefully!).
Install the OSX Lion Prerequisites
Ensure that you have installed (current as of this post):
• XCode 4.3.2
• Java Developer for 10.7
You can find both updates at https://developer.apple.com/downloads/index.action
Make sure to update XCode (Preferences > Downloads) so that the Command Line Tools are also installed (a requirement for Homebrew).
Install Homebrew
If you are not familiar with Homebrew, you are definitely missing “the missing package manager for OSX” (it’s their tag line, but it’s also very true). Homebrew installs packages into their own folders under /usr/local/Cellar and symlinks them back into /usr/local. This allows for much easier removal and isolation of UNIX packages – such as Hadoop. As well, Homebrew does not require you to use sudo to perform a brew install.
Install Homebrew by going to the Homebrew installation link and running the provided ruby script. Please use the link itself so that, if they change the script, these instructions remain more or less up to date.
After installing Homebrew, from the Terminal.app, ensure you run the following:
• brew doctor – This validates that the installation is okay. Sometimes it will catch errors (e.g. XCode command line paths are not as expected)
• brew update – To ensure that the latest Formulas are available and installed in the /usr/local/Library/Formula folder. Formulas are Ruby scripts that define the installation of a package; you can find the list in that folder or online at Homebrew Library Formula, and to make your own, check out the Formula Cookbook (you can also peek at one directly, as shown below).
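For example, assuming the default /usr/local prefix, you can browse the Formula that brew install hadoop will use right from Terminal.app:
ls /usr/local/Library/Formula | head
cat /usr/local/Library/Formula/hadoop.rb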
Installing Hadoop 1.0.1
If all is well, the only thing you have to type into Terminal.app to install Hadoop is:
brew install hadoop
Error: Cannot allocate memory – connect(2)
One of the reasons I’m writing this post is that I ran into the following error (hopefully you won’t):
Error: Cannot allocate memory - connect(2)

Looking at the URL from the error message, http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz, you can see it resolves to Apache’s mirror-selection page rather than the tarball itself; i.e. it’s in fact a redirection error, not a memory allocation error.
To fix it immediately, you will need to update the Hadoop Formula locally, which can be found at /usr/local/Library/Formula/hadoop.rb. Update the URL so it points directly to a mirror site instead of the closer.cgi path:
#url 'http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
url 'http://apache.mirrorcatalogs.com/hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
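Alternatively, rather than hunting for the file by hand, you can have Homebrew open the Formula in your default editor:
brew edit hadoop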

Once you update the hadoop.rb file, you can then go back and re-run
brew install hadoop
and the installation will proceed.

Configure Hadoop
Now you will need to update the Hadoop configuration files to get everything up and running. The Hadoop configuration files can all be found in the /usr/local/Cellar/hadoop/1.0.1/libexec/conf folder. The files to be updated are:
• hadoop-env.sh: These are the Hadoop environment variables, a quick change is suggested to suppress an error message.
• core-site.xml
• hdfs-site.xml
• mapred-site.xml
And thanks to @arbovm for his details at: Flume and Hadoop on OS X.
hadoop-env.sh
Suppress the “Unable to load realm info from SCDynamicStore” error. As noted in https://issues.apache.org/jira/browse/HADOOP-7489, for a single-box deployment, update the file to include the line below.
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
It should look something like this:
# export HADOOP_OPTS=-server
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
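To double-check that the edit took, a quick grep against the file works (the path assumes the Homebrew Cellar layout used throughout this post):
grep HADOOP_OPTS /usr/local/Cellar/hadoop/1.0.1/libexec/conf/hadoop-env.sh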
core-site.xml
The core-site.xml file should look like this:
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
In this particular case, I have created the Hadoop temporary folder under /usr/local as well (this way you do not need sudo to create the HDFS folders). To match the above configuration, please execute the following command (the -p flag creates the intermediate hdfs folder too):
mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp
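You can verify that the folder exists and is owned by your user, so no sudo will be needed later:
ls -ld /usr/local/Cellar/hadoop/hdfs/tmp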
hdfs-site.xml
Below is a quick suggested configuration for the hdfs-site.xml file. It sets the Hadoop Distributed File System (HDFS) replication factor to 1, as this is a single box for demos and functional tests.
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
The configuration below sets the JobTracker connection port.
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9010</value>
</property>
</configuration>
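Before moving on, you can sanity-check all three XML files for well-formedness with xmllint, which ships with OSX; it prints nothing if the files parse cleanly:
cd /usr/local/Cellar/hadoop/1.0.1/libexec/conf
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml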
Enable SSH to localhost
Please note that Hadoop will connect to localhost using ssh. To allow connections from-and-to localhost without a password, you will need to append your public key to your authorized_keys file. First, ensure that “Remote Login” is enabled (System Preferences > Sharing; check “Remote Login”).
If you haven’t already done so, create your own key pair by following the instructions at:
http://stackoverflow.com/questions/7134535/setup-passphraseless-ssh-to-localhost-on-os-x
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Test it by running the command
ssh localhost
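As an aside, if you prefer to stay in the Terminal, Remote Login can also be toggled via OSX’s systemsetup utility (requires sudo):
sudo systemsetup -setremotelogin on
sudo systemsetup -getremotelogin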
Running Hadoop
To get yourself off the ground, go ahead and run the following commands:
Format the Hadoop Namenode using:
hadoop namenode -format
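If the format succeeds, the NameNode metadata will land under the hadoop.tmp.dir configured earlier (dfs/name is the default subfolder in Hadoop 1.x), which you can confirm with:
ls /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name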
Start Hadoop by running the script:
/usr/local/Cellar/hadoop/1.0.1/libexec/bin/start-all.sh
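A quick way to confirm that all five daemons came up is jps, which ships with the JDK; you should see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker listed (PIDs will vary):
jps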
Run some quick tests:
hadoop dfs -ls /
cd /usr/local/Cellar/hadoop/1.0.1/libexec
hadoop jar hadoop-examples-1.0.1.jar pi 10 100
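As a quick HDFS smoke test, you can also round-trip a small file (hello.txt here is just a throwaway example):
echo "hello hadoop" > /tmp/hello.txt
hadoop dfs -put /tmp/hello.txt /hello.txt
hadoop dfs -cat /hello.txt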
Don’t forget you can access the Map Reduce Admin and HDFS Admin through the web browser:
• http://localhost:50030 – Map Reduce Administrator
• http://localhost:50070 – HDFS Administrator
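If you would rather check them from the shell, curl can confirm each UI is responding (you should get an HTTP status line back once the daemons are up):
curl -sI http://localhost:50030 | head -1
curl -sI http://localhost:50070 | head -1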
To Stop Hadoop
/usr/local/Cellar/hadoop/1.0.1/libexec/bin/stop-all.sh
Enjoy!