Ramblings of a data dork: from BI and Big Data to Travel and Food
For starters, this isn’t a production setup – this is just so that I can do some quick Hadoop demos on my MacBook Air (2011). In this case, my configuration is OSX Lion, 4GB RAM, and a 256GB SSD. As well, a serious shout-out to the authors below, whose posts I referenced to create this one.
In fact, I was originally just going to provide links to their blog posts, but I ran into some hiccups along the way. To make this easier to read, I’m going to call out all of the steps (hopefully!)
Ensure that you have installed – as of this post -
You can find both updates at https://developer.apple.com/downloads/index.action
Make sure to update Xcode (Preferences > Downloads) so that the Command Line Tools are also installed (a requirement for Homebrew).
If you are not familiar with Homebrew, you are definitely missing “the missing package manager for OSX” (it’s their tag line, but it’s also very true). Homebrew installs packages into their own folders and symlinks them back into the /usr/local folder. This allows for much easier removal and isolation of UNIX packages – such as Hadoop. As well, Homebrew does not require you to use sudo to perform a brew install.
Install Homebrew by going to the Homebrew installation link and running the provided Ruby script. Please go to the link itself so that if they change the script, these instructions remain more or less up to date.
After installing Homebrew, from the Terminal.app, ensure you run the following:
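The original post elides the exact commands here; presumably these are the standard Homebrew health checks, along the lines of:

```shell
# Refresh the formula list, then check the install for common problems
brew update
brew doctor
```

If brew doctor reports issues (missing Command Line Tools, permissions on /usr/local, etc.), fix those before continuing.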
If all is well, the only thing you have to type into the Terminal.app to install Hadoop is to run:
brew install hadoop
Error: Cannot allocate memory – connect(2)
Yet, one of the reasons I’m writing this post is that I ran into the following error (hopefully you don’t):
Error: Cannot allocate memory – connect(2)
Looking at the URL in question – http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz – you will see that it points to a mirror-selection page rather than the tarball itself:
i.e., it is in fact a redirection error, not a memory allocation error.
To fix it immediately, you will need to update the Hadoop formula locally, which can be found at /usr/local/Library/Formula/hadoop.rb. Update the url line so that it points directly to a mirror site instead of the closer.cgi path.
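A quick sketch of the change – the archive.apache.org URL is my suggestion for a stable mirror; any mirror listed on the closer.cgi page should also work:

```shell
# Open the Hadoop formula in your editor
brew edit hadoop

# In hadoop.rb, replace the url line:
#   url 'http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
# with a direct mirror link, e.g.:
#   url 'http://archive.apache.org/dist/hadoop/core/hadoop-1.0.1/hadoop-1.0.1.tar.gz'
```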
Once you update the hadoop.rb file, you can then go back and re-run
brew install hadoop
and the installation will proceed.
Now you will need to update the Hadoop configuration files to get everything up and running. The Hadoop configuration files can all be found in the /usr/local/Cellar/hadoop/1.0.1/libexec/conf folder. The files to be updated are:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
And thanks to @arbovm for his details at: Flume and Hadoop on OS X.
Suppress the “Unable to load realm info from SCDynamicStore” error. As noted in https://issues.apache.org/jira/browse/HADOOP-7489, for a single box deployment, update hadoop-env.sh to include the line below.
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
It should look something like this
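Something along these lines in conf/hadoop-env.sh – the commented lines approximate the stock file; only the HADOOP_OPTS line is the addition:

```shell
# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

# Extra Java runtime options.  Empty by default.
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
```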
The core-site.xml file should look like this:
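A minimal sketch of what that core-site.xml might contain – the tmp-dir path under /usr/local/Cellar is my assumption to match the folder created in the next step, and hdfs://localhost:9000 is the conventional single-node NameNode address:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```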
In this particular case, I have created the Hadoop temporary folder under /usr/local as well (this way you do not need sudo to create the HDFS folders). To match the above configuration, please execute the following commands:
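A hedged guess at those commands, assuming the HDFS temp folder lives at /usr/local/Cellar/hadoop/hdfs/tmp (adjust the path to whatever hadoop.tmp.dir you set):

```shell
# Create the HDFS temporary folder under /usr/local (no sudo needed
# if your user owns /usr/local, as is typical with Homebrew)
mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp
```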
Below is a quick suggested configuration for the hdfs-site.xml file. In this case, the configuration sets the Hadoop Distributed File System (HDFS) replication to 1, as this is a single box for demos and functional testing.
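A sketch of that hdfs-site.xml – dfs.replication is the only property needed for this single-box setup:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```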
The below configuration, which goes in mapred-site.xml, sets the Job Tracker connection port.
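Something like the following – localhost:9001 is the conventional choice for a local Job Tracker:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```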
Please note that Hadoop will connect to localhost using ssh; to configure it so that you can connect from-and-to localhost without needing a password, you will need to add your public key to your authorized keys. To do this, ensure that “Remote Login” is enabled (System Preferences > Sharing; ensure “Remote Login” is checked).
If you haven’t already done so, create your own public keys following the instructions at:
ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Test it by running the command
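Presumably a plain ssh into localhost; if the keys are set up correctly, you should get a shell without a password prompt:

```shell
# Should log you in without prompting for a password
ssh localhost
```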
To get yourself off the ground, go ahead and run the following commands:
Format the Hadoop Namenode using:
hadoop namenode -format
Start Hadoop by running the script:
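With this Homebrew layout, the start script should live under the libexec/bin folder of the Cellar install (it may also be symlinked onto your PATH):

```shell
# Start the NameNode, DataNode, JobTracker, and TaskTracker daemons
/usr/local/Cellar/hadoop/1.0.1/libexec/bin/start-all.sh
```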
Run some quick tests
hadoop dfs -ls /
hadoop jar hadoop-examples-1.0.1.jar pi 10 100
Don’t forget you can access the Map Reduce Admin and HDFS Admin through the web browser
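On a stock Hadoop 1.x install these live on ports 50030 and 50070, so from the Terminal you can open them with:

```shell
# Map Reduce (JobTracker) admin
open http://localhost:50030/
# HDFS (NameNode) admin
open http://localhost:50070/
```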
To Stop Hadoop
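As with starting, a sketch assuming the same Cellar layout:

```shell
# Stop all Hadoop daemons
/usr/local/Cellar/hadoop/1.0.1/libexec/bin/stop-all.sh
```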