Getting Hadoop and protobufs up and running with Elephant Bird on Mac OSX Mountain Lion


“No, not Angry Bird – Elephant Bird!”

— said no one


In a few of my customer projects, we started using protocol buffers (protobufs) as the format for the sequence files stored within our Hadoop infrastructure. While these were HDInsight on Azure projects, most native Hadoop code was originally written for Linux and carries implicit assumptions, such as directory paths. Therefore, one of the first things I usually do is install the software on my handy MacBook Air. That way, if I run into issues getting the code to compile on Windows, I already have a working path laid out that I can go back to while debugging.

Note that we’re still using the Hadoop 1.0 branch, so protobufs are not supported natively yet (they will be as part of Hadoop 2.0 / YARN). Fortunately, there is a handy project called Elephant Bird, Twitter’s collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. Its instructions are excellent. The main reason I’m providing a separate set is that I happen to use HomeBrew for most of my installations (so handy), and there are some version-compatibility concerns to address before everything is up and running correctly.

By the way, another good project to consider for protobuf is hive-protobuf.

Current Configuration

My MacBook Air is running OSX Mountain Lion (10.8) and has Hadoop 1.0.1, Hive 0.8.1, Pig 1.0, and Hive 0.9.0 installed. The first three are from my initial Hadoop installation, described in Installing Hadoop on OSX Lion (10.7), while the last one is from my more recent Spark installation, described in Installing Spark 0.6.1 Standalone on OSX Mountain Lion (10.8).

Pre-requisites

This post presumes you are already familiar with HomeBrew and with some of the tricks for modifying formulas to get the exact version you want, as described in Installing Hadoop on OSX Lion (10.7).

Install protobuf 2.3

As noted in the Elephant Bird documentation, make sure you are pointing to the 2.3 library and not 2.4, as in this modified HomeBrew formula for protobuf.rb:

class Protobuf < Formula
  homepage 'http://code.google.com/p/protobuf/'
  # 2.4.1 is commented out in favor of 2.3.0, which Elephant Bird requires
  #url 'http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.bz2'
  url 'http://protobuf.googlecode.com/files/protobuf-2.3.0.tar.bz2'
  #sha1 'df5867e37a4b51fb69f53a8baf5b994938691d6d'
  sha1 'db0fbdc58be22a676335a37787178a4dfddf93c6'
  # ... rest of the formula unchanged ...
end

and then install protobufs

brew install protobuf
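Before moving on, it is worth confirming that the protoc on your PATH really is 2.3 and not a leftover 2.4. A minimal sketch, assuming `protoc --version` prints a line of the form `libprotoc 2.3.0`; the helper name `check_protoc_version` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given version string is protoc 2.3.x.
check_protoc_version() {
  case "$1" in
    "libprotoc 2.3."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires protoc on PATH):
#   check_protoc_version "$(protoc --version)" || echo "protoc is not 2.3.x" >&2
```

Passing the version string in as an argument, rather than calling protoc inside the function, keeps the check easy to try out on machines where protoc is not installed yet.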

Install Thrift

Installing Thrift is a bit more complicated, only because we need to use Thrift 0.7. As of this post, the current version of Thrift is 0.9, but for compatibility with my current Hive 0.9.0 installation and with Elephant Bird, I need to use Thrift 0.7. The full instructions for installing Thrift on OS X are quite handy and can be found at: http://thrift.apache.org/docs/install/os_x/

The first two prerequisites, Boost and libevent, are relatively straightforward, and you can use HomeBrew to install them:

brew install boost

brew install libevent

The next steps involve actually installing Thrift:

1) Because the current Thrift release is 0.9, the first thing we had to do was download the Thrift 0.7 tarball from http://archive.apache.org/dist/thrift/0.7.0/ and unpack it.

2) A quick HomeBrew way of doing the above is to point the thrift.rb formula at 0.7 and let HomeBrew unpack it. To do this, first modify the thrift.rb formula:

class Thrift < Formula
  homepage 'http://thrift.apache.org'
  # 0.9.0 is commented out in favor of 0.7.0 for Hive 0.9.0 / Elephant Bird compatibility
  #url 'http://www.apache.org/dyn/closer.cgi?path=thrift/0.9.0/thrift-0.9.0.tar.gz'
  url 'http://archive.apache.org/dist/thrift/0.7.0/thrift-0.7.0.tar.gz'
  #sha1 'fefcf4d729bf80da419407dfa028740aa95fa2e3'
  sha1 'b8f6877bc75878984355da4efe171ad99ff05b6a'
  # ... rest of the formula unchanged ...
end

and then run the command below from a working folder (as opposed to the /usr/local/Cellar folder, where HomeBrew normally installs the code):

brew unpack thrift

3) When I executed the commands below, I ran into permission issues. To avoid them, run:

chmod 775 ~/Downloads/temp/thrift-0.7.0

where ~/Downloads/temp/thrift-0.7.0 is where I had unpacked the 0.7 tarball.

4) I’m clearly a huge fan of HomeBrew, because all I did was copy the commands from the thrift.rb formula and run them from inside the unpacked thrift-0.7.0 folder:

  ./configure --prefix=/usr/local/Cellar/thrift/0.7.0 --libdir=/usr/local/Cellar/thrift/0.7.0/lib --without-python --without-ruby --without-haskell --without-java --without-perl --without-php --without-erlang

  make

  make install

5) Create a symlink for Thrift (run this from /usr/local/bin, so that the relative path points into the Cellar):

  ln -s ../Cellar/thrift/0.7.0/bin/thrift thrift
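Steps 1 and 3 through 5 can also be collected into one script. This is a sketch under stated assumptions rather than a tested recipe: it assumes curl and tar are available, reuses the archive URL and ./configure flags from the modified thrift.rb formula, and assumes the symlink belongs in /usr/local/bin. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for building Thrift 0.7.0 by hand (steps 1 and 3-5).
THRIFT_VERSION=0.7.0
THRIFT_PREFIX=/usr/local/Cellar/thrift/$THRIFT_VERSION

# Tarball URL, following the pattern used in the modified thrift.rb formula.
thrift_url() {
  printf '%s\n' "http://archive.apache.org/dist/thrift/$1/thrift-$1.tar.gz"
}

# ./configure flags copied from the formula; the language bindings are disabled.
thrift_configure_flags() {
  flags="--prefix=$THRIFT_PREFIX --libdir=$THRIFT_PREFIX/lib"
  flags="$flags --without-python --without-ruby --without-haskell --without-java"
  flags="$flags --without-perl --without-php --without-erlang"
  printf '%s\n' "$flags"
}

# Body runs in a subshell ( ... ), so the cd calls do not leak into the caller
# and "exit 1" only ends the subshell, returning a failure status.
build_thrift() (
  workdir=$1                                   # assumed working folder, e.g. ~/Downloads/temp
  mkdir -p "$workdir" && cd "$workdir" || exit 1
  curl -LO "$(thrift_url "$THRIFT_VERSION")" || exit 1
  tar xzf "thrift-$THRIFT_VERSION.tar.gz" || exit 1
  chmod 775 "thrift-$THRIFT_VERSION"           # step 3: avoid permission errors
  cd "thrift-$THRIFT_VERSION" || exit 1
  # $(...) is left unquoted on purpose so the flags split into separate words.
  ./configure $(thrift_configure_flags) && make && make install || exit 1
  # Step 5: relative link from /usr/local/bin into the Cellar.
  ln -sf "../Cellar/thrift/$THRIFT_VERSION/bin/thrift" /usr/local/bin/thrift
)
```

For example, after sourcing the script, `build_thrift ~/Downloads/temp` should download, build, and link Thrift 0.7.0, after which `thrift -version` should report 0.7.0.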

Install Elephant Bird

After all this is done, you’re ready to install Elephant Bird:

1) Get the code: git clone git://github.com/kevinweil/elephant-bird.git

2) Build the jar: mvn package

3) Explore what’s available: mvn javadoc:javadoc
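If you want these three steps to fail early instead of halfway through a Maven build, a small preflight check helps. This is a sketch that assumes the build needs git, mvn, protoc, and thrift on your PATH. The clone URL is the one given above, and the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Clone, package, and generate javadoc (steps 1-3).
# Runs in a subshell so the cd stays local to this function.
build_elephant_bird() (
  missing=$(missing_tools git mvn protoc thrift)
  if [ -n "$missing" ]; then
    printf 'missing tools:\n%s\n' "$missing" >&2
    exit 1
  fi
  git clone git://github.com/kevinweil/elephant-bird.git || exit 1
  cd elephant-bird || exit 1
  mvn package && mvn javadoc:javadoc
)
```

`command -v` is the portable way to ask the shell whether a command exists, so the check behaves the same under sh and bash.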

That’s it!  Happy Coding!
