Ramblings of a data dork: from BI and Big Data to Travel and Food
“No, not Angry Bird – Elephant Bird!”
– said no one
In a few of my customer projects, we started using protocol buffers (protobufs) for the sequence files stored within our Hadoop infrastructure. While these were HDInsight on Azure projects, most native Hadoop code was originally written on Linux and carries implicit assumptions about directory paths and the like. Therefore, one of the first things I usually do is install the software on my handy MacBook Air; that way, if I run into issues getting the code to compile on Windows, I already have a working path laid out that I can go back to and debug against.
Note that we’re still using the Hadoop 1.0 branch, so protobufs are not native yet (they will be as part of Hadoop 2.0 / YARN). Fortunately, there is a handy project called Elephant Bird: Twitter’s collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. Its instructions are excellent; the main reason I’m providing a separate set is that I use HomeBrew for most of my installations (so handy), and there are some compatibility concerns to work through to get everything up and running correctly.
By the way, another good project to consider for protobuf is hive-protobuf.
My MacBook Air is running OSX Mountain Lion (10.8) and it has Hadoop 1.0.1, Hive 0.8.1, Pig 1.0, and Hive 0.9.0. The first three come from my initial Hadoop installation (see Installing Hadoop on OSX Lion (10.7)), while the last came from my more recent Spark installation (see Installing Spark 0.6.1 Standalone on OSX Mountain Lion (10.8)).
As noted in the Elephant Bird documentation, make sure you are pointing to the protobuf 2.3 library and not 2.4, as in the modified Brew formula protobuf.rb:
class Protobuf < Formula
and then install protobufs
brew install protobuf
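Before moving on, it may be worth confirming that the 2.3 series is what actually ended up on the path. Here is a minimal sketch, assuming `protoc --version` prints a banner such as `libprotoc 2.3.0`. The helper name `version_in_series` is hypothetical and not part of HomeBrew or protobuf.

```shell
# Hypothetical helper: succeeds when a version banner ends in a version
# that belongs to the given major.minor series (e.g. "2.3").
version_in_series() {
  banner="$1"              # e.g. "libprotoc 2.3.0"
  series="$2"              # e.g. "2.3"
  version="${banner##* }"  # keep only the text after the last space
  case "$version" in
    "$series"|"$series".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only run the check when protoc is actually installed.
if command -v protoc >/dev/null 2>&1; then
  if version_in_series "$(protoc --version)" "2.3"; then
    echo "protobuf 2.3.x found"
  else
    echo "unexpected protobuf version: $(protoc --version)"
  fi
fi
```

The same helper should work for the Thrift check later on, since `thrift -version` prints a similar banner ending in the version number.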
The installation of Thrift is a bit more complicated, only because we need to use Thrift 0.7. As of this post, the current version of Thrift is 0.9, but because of my existing Hive 0.9.0 installation and Elephant Bird compatibility, I need to use Thrift 0.7. The full instructions for installing Thrift are quite handy and can be found at: http://thrift.apache.org/docs/install/os_x/
The first two prerequisites, Boost and libevent, are relatively straightforward, and you can use HomeBrew to install them:
brew install boost
brew install libevent
The next steps involve actually installing Thrift:
1) Because Thrift is currently on 0.9, the first thing we have to do is download the Thrift 0.7 tarball from http://archive.apache.org/dist/thrift/0.7.0/ and unpack it.
2) A quick HomeBrew way of doing the above step is to modify the thrift.rb formula to aim at 0.7 and unpack it from HomeBrew. To do this, first modify the thrift.rb formula:
class Thrift < Formula
and then run the command below from a working folder (as opposed to the /usr/local/Cellar folder where HomeBrew normally installs the code)
brew unpack thrift
3) When I executed the commands in the next step, I ran into permissions issues. To avoid them, run:
chmod 775 ~/Downloads/temp/thrift-0.7.0
where ~/Downloads/temp/thrift-0.7.0 is where I had unpacked the 0.7 tarball.
4) I guess I’m a huge fan of HomeBrew, because all I did was recreate the commands in the thrift.rb formula and run them:
./configure --prefix=/usr/local/Cellar/thrift/0.7.0 --libdir=/usr/local/Cellar/thrift/0.7.0/lib --without-python --without-ruby --without-haskell --without-java --without-perl --without-php --without-erlang
5) Create a symlink for Thrift (the target path is relative, so run this from /usr/local/bin):
ln -s ../Cellar/thrift/0.7.0/bin/thrift thrift
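Because the link target is relative, it matters which directory the `ln -s` command is run from. Here is a minimal sketch that rebuilds the layout in a throwaway directory to show how the target resolves. The scratch directory and the stub `thrift` file are stand-ins for illustration, not the real install.

```shell
# Scratch copy of the HomeBrew layout (stand-in for /usr/local).
prefix="$(mktemp -d "${TMPDIR:-/tmp}/thrift-demo.XXXXXX")"
mkdir -p "$prefix/Cellar/thrift/0.7.0/bin" "$prefix/bin"

# Plain text file standing in for the real Thrift binary.
echo "stub thrift 0.7.0" > "$prefix/Cellar/thrift/0.7.0/bin/thrift"

# A relative target is resolved from the directory containing the link,
# so the link is created from inside bin/.
( cd "$prefix/bin" && ln -s ../Cellar/thrift/0.7.0/bin/thrift thrift )

# Reading through the link reaches the file in the Cellar.
resolved="$(cat "$prefix/bin/thrift")"
echo "$resolved"

# The same command run one level up leaves a dangling link,
# because ../Cellar no longer points inside the prefix.
( cd "$prefix" && ln -s ../Cellar/thrift/0.7.0/bin/thrift thrift-wrong )
if [ -e "$prefix/thrift-wrong" ]; then
  wrong_link="resolves"
else
  wrong_link="dangling"
fi
echo "link made from the wrong directory: $wrong_link"
```

Remove the scratch directory afterwards with `rm -rf "$prefix"`.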
After all this is done, you’re all set to install Elephant Bird:
1) Get the code: git clone git://github.com/kevinweil/elephant-bird.git
2) Build the jar: mvn package
3) Explore what’s available: mvn javadoc:javadoc
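Since the build leans on everything installed above, a quick preflight check before running Maven can save a failed build. This is a minimal sketch. The helper name `missing_tools` is hypothetical, and the list of tools is my assumption about what the steps in this post need on the PATH.

```shell
# Hypothetical helper: prints the name of each command not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools assumed to be needed by the steps above.
missing="$(missing_tools git mvn protoc thrift)"
if [ -n "$missing" ]; then
  echo "Not on PATH:" $missing
else
  echo "All build prerequisites found"
fi
```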
That’s it! Happy Coding!