I have been playing around with compression codecs on Hadoop 1.01 over the last few months and wanted to share some quick tech tips on using them. My key piece of advice is to get Tom White’s (@tom_e_white) Hadoop: The Definitive Guide. It is easily the must-have guide for everyone from Hadoop novices to experts.
The key fundamental concerning compression codecs is that not all of them are immediately available within Hadoop. Some are native to Hadoop (remember to compile the native libraries), while others must be built from their source and compiled in.
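One quick way to see which codecs a cluster is set up to use is to read the `io.compression.codecs` property from `core-site.xml`. Here is a minimal Python sketch of that idea, assuming a configuration fragment like the inline sample; the sample XML and the `registered_codecs` helper are hypothetical and not taken from a real cluster. Being listed there only means Hadoop will try to load the class, so the native library or separately built jar still has to be in place.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml fragment, for illustration only.
SAMPLE_CORE_SITE = """
<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>
      org.apache.hadoop.io.compress.DefaultCodec,
      org.apache.hadoop.io.compress.GzipCodec,
      org.apache.hadoop.io.compress.BZip2Codec,
      com.hadoop.compression.lzo.LzopCodec
    </value>
  </property>
</configuration>
"""


def registered_codecs(core_site_xml):
    """Return the codec class names listed under io.compression.codecs."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "io.compression.codecs":
            value = prop.findtext("value", "")
            # The property is a comma-separated list of class names.
            return [c.strip() for c in value.split(",") if c.strip()]
    return []  # property not set in this file


if __name__ == "__main__":
    for codec in registered_codecs(SAMPLE_CORE_SITE):
        print(codec)
```

To try it against a real cluster, read your own `core-site.xml` into a string and pass it to `registered_codecs`.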
Below is a handy reference table based on Tom’s book and on observations from my own tests.
| Compression Format | Codec | Splittable | Compression Space | Compression Time |
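To get a feel for the space versus time trade-off on your own data, a rough local comparison can help before touching a cluster. The sketch below uses Python's standard `zlib` and `bz2` modules, which implement the same DEFLATE and bzip2 formats that the gzip and bzip2 codecs use. It does not exercise the Hadoop codecs themselves, and the `measure` and `compare` helpers and the log-like sample data are made up for illustration.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, data):
    """Time one compress call and report the size relative to the input."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    if decompress(packed) != data:  # sanity check: round trip must be lossless
        raise ValueError("round trip failed for " + name)
    return {"format": name, "ratio": len(packed) / len(data), "seconds": elapsed}


def compare(data):
    """Compare the DEFLATE (zlib) and bzip2 formats on the same bytes."""
    return [
        measure("deflate/zlib", zlib.compress, zlib.decompress, data),
        measure("bzip2", bz2.compress, bz2.decompress, data),
    ]


if __name__ == "__main__":
    # Hypothetical log-like sample; real results depend heavily on your data.
    sample = b"".join(
        b"2012-04-01 12:00:%02d INFO request id=%d status=200\n" % (i % 60, i)
        for i in range(50000)
    )
    for row in compare(sample):
        print("%-14s ratio=%.3f time=%.4fs"
              % (row["format"], row["ratio"], row["seconds"]))
```

A lower ratio means less space used. Treat the timings as a rough guide only, since they come from a single run on one machine.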
While each project has its own profile, some key best practices, paraphrased from Hadoop: The Definitive Guide and listed in order of effectiveness, are:
Some other handy compression tips are noted below.
Hope this helps!