I’m currently playing with CDH 4.1 and having fun with Hue, specifically Beeswax, which lets you execute Hive queries from a nice web UI. As noted in “Hadoop compression codecs” and “Optimizing Hive joins (and using compression to do it)”, compression saves storage space and in many cases can improve query performance. Yet to my dismay, when I tried to execute a bunch of SET statements, I ended up getting the OK FAILED parse exception.
Of course, this is what happens when you haven’t played with a particular technology in a while and don’t bother to do the tutorials! In the left panel of Beeswax, there is a Settings section that lets you add whatever key-value settings you see fit (with autofill for many, but not all, settings).
set mapred.compress.map.output=true;
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set hive.exec.compress.output=true;
In this case, I just entered the settings directly into the Settings panel and then ran my Hive queries to create my compressed table (don’t forget to create the table as a SEQUENCEFILE).
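As a rough sketch of what those queries might look like once the compression settings are in the Settings panel: the table names (weblogs_raw, weblogs_compressed) and columns below are hypothetical and only there for illustration, so substitute your own. If the query editor only accepts one statement at a time, run the two statements separately.

```sql
-- Hypothetical table and column names, for illustration only.
-- Create the target table stored as a SequenceFile.
CREATE TABLE weblogs_compressed (
  ip STRING,
  request STRING,
  status INT
)
STORED AS SEQUENCEFILE;

-- Populate it from an existing (assumed) uncompressed table.
-- With hive.exec.compress.output=true set in the Settings panel,
-- the files written by this query should come out compressed.
INSERT OVERWRITE TABLE weblogs_compressed
SELECT ip, request, status
FROM weblogs_raw;
```

The SET statements themselves stay out of the query text here, since the settings are supplied as key-value pairs through the panel.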
Hope this helps!