If you want to compile a Hive UDF and you have the Hive source code, the right way to do this is to use Maven with the Hive repository so you can compile your JAR against the exact version of the source code / jars that you are working with. For more information on how to use Maven, check out: http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html
But in situations where you do not have access to the source code but do have all the necessary jars (such as those in the lib folder), you can work around this by compiling the Hive UDFs manually. To do this, let's grab the set of UDFs and UDAFs from HIVE-1545 (https://issues.apache.org/jira/browse/HIVE-1545) – for this example, we'll use only UDFNumberRows.java (a handy function that lets you perform a statistical rank). All of these commands are executed on the head node of my HoA cluster by RDPing into it and running them from the Hadoop Command Shell.
1. Download and extract udfs.tar.gz to grab the source code for the UDFs.
https://issues.apache.org/jira/browse/HIVE-1545
Note that when you extract the udfs, you will see the folder structure com\facebook\hive\udf. If you open the file [download folder]\com\facebook\hive\udf\UDFNumberRows.java (or any file in that folder, for that matter), you will notice that the first line contains
package com.facebook.hive.udf;
To keep with the spirit of open source (these files were created by Jonathan Chang at Facebook), we're keeping the folder convention and compiling it that way, so we are always calling the UDF by its original name. As noted above, if you're using Maven, the intricacies surrounding the naming / path conventions are taken care of for you automatically. In fact, the instructions here are simply a dirty reverse engineering of what Maven does for you.
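As an aside, UDFNumberRows keeps a running counter that increments once per row. This is not the actual source from HIVE-1545 (the real class lives in the com.facebook.hive.udf package and extends Hive's org.apache.hadoop.hive.ql.exec.UDF base class, exposing an evaluate() method); the sketch below uses a hypothetical class name and plain Java, with no Hive dependency, just to show the core stateful-counter idea:

```java
// Sketch of the core logic behind a row-numbering UDF.
// In a real Hive UDF, this state would live in a class extending
// org.apache.hadoop.hive.ql.exec.UDF, and evaluate() would be
// invoked by Hive once per row.
public class RowNumberSketch {
    private long counter = 0;

    // Called once per row; returns the 1-based row number.
    public long evaluate() {
        return ++counter;
    }

    public static void main(String[] args) {
        RowNumberSketch rn = new RowNumberSketch();
        for (String id : new String[] {"a", "b", "c"}) {
            // prints a 1, b 2, c 3 (tab-separated)
            System.out.println(id + "\t" + rn.evaluate());
        }
    }
}
```

Because the counter is per-instance state, rows are numbered in whatever order they reach the function, which is why the example query later in this post orders the input first.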
2. Compile the .java file
In keeping with the naming conventions, the paths can get a little lengthy here. Note that my root folder is c:\apps\dist\workspace, so the commands below are run from that location. I have placed the com\facebook\hive\udf folder and its contents into the c:\apps\dist\workspace\contrib\src folder. I have also manually created the folder structure c:\apps\dist\workspace\contrib\target\classes, where the Java class files will be placed.
set path=%PATH%;c:\apps\dist\java\bin;
javac -cp c:\apps\dist\hive-0.9.0\lib\hive-exec-0.9.0.jar -d c:\apps\dist\workspace\contrib\target\classes c:\apps\dist\workspace\contrib\src\com\facebook\hive\udf\UDFNumberRows.java
Notice that upon compilation, the file c:\apps\dist\workspace\contrib\target\classes\com\facebook\hive\udf\UDFNumberRows.class has been generated
3. Package these files into a jar. Since the intent is to be quick-n-dirty, no jar manifest is used.
Switch to the contrib folder (c:\apps\dist\workspace\contrib) and run the following command
jar cvf target\UDFNumberRows.jar -C target\classes *
Notice that the class file has been packaged up into c:\apps\dist\workspace\contrib\target\UDFNumberRows.jar
Note that everything up to this point is taken care of by maven if you had the original source code.
4. Run your query using your recently compiled jar.
Note: all of these commands need to be run in the same Hive session, since you are using a temporary Hive UDF.
-- Add the jar you just created and create the temporary function
ADD JAR /apps/dist/workspace/contrib/target/UDFNumberRows.jar;
CREATE TEMPORARY FUNCTION NumRows AS 'com.facebook.hive.udf.UDFNumberRows';

-- Test out your function
create table RankTest as select NumRows(a.clientid), a.clientid from (select clientid from HiveSampleTable order by clientid) a;
Enjoy!
Thanks to Lengning Liu and Keven Tag!