In the last few weeks, I have had a number of customers ping me about how to utilize various Hive UDFs. The first ask was how to use some of the UDFs that are already built into Hive. For example, if you wanted a generated row sequence number (i.e. an IDENTITY column), you can use the Hive UDF UDFRowSequence. This UDF is already built and included in the hive-contrib-0.9.0.jar that is not already loaded in the distributed cache (run list jars from the Hive CLI to verify). Below is a quick code snippet that allows you to run the generated row sequence by accessing the UDFRowSequence Hive UDF.
– Compression codecs (not necessary for Hive UDFs, just habit)
– Drop the table
DROP TABLE hivesampletable_rowseq;
– Add the built-in Hive jar that contains the Hive UDF UDFRowSequence
ADD JAR /apps/dist/hive-0.9.0/lib/hive-contrib-0.9.0.jar;
– Create a function that references UDFRowSequence that can be used within the Hive statement directly
CREATE TEMPORARY FUNCTION rowSequence AS ‘org.apache.hadoop.hive.contrib.udf.UDFRowSequence’;
– Running the Hive statement with rowSequence() call
CREATE TABLE hivesampletable_rowseq
rowSequence() as lid,
Note that running the ADD JAR statement still uses Unix paths (even though we’re on Windows) which references the local file path (vs. HDFS or ASV).