An easy way to test out Hive Dynamic Partition Insert on HDInsight Azure

One you get your HadoopOnAzure.com cluster up and running, an easy way to test out Hive Dynamic Partition Insert (the ability to load data into multiple partitions without the need to load each partition individually) on HDInsight Azure is to use the HiveSampleTable already included and the scripts below. You can execute the scripst from the Hive Interactive Console or from the Hive CLI.

1) For starters, create a new partitioned table

CREATE TABLE hivesampletable_p (
clientid STRING, 
querytime STRING, 
market STRING,
devicemake STRING, 
devicemodel STRING, 
state STRING, 
querydwelltime STRING, 
sessionid BIGINT, 
sessionpagevieworder BIGINT
)
PARTITIONED BY (deviceplatform STRING, country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE

2) Then to insert data into your new partitioned table, run the script below

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
FROM hivesampletable h
INSERT OVERWRITE TABLE hivesampletable_p PARTITION (deviceplatform = 'iPhone OS', country)
SELECT h.clientid, h.querytime, h.market, h.devicemake, h.devicemodel, h.state, h.querydwelltime, h.sessionid, h.sessionpagevieworder, h.country
 WHERE deviceplatform = 'iPhone OS';

Some quick call outs:

  • The first two set statements indicate to Hive that you are running a dynamic partition insert
  • The HiveQL statement populates the hivesampletable_p (that you just created) from the HiveSampleTable.
  • Notice that partition statement has two clauses noting that we are partitioning by deviceplatform and country
  • We have specified deviceplatform = ‘iPhone OS’ indicating that all of this data should only go into the iPhone OS set of partitions.  The where clause ensure that this is being filtered correctly.
  • Also specified is country (with no value) meaning that all country values will have their own partitions as well.

The easiest way to visualize the partitions being created is to Remote Desktop into the name node and open up the Hadoop Name Node Status (the browser link is available on the desktop when you RDP into the name node).  Click Browse the File System > hive > warehouse > hivesampletable_p.

You will notice in the hivesampletable_p, there is a folder called deviceplatform=iPhone%20OS representing the deviceplatform partitioning scheme.  Clicking on the iPhone OS folder you will see multiple folders – one for each country – as noted with the country partitioning scheme.

image

Enjoy!

2 Comments

  1. How is this a dynamic partition if you are specifying the partition key in the INSERT statement?

    1. Oh, I’m only specifying the deviceplatform; the country is the value that is dynamic. HTH!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s