One you get your HadoopOnAzure.com cluster up and running, an easy way to test out Hive Dynamic Partition Insert (the ability to load data into multiple partitions without the need to load each partition individually) on HDInsight Azure is to use the HiveSampleTable already included and the scripts below. You can execute the scripst from the Hive Interactive Console or from the Hive CLI.
1) For starters, create a new partitioned table
CREATE TABLE hivesampletable_p ( clientid STRING, querytime STRING, market STRING, devicemake STRING, devicemodel STRING, state STRING, querydwelltime STRING, sessionid BIGINT, sessionpagevieworder BIGINT ) PARTITIONED BY (deviceplatform STRING, country STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE
2) Then to insert data into your new partitioned table, run the script below
set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; FROM hivesampletable h INSERT OVERWRITE TABLE hivesampletable_p PARTITION (deviceplatform = 'iPhone OS', country) SELECT h.clientid, h.querytime, h.market, h.devicemake, h.devicemodel, h.state, h.querydwelltime, h.sessionid, h.sessionpagevieworder, h.country WHERE deviceplatform = 'iPhone OS';
Some quick call outs:
- The first two set statements indicate to Hive that you are running a dynamic partition insert
- The HiveQL statement populates the hivesampletable_p (that you just created) from the HiveSampleTable.
- Notice that partition statement has two clauses noting that we are partitioning by deviceplatform and country
- We have specified deviceplatform = ‘iPhone OS’ indicating that all of this data should only go into the iPhone OS set of partitions. The where clause ensure that this is being filtered correctly.
- Also specified is country (with no value) meaning that all country values will have their own partitions as well.
The easiest way to visualize the partitions being created is to Remote Desktop into the name node and open up the Hadoop Name Node Status (the browser link is available on the desktop when you RDP into the name node). Click Browse the File System > hive > warehouse > hivesampletable_p.
You will notice in the hivesampletable_p, there is a folder called deviceplatform=iPhone%20OS representing the deviceplatform partitioning scheme. Clicking on the iPhone OS folder you will see multiple folders – one for each country – as noted with the country partitioning scheme.
How is this a dynamic partition if you are specifying the partition key in the INSERT statement?
Oh, I’m only specifying the deviceplatform; the country is the value that is dynamic. HTH!