Hadoop on Azure: HiveQL query against Azure Blob Storage

The post Setup Azure Blob Store for Hadoop on Azure CTP shows a quick way to upload files to your Azure Blob storage account and connect Hadoop on Azure CTP to it.  Once that is done, one of the first things you may want to do is interact with the data.

To do this, let’s create a Hive table in Hadoop on Azure CTP that points to the files you uploaded to your Azure Blob storage account, and then query it.  We will follow the scenario described in: Hadoop on Azure Scenario: Query a web log via HiveQL.

The tasks we will be performing are:

  1. Setup Azure Blob Store for Hadoop on Azure CTP
  2. Create a Hive table referencing the files in the Azure Blob Storage account
  3. Execute a simple query

1) Setup Azure Blob Store for Hadoop on Azure CTP

To do this, please refer to Setup Azure Blob Store for Hadoop on Azure CTP.

2) Create a Hive table referencing the files in the Azure Blob Storage account

Following the Hadoop on Azure Scenario: Query a web log via HiveQL post:

  • Go to the Hadoop on Azure Interactive Hive Console
  • Create a Hive table using the statement below

CREATE EXTERNAL TABLE weblog_sample_asv (
evtdate STRING,
evttime STRING,
svrsitename STRING,
svrip STRING,
csmethod STRING,
csuristem STRING,
csuriquery STRING,
svrport INT,
csusername STRING,
cip STRING,
UserAgent STRING,
Referer STRING,
scstatus STRING,
scsubstatus STRING,
scwin32status STRING,
scbytes STRING,
csbytes STRING,
timetaken STRING
)
COMMENT 'This is a web log sample ASV'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '32'
STORED AS TEXTFILE
LOCATION 'asv://weblog/sample';

Note that the only difference between the original HiveQL script (which points to HDFS) and this one (which points to Azure Blob storage) is the LOCATION clause, which uses the asv protocol.

NOTE: As noted in Setup Azure Blob Store for Hadoop on Azure CTP, the location takes the form asv://<container>/<folder> (here, the weblog container and the sample folder), so Hadoop can see every file uploaded to the sample folder.
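
Before querying, it can be worth confirming that the table was registered and that it points at the location you expect. The statements below are a minimal sketch that assumes the table name weblog_sample_asv from the statement above. In the detailed table information that DESCRIBE EXTENDED prints, you should be able to find the asv:// location. Because the table is EXTERNAL, dropping it later removes only the Hive metadata and leaves the files in Blob storage in place.

```sql
-- List the tables Hive knows about; weblog_sample_asv should appear.
SHOW TABLES;

-- Show the columns plus detailed table information, including the location.
DESCRIBE EXTENDED weblog_sample_asv;
```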


3) Execute a simple query

Now that you have created a Hive EXTERNAL table that points to the files in the weblog/sample folder of your Azure Blob storage account, you can query it.

For example, the following query returns the first ten rows of the table:

select * from weblog_sample_asv limit 10;

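
Once the simple query works, an aggregate is a natural next step. The query below is an illustrative sketch, not part of the original scenario: it counts requests per HTTP status code using the scstatus column defined in the table above, and the alias hits is a made-up name. Unlike the select * ... limit query, an aggregate like this is run as a MapReduce job, so expect it to take noticeably longer.

```sql
-- Count web log rows per HTTP status code.
-- "hits" is a hypothetical alias chosen for this example.
SELECT scstatus, COUNT(*) AS hits
FROM weblog_sample_asv
GROUP BY scstatus;
```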
