The post Setup Azure Blob Store for Hadoop on Azure CTP shows a quick way to upload files to your Azure Blob storage account and connect Hadoop on Azure CTP to it. With that done, one of the first things you may want to do is query the data.
To do this, let’s create a Hive table within Hadoop on Azure CTP that is connected to the files you uploaded to your Azure Blob storage account and query it. We will be referencing the scenario noted at: Hadoop on Azure Scenario: Query a web log via HiveQL
The tasks we will be performing are:
1) Setup Azure Blob Store for Hadoop on Azure CTP
To do this, please refer to Setup Azure Blob Store for Hadoop on Azure CTP
2) Create a Hive table referencing the files in the Azure Blob Storage account
Following the Hadoop on Azure Scenario: Query a web log via HiveQL, create the table with a statement like this:
CREATE EXTERNAL TABLE weblog_sample_asv (
COMMENT 'This is a web log sample ASV'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '32'
STORED AS TEXTFILE
Note that the only difference between the original HiveQL script (which stores the table in HDFS) and the one that reads from Azure Blob storage is the LOCATION clause, which uses the asv protocol.
NOTE: As noted in Setup Azure Blob Store for Hadoop on Azure CTP, we are using the protocol form asv://<container>/<folder> so that Hadoop can see every file uploaded to the sample folder.
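To make the shape of the statement concrete, here is a minimal sketch of what the complete DDL might look like. It rests on assumptions: the three columns are hypothetical placeholders (the real column list comes from the Query a web log via HiveQL scenario), and the LOCATION value assumes a container named weblog holding a folder named sample. The delimiter '32' is read by Hive as a byte code, and ASCII 32 is the space character, so this declares space-separated fields.

```sql
-- Sketch only, not the scenario's exact script.
-- The column list is hypothetical; substitute the scenario's real columns.
CREATE EXTERNAL TABLE weblog_sample_asv (
  evtdate STRING,
  evttime STRING,
  csuristem STRING
)
COMMENT 'This is a web log sample ASV'
-- '32' is the ASCII code for a space, so fields are split on spaces.
ROW FORMAT DELIMITED FIELDS TERMINATED BY '32'
STORED AS TEXTFILE
-- Assumed location: container "weblog", folder "sample".
LOCATION 'asv://weblog/sample';
```

Because the table is EXTERNAL, dropping it later removes only the Hive metadata; the files in the blob container stay where they are.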
3) Execute a simple query
Now that you have created a Hive EXTERNAL table that points to the files located in the weblog/sample folder of your Azure Blob storage account, you can query it.
The following query returns up to ten rows from the table:
select * from weblog_sample_asv limit 10;
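As a quick sanity check on the external table, two further statements can be useful. This is a sketch that assumes the table name weblog_sample_asv used throughout this post. The first statement prints the table metadata, including the storage location Hive recorded. The second counts the rows Hive can read from the blob folder.

```sql
-- Show columns plus detailed metadata, including the table's location.
DESCRIBE EXTENDED weblog_sample_asv;

-- Count all rows readable from the files behind the table.
SELECT COUNT(*) FROM weblog_sample_asv;
```

Unlike the simple select with a limit, the count runs as a MapReduce job, so expect it to take noticeably longer.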