Getting your Pig to eat ASV blobs in Windows Azure HDInsight

Recently I was asked how to get Pig scripts to access files stored in Azure Blob Storage from the command line prompt.  While it is possible to do this from the HDInsight Interactive JavaScript console, it is easier to run these commands from the command line when you want to automate scripts or use the grunt interactive shell.  To do this, you will need to:

  • Ensure your HDInsight Azure cluster is connected to your Azure Blob Storage subscription / account
  • Familiarize yourself with the pig / grunt interactive shell

Connecting HDInsight Azure to Azure Blob Storage

1) Go to the Manage Cluster live tile, as noted in the screenshot below.

image

2) Click on Set up ASV to enter your Azure Blob Storage account information.

image

3) Specify the Azure storage account and passkeys as noted below.

image

And now, you’ve connected your Azure Blob Storage account to your HDInsight Azure cluster.

Accessing the Pig/Grunt Interactive Shell

To access your Pig/Grunt interactive shell, from the Metro interface:

1) Click on the Remote Desktop live tile.

image

2) Once you’ve logged in, click on the Hadoop Command Line shortcut located in the top left corner.

image

3) From the Hadoop command line shell, switch to the pig folder and run pig.cmd:

cd c:\apps\dist\pig-{version}-SNAPSHOT
bin\pig

and now you’re able to grunt!

image

Quick Pig / Grunt Example

Now that you’re working with the grunt interactive console, you can run some simple Pig commands.  For example, the commands below load the logs stored in ASV and show a sample line along with its “schema”.

A = LOAD 'asv://weblog/sample/' USING TextLoader as (line:chararray);
illustrate A;

The results of the illustrate command are noted below:

2012-10-27 20:27:20,266 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at:
...

------------------------------------------------------------------------

...

| ASV_LOGS     | line:chararray

...

|              | 2012-12-14 19:26:31 W3SVCPING 10.0.0.0 GET /Cascades/atom.aspx - 80 - 10.0.0.24 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322;+MSOffice+12) - 200 0 0 144532 484 1124 |

------------------------------------------------------------------------

...
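
If you want to go one step beyond illustrate, here is a minimal sketch, assuming the same asv://weblog/sample/ path used above.  DESCRIBE, LIMIT and DUMP are standard Pig operators; the alias name FIRST5 is just a placeholder of mine and not something from the original logs.

```pig
-- Sketch only: assumes the same ASV container and path as the example above.
A = LOAD 'asv://weblog/sample/' USING TextLoader() AS (line:chararray);

-- Print the schema of A without running a job.
DESCRIBE A;

-- FIRST5 is a hypothetical alias holding at most five log lines.
FIRST5 = LIMIT A 5;

-- DUMP runs the job and prints the tuples to the console.
DUMP FIRST5;
```

Unlike illustrate, which picks sample data to show how it flows through each step, DUMP executes the script and prints the actual rows, so keeping the LIMIT in place avoids flooding the console on large log folders.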

Enjoy!
