Setup Azure Blob Store for Hadoop on Azure CTP

One of the cool ways to run Hadoop on Azure is to have it connect to Azure Blob storage via your Windows Azure Storage account.  To set up your Azure storage account, please refer to http://windows.azure.com. The tasks below will allow you to set up your Hadoop on Azure CTP account to connect to an existing Azure Blob Storage account using the asv protocol.  For example, within Hadoop, you would normally get a listing of files within HDFS using the command line interface:

hadoop fs -ls /

In the case of accessing files within Azure Blob storage, you can run the command:

hadoop fs -ls asv://<container>/<folder>
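
If you find yourself typing these paths a lot, a small helper can build them for you. The snippet below is only a sketch: the function names asv_uri, ls_command and run_ls are my own (hypothetical) names, not part of Hadoop, and run_ls assumes that the hadoop executable is on your PATH and that the ASV connection has already been set up. Note that the option is written with a plain ASCII hyphen (-ls).

```python
import shutil
import subprocess
from typing import List


def asv_uri(container: str, folder: str) -> str:
    """Build a URI of the form asv://<container>/<folder>."""
    container = container.strip("/")
    folder = folder.strip("/")
    if not container or not folder:
        raise ValueError("both container and folder are required")
    return f"asv://{container}/{folder}"


def ls_command(path: str) -> List[str]:
    """Return the argument list for `hadoop fs -ls <path>`."""
    return ["hadoop", "fs", "-ls", path]


def run_ls(path: str) -> str:
    """Run the listing and return its output (assumes hadoop is on PATH)."""
    if shutil.which("hadoop") is None:
        raise RuntimeError("hadoop executable not found on PATH")
    result = subprocess.run(
        ls_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, asv_uri("weblog", "sample") gives the path for a container called weblog and a folder called sample, which you can then pass to run_ls on a machine where Hadoop is available.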

The basic steps are:

  1. Obtain the Azure Blobstore Storage Account Name and Access Key.
  2. Set up ASV connection between Hadoop on Azure CTP and your Windows Azure Blob Storage account.
  3. Upload files to your Azure Blob Storage account.

1) Obtain the Azure Blobstore Storage Account Name and Access Key

Access your Azure Blobstore Storage account through the Windows Azure Platform dashboard via http://windows.azure.com/.  From here, the navigation path is [Hosted Services, Storage Accounts & CDN] (bottom left) -> [Storage Accounts] (mid-top left).

  • The blobstore account name is the Storage Account listed under the subscription in the middle pane.  In this case, I have a storage account called isocatstore.
  • To get the access key, click on the storage account in question and then click on the [View] button in the properties pane on the right.
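
Before pasting these two values anywhere, a quick sanity check can catch copy-and-paste mistakes. The sketch below relies on two things I am assuming hold for your account: storage account names are 3 to 24 lowercase letters and digits, and access keys are Base64-encoded strings. The function names are hypothetical, and passing these checks does not prove that the credentials are actually valid.

```python
import base64
import binascii
import re

# Assumption: account names are 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Check the storage account name against the naming rule above."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


def looks_like_access_key(key: str) -> bool:
    """Return True if the key is non-empty, well-formed Base64."""
    try:
        return len(base64.b64decode(key.strip(), validate=True)) > 0
    except (binascii.Error, ValueError):
        return False
```

A stray space or a missing trailing "=" in the copied key is the kind of error that looks_like_access_key flags.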


2) Set up ASV connection between Hadoop on Azure CTP and your Windows Azure Blob Storage account

From the Hadoop on Azure CTP portal page, click on the [Manage Data] tile.  From here, click on the [Set up ASV] button on the right.

From here, you can supply the credentials of your Azure Blob Storage account that you obtained in Step 1.

Click on [Save Settings] and you are good to go.

 

3) Upload files to your Azure Blob Storage account

A great way to upload files to your Azure Blob Storage account is to use CloudXplorer – you can download it from here: http://clumsyleaf.com/products/cloudxplorer

NOTE: When you upload the files, be sure to place them within a folder inside a container of your blobstore account.  This is important because Hadoop can then list all of the files within the folder, instead of you needing to access each file individually (which is what would happen if you placed the files directly within the container).
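
To see why the folder matters, it helps to know that blob storage has a flat namespace: a "folder" is just the part of the blob name before the "/". The sketch below illustrates the idea with plain Python lists. It is not how the ASV driver is implemented, and the blob names and function names are hypothetical examples of mine.

```python
from typing import Iterable, List


def blobs_in_folder(blob_names: Iterable[str], folder: str) -> List[str]:
    """Return the blobs whose names start with '<folder>/'."""
    prefix = folder.strip("/") + "/"
    return sorted(name for name in blob_names if name.startswith(prefix))


def top_level_blobs(blob_names: Iterable[str]) -> List[str]:
    """Return the blobs placed directly in the container (no folder prefix)."""
    return sorted(name for name in blob_names if "/" not in name)


# Hypothetical contents of a container.
names = ["sample/log1.txt", "sample/log2.txt", "readme.txt"]
folder_listing = blobs_in_folder(names, "sample")
loose_files = top_level_blobs(names)
```

Here folder_listing holds the two files under sample, which can be addressed together through a single folder path, while readme.txt sits loose in the container and has to be referenced on its own.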

From CloudXplorer, you can quickly create a container and a folder; in this case, I created the weblog container and the sample folder.

Using the intuitive UI, copy your files from your local box to the Azure Blob Storage account.

By doing it in this fashion, you will be able to get a listing of your files from the Hadoop command line interface using the command:

hadoop fs -ls asv://weblog/sample

You can also view a listing of files from the Hadoop on Azure JavaScript Interface using the command:

#ls asv://weblog/sample
