Ramblings of a data dork: from BI and Big Data to Travel and Food
One of the cool ways to run Hadoop on Azure is to have it connect to Azure Blob storage through your Windows Azure Storage account. To set up your Azure storage account, please refer to http://windows.azure.com. The tasks below let you set up your Hadoop on Azure CTP account to connect to an existing Azure Blob Storage account using the asv protocol. For example, within Hadoop, you would normally get a listing of files within HDFS using the command line interface:
hadoop fs -ls /
In the case of accessing files within Azure Blob storage, you can run the command:
hadoop fs -ls asv://<container>/<folder>
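If you script this, a tiny helper keeps the URI format in one place. The sketch below is only an illustration: the function name asv_uri is made up for this post, and the container and folder names in the example comment are placeholders.

```shell
# Hypothetical helper (not part of Hadoop): build an asv:// URI
# from a container name and a folder name.
asv_uri() {
    container=$1
    folder=$2
    # Both parts are required; see the note on folders in step 3.
    if [ -z "$container" ] || [ -z "$folder" ]; then
        echo "usage: asv_uri <container> <folder>" >&2
        return 1
    fi
    printf 'asv://%s/%s\n' "$container" "$folder"
}

# Example (needs the Hadoop CLI on a cluster where ASV is set up;
# mycontainer and myfolder are placeholder names):
#   hadoop fs -ls "$(asv_uri mycontainer myfolder)"
```

Note that the option is written with a plain ASCII hyphen (-ls); a typographic dash pasted from a web page will not be recognized as an option.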
The basic steps are:
1) Obtain the Azure Blob Storage account name and access key
Access your Azure Blob Storage account through the Windows Azure Platform dashboard at http://windows.azure.com/. From there, the navigation path is [Hosted Services, Storage Accounts & CDN] (bottom left) -> [Storage Accounts] (mid-top left).
2) Set up the ASV connection between Hadoop on Azure CTP and your Windows Azure Blob Storage account
From the Hadoop on Azure CTP portal page, click on the [Manage Data] tile, then click on the [Set up ASV] button on the right.
On the page that opens, supply the credentials of your Azure Blob Storage account that you obtained in Step 1.
Click on [Save Settings] and you are good to go.
3) Upload files to your Azure Blob Storage account
A great way to upload files to your Azure Blob Storage account is to use CloudXplorer – you can download it from here: http://clumsyleaf.com/products/cloudxplorer
NOTE: When you upload the files, make sure to place them in a folder within a container of your Blob Storage account. This matters because Hadoop can then list all of the files within the folder; if you place the files directly within the container, you would need to access each file individually.
From CloudXplorer, you can quickly create a container and a folder; in this case, I created the weblog container and the sample folder.
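To make the note concrete: a blob stored inside a folder has a name like sample/log1.txt, while a blob placed directly in the container is just log1.txt. Here is a rough sketch of a check you could run over a list of blob names before uploading. The function name and the file names are made up for illustration.

```shell
# Hypothetical check (not part of Hadoop or CloudXplorer): succeed only
# if a blob name has a folder prefix, e.g. "sample/log1.txt".
has_folder_prefix() {
    case $1 in
        /*|*/) return 1 ;;  # leading or trailing slash: no folder/file pair
        */*)   return 0 ;;  # at least one folder level before the file name
        *)     return 1 ;;  # file sits directly in the container
    esac
}

# Example with made-up blob names:
#   has_folder_prefix "sample/log1.txt" && echo "ok"
#   has_folder_prefix "log1.txt" || echo "no folder: move it into one"
```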
Using the intuitive UI, copy your files from your local box to the Azure Blob Storage account.
With the files uploaded this way, you can get a listing of them from the Hadoop command line interface using the command:
hadoop fs -ls asv://weblog/sample
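If you want to confirm the upload from a script rather than by eye, you can rely on the exit status of the listing command. This is a minimal sketch, assuming the hadoop command is on your PATH and the ASV settings from Step 2 are saved; the function name check_asv_folder is made up for this post.

```shell
# Hypothetical wrapper: list an ASV folder and report whether it is readable.
check_asv_folder() {
    container=$1
    folder=$2
    # Bail out early if the Hadoop CLI is not available.
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop not found on PATH" >&2
        return 127
    fi
    # hadoop fs -ls exits with a non-zero status when the listing fails.
    if hadoop fs -ls "asv://$container/$folder"; then
        echo "OK: asv://$container/$folder is readable"
    else
        echo "FAILED: could not list asv://$container/$folder" >&2
        return 1
    fi
}

# Example, using the container and folder created above:
#   check_asv_folder weblog sample
```

A failed listing usually means either the ASV credentials were not saved or the files were placed somewhere other than the folder you are listing.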