If you’ve joined the HDInsight Preview, you will notice many changes, including the tight integration with Windows Azure and the fact that HDInsight now defaults to ASV. As noted in Why use Blob Storage with HDInsight on Azure, there are some interesting technical (performance) and business reasons for utilizing Azure storage accounts. But if you had been playing with the HadoopOnAzure.com beta and switched over to the Windows Azure HDInsight Service Preview, you may have noticed a change in the way asv paths work. Here’s a quick cheat sheet for you.
In general, to access ASV sources:
#ls asv://$container$@$storage_account$.blob.core.windows.net/$path$
The exception is the default container, which was created when you originally set up your cluster. For example, my storage account is “doctorwho” and the container (which is the name of my HDInsight cluster) is “caprica” (Yes, I’m mixing Battlestar Galactica and Doctor Who – deal with it!):
#ls asv://caprica@doctorwho.blob.core.windows.net/
Yet because this is also the default container / storage account, you can also just go:
#ls /
If you want to access another container in the same storage account, you’ll have to specify the full path. For example, if I wanted to access the rainier container, muir folder in my doctorwho account:
#ls asv://rainier@doctorwho.blob.core.windows.net/muir
Likewise, if you want to access a completely separate storage account, you can follow the same pattern, provided you have specified the account information within the core-site.xml (more info below). For example, if I wanted to access the ultimate container, frisbee folder in my riversong account:
#ls asv://ultimate@riversong.blob.core.windows.net/frisbee
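To make the path pattern concrete, here is a minimal Python sketch that assembles the same paths from their parts. The function name asv_uri and its scheme parameter are hypothetical helpers made up for illustration; they are not part of HDInsight or any Azure SDK. The scheme defaults to asv to match this post, and since a comment below notes that wasb: is now used instead, it is left overridable.

```python
def asv_uri(container: str, account: str, path: str = "", scheme: str = "asv") -> str:
    """Build a fully qualified blob storage path of the form
    scheme://container@account.blob.core.windows.net/path.

    Hypothetical helper for illustration only; not an HDInsight API.
    """
    if not container or not account:
        raise ValueError("container and account are both required")
    # Drop any leading slash so the result never contains a double slash after the host.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# The three examples from this post:
print(asv_uri("caprica", "doctorwho"))              # default container root
print(asv_uri("rainier", "doctorwho", "muir"))      # another container, same account
print(asv_uri("ultimate", "riversong", "frisbee"))  # a separate storage account
```

Passing scheme="wasb" produces the same paths with the newer prefix.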
Note that for a path into a separate storage account to work, you will need to modify your core-site.xml and add a property named fs.azure.account.key.$full account path$ that holds the key for that account. The template looks like:
<property>
<name>fs.azure.account.key.$account$.blob.core.windows.net</name>
<value>$account-key$</value>
</property>
For my riversong account, it would look like:
<property>
<name>fs.azure.account.key.riversong.blob.core.windows.net</name>
<value>$riversong-account-key$</value>
</property>
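If you have several storage accounts to register, the property block can be generated rather than typed by hand. The following is a rough Python sketch using only the standard library; account_key_property is a hypothetical helper name, and the key shown is the same placeholder as above, not a real key. It only produces the XML fragment; pasting it into the configuration element of core-site.xml is still up to you.

```python
import xml.etree.ElementTree as ET


def account_key_property(account: str, account_key: str) -> str:
    """Return a core-site.xml <property> fragment for one storage account.

    Hypothetical helper for illustration; it follows the
    fs.azure.account.key.$account$.blob.core.windows.net template from this post.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = (
        f"fs.azure.account.key.{account}.blob.core.windows.net"
    )
    ET.SubElement(prop, "value").text = account_key
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(prop, encoding="unicode")


# Placeholder key, exactly as in the template above.
print(account_key_property("riversong", "$riversong-account-key$"))
```

Building the element with ElementTree instead of string formatting means any special characters in the key get escaped properly.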
Enjoy!
Twice in the last week my day has gone something like this: I think to myself “I need this…” and your blog says “here you go….” Thanks.
Thanks Brian 🙂
Thank you for the help 😉 I have one question about port 10000: how can I open it in the Windows Azure HDInsight Service Preview? Regards
Sorry, I found it: we have to use port 563 😉
🙂
Clear and concise, I like it! Keep writing… 🙂
Thanks!
[…] setting up your cluster. If you want to access more than one container I advise you to read the excellent post by Denny Lee. But wait, isn't Hadoop all about moving compute to data vs. traditionally moving data to compute, […]
[…] Updated HDInsight on Azure ASV paths for multiple storage accounts https://dennyglee.com/2013/03/25/updated-hdinsight-on-azure-asv-paths-for-multiple-storage-accounts/ […]
It seems that now we should use ‘wasb:’ instead of ‘asv:’… this is really confusing.
Yes, that is the case now (to use wasb: vs. asv). I had written this article when we were initially designing HDInsight and at that time, we had called it ASV. Sorry for the confusion!