As noted in my previous post, Connecting Hadoop on Azure to your Amazon S3 Blob storage, you could easily set up HDInsight on Azure to work against your Amazon S3 / S3N storage. With the latest updates to HDInsight, you’ll notice that the Manage Cluster dialog no longer includes the quick-access Set up S3 option.
Yet there are still times when you may want your HDInsight cluster to access your S3 storage. Note that this can be a tad expensive due to data transfer costs.
To set up S3 on your Hadoop cluster, click the Remote Desktop tile on the HDInsight dashboard so you can log on to the name node.
Once you are logged in, open the Hadoop Command Line link on the desktop.
From here, switch to the c:\apps\dist\Hadoop\conf folder and edit the core-site.xml file. The properties to add are noted below.
<value>[Access Key ID]</value>
<value>[Secret Access Key]</value>
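The two value lines above are only part of the configuration: in core-site.xml each value sits inside a property element alongside a name. The sketch below is a minimal example of what the full entries might look like, assuming the standard Hadoop 1.x credential property names for the s3n (native) and s3 (block) filesystems. These names are an assumption rather than something taken from the original snippet, so check them against the Hadoop version on your cluster.

```xml
<!-- Add these inside the existing <configuration> element of core-site.xml. -->
<!-- Property names are assumed: the standard Hadoop 1.x S3 credential keys. -->

<!-- Credentials for s3n:// (native S3) URIs -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>[Access Key ID]</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>[Secret Access Key]</value>
</property>

<!-- Credentials for s3:// (block-based S3) URIs; only needed if you use them -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>[Access Key ID]</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>[Secret Access Key]</value>
</property>
```

With entries like these in place, you should be able to list a bucket from the Hadoop command line with something like hadoop fs -ls s3n://yourbucket/ (where yourbucket is a hypothetical bucket name).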
Once this is set up, you will be able to access your S3 account from your Hadoop cluster.