When working with Hadoop on Azure, you may be used to the idea of putting your data in the Cloud. Besides Azure Blob Storage, you can also connect your Hadoop on Azure cluster to Amazon S3 and query data stored there. The steps below assume that you already have an Amazon AWS / S3 account and have uploaded data to it.
1) Log into your Amazon AWS account and click Security Credentials.
2) Obtain your access credentials – you’ll need both your Access Key ID and Secret Access Key.
3) Log into your Hadoop on Azure account, click the Manage Cluster live tile, and click Set up S3. Enter your Access Key ID and Secret Access Key, then click Save Settings.
4) Once you have saved your Amazon S3 settings, you can access your Amazon S3 files from Hadoop on Azure. For example, I have a bucket called tardis6 containing a folder named weblog, which holds a sample weblog file.
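For readers who want to see what the Set up S3 page is likely doing behind the scenes, here is a minimal sketch. It assumes (not confirmed by the portal) that the saved keys correspond to Hadoop's s3n connector properties `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey`, and that files are then addressed with `s3n://bucket/path` URIs. The helper functions (`s3n_properties`, `to_core_site_xml`, `s3n_uri`, `ls_command`) are hypothetical illustrations, not part of Hadoop on Azure.

```python
from typing import Dict, List
import xml.etree.ElementTree as ET


def s3n_properties(access_key_id: str, secret_access_key: str) -> Dict[str, str]:
    """Hypothetical helper: map AWS credentials to Hadoop's s3n property names (assumed mapping)."""
    return {
        "fs.s3n.awsAccessKeyId": access_key_id,
        "fs.s3n.awsSecretAccessKey": secret_access_key,
    }


def to_core_site_xml(props: Dict[str, str]) -> str:
    """Render the properties as a core-site.xml style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def s3n_uri(bucket: str, path: str = "") -> str:
    """Build an s3n:// URI for a bucket and an optional folder path."""
    return f"s3n://{bucket}/{path.strip('/')}"


def ls_command(uri: str) -> List[str]:
    """Hadoop shell command (as an argument list) that lists the given URI."""
    return ["hadoop", "fs", "-ls", uri]


if __name__ == "__main__":
    # Placeholder values: substitute the keys from step 2 and never commit real keys.
    props = s3n_properties("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY")
    print(to_core_site_xml(props))
    # The tardis6 bucket and weblog folder from the example above.
    print(" ".join(ls_command(s3n_uri("tardis6", "weblog"))))
```

Keeping the keys in configuration, rather than embedding them in each URI, means the paths you type and share contain only the bucket and folder names.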