Optimizing Joins running on HDInsight Hive on Azure at GFS

“…to look at the stars always makes me dream, as simply as I dream over the black dots of a map representing towns and villages…” (Vincent Van Gogh)
Image Source: Vincent Van Gogh Painting Tilt Shifted: http://coolvibe.com/2011/16-van-gogh-paintings-tilt-shifted/tilt-shift-van-gogh-15/
Introduction
To analyze hardware utilization within their data centers, Microsoft’s Online Services Division – Global Foundation Services (GFS) is working with Hadoop / Hive via HDInsight on Azure. A common scenario is to perform joins between the various tables of data. This quick blog post provides a little context on how we managed to take a query from more than 2 hours to under 10 minutes and…



Using Avro with HDInsight on Azure at 343 Industries

By Michael Wetzel, Tamir Melamed, Mark Vayman, Denny Lee
Reviewed by Pedro Urbina Escos, Brad Sarsfield, Rui Martins
Thanks to Krishnan Kaniappan, Che Chou, Jennifer Yi, and Rob Semsey
As noted in the Windows Azure Customer Solution Case Study "Halo 4 developer 343 Industries Gets New User Insights from Big Data in the Cloud", a critical component in achieving faster Hadoop query and processing performance while keeping file sizes small (and thus gaining Azure storage savings, faster query performance, and reduced network overhead) was to use Avro sequence files. Avro was designed for Hadoop to help make Hadoop more interoperable with other…



An easy way to test out Hive Dynamic Partition Insert on HDInsight Azure

Once you get your HadoopOnAzure.com cluster up and running, an easy way to test out Hive Dynamic Partition Insert (the ability to load data into multiple partitions without the need to load each partition individually) on HDInsight Azure is to use the HiveSampleTable already included and the scripts below. You can execute the scripts from the Hive Interactive Console or from the Hive CLI.
1) For starters, create a new partitioned table:
CREATE TABLE hivesampletable_p ( clientid STRING, querytime STRING, market STRING, devicemake STRING, devicemodel STRING, state STRING, querydwelltime STRING, sessionid BIGINT, sessionpagevieworder BIGINT ) PARTITIONED BY (deviceplatform STRING, country…



Oh where, oh where did my S3N go? (in Windows Azure HDInsight) Oh where, Oh where, can it be?!

As noted in my previous post Connecting Hadoop on Azure to your Amazon S3 Blob storage, you could easily set up HDInsight Azure to go against your Amazon S3 / S3N storage. With the updates to HDInsight, you’ll notice that the Manage Cluster dialog no longer includes the quick access to Set up S3. Yet there are times when you may want to connect your HDInsight cluster to your S3 storage. Note that this can be a tad expensive due to transfer costs. To get S3 set up on your Hadoop cluster, from the HDInsight dashboard click on the Remote Desktop tile so…


Office 2013 Power View, Bing Maps, Hive, and Hadoop on Azure … oh my!

With all the excitement surrounding Office 2013 (here’s a nice Engadget Review) and energized by Andrew Brust’s tweets (@andrewbrust) and his post Office 2013 brings BI, Big Data to Windows 8 tablets, I thought I would expand on my posts Connecting Power View to Hadoop on Azure and Connecting PowerPivot to Hadoop on Azure – Self Service BI to Big Data in the Cloud. For those of us involved in BI, the excitement surrounding Office 2013 is because Power View is now embedded directly in Excel. But in addition to that, now I can include maps! Yay! So to make my Power View to…


A Primer on Hadoop (from the Microsoft SQL Community perspective)

For a quick primer on Hadoop (from the perspective of the Microsoft SQL Community), as well as Microsoft Hadoop on Azure and Windows, check out the SlideShare presentation "Above the cloud: Big Data and BI" from Denny Lee. Note as well that there is a great end-to-end Microsoft Hadoop on Azure and Windows presentation available: "Apache Hadoop for Windows Server and Windows Azure" from Brad Sarsfield.


Connecting Power View to Hadoop on Azure–An #awesomesauce way to view Big Data in the Cloud

The post Connecting PowerPivot to Hadoop on Azure – Self Service BI to Big Data in the Cloud provided the step-by-step details on how to connect PowerPivot to your Hadoop on Azure cluster. And while this is really powerful, one of the great features of SQL Server 2012 is Power View (formerly known as Project Crescent). With Power View, the SQL Server BI stack extends the concept of Self Service BI (PowerPivot) to Self Service Reporting. Above is a screenshot of the Power View Mobile Hive Sample that is built on top of the PowerPivot workbook created in…
