This post was published March 1st, 2012 and the content may be obsolete. Thus, most of these links are no longer active but I am keeping this post for posterity.
As noted in GigaOM’s article Microsoft’s Hadoop play is shaping up, and it includes Excel, the great callout is:
Big Data for Everyone!
The title of the Microsoft BI blog post says it best: Big Data for Everyone: Using Microsoft’s Familiar BI Tools with Hadoop. It’s about making Big Data accessible to everyone through one of the most popular and powerful BI tools: Excel.
So what does accessible to everyone mean in the BI sense? It’s about being able to go from this (which is a pretty nice view of a Hive query against Hadoop on Azure in the Hive Console)
and getting it into Excel or PowerPivot.
The most important callout here is that you can use PowerPivot and Excel to merge data sets not just from Hadoop, but also from SQL Server, SQL Azure, PDW, Oracle, Teradata, Reports, Atom feeds, text files, other Excel files, and anything reachable via ODBC, all within Excel! (thanks @sqlgal for that reminder!)
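To make the merge idea concrete outside of Excel, here is a minimal Python sketch of the same pattern: rows pulled from one source are joined to a lookup table from another source on a shared key, then aggregated. Everything in it is an assumption for illustration only. The rows, column names (`device`, `country`, `queries`, `region`) and helper functions are hypothetical, and it does not reflect how PowerPivot implements relationships internally.

```python
from collections import defaultdict

# Hypothetical rows standing in for the result of a Hive query
# (in practice these would arrive through the Hive ODBC driver).
hive_rows = [
    {"device": "Android", "country": "US", "queries": 120},
    {"device": "iPhone", "country": "US", "queries": 90},
    {"device": "Android", "country": "CA", "queries": 30},
    {"device": "Android", "country": "DE", "queries": 45},
]

# Hypothetical lookup table standing in for reference data
# from another source, such as SQL Server.
country_dim = [
    {"country": "US", "region": "North America"},
    {"country": "CA", "region": "North America"},
    {"country": "DE", "region": "Europe"},
]


def merge_on_key(facts, lookup, key):
    """Left-join each fact row to the lookup row that shares `key`."""
    index = {row[key]: row for row in lookup}
    merged = []
    for fact in facts:
        match = index.get(fact[key], {})
        # Fact columns win if both sides define the same column name.
        merged.append({**match, **fact})
    return merged


def total_by(rows, group_key, value_key):
    """Sum `value_key` per distinct value of `group_key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row.get(group_key)] += row[value_key]
    return dict(totals)


merged = merge_on_key(hive_rows, country_dim, "country")
queries_by_region = total_by(merged, "region", "queries")
print(queries_by_region)
```

In PowerPivot the equivalent steps are done by defining a relationship between the two tables and building a PivotTable on top, with no code required, which is the point of the self-service approach.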
Once the data is in Excel or PowerPivot, users can manipulate it using Excel macros and the PowerPivot DAX language, respectively. Below is a screenshot of data extracted from Hive and placed into PowerPivot for Excel.
But even cooler, data visualization wise: once your PowerPivot for Excel workbook is uploaded to SharePoint 2010 with SQL Server 2012, you can create an interactive Power View report from it.
For more information on how to get PowerPivot and Power View to connect to Hadoop (in this case, it’s Hadoop on Azure, but conceptually they are the same), please reference the links below:
- How To Connect Excel to Hadoop on Azure via HiveODBC
- Connecting PowerPivot to Hadoop on Azure – Self Service BI to Big Data in the Cloud
- Connecting Power View to Hadoop on Azure
- Connecting Power View to Hadoop on Azure [Video]
So what’s so Big about Big Data?
As noted in the post What’s so Big about Big Data?, Big Data is important because of the sheer amount of machine-generated data that needs to be made sense of.
As noted by Alexander Stojanovic (@stojanovic), the Founder and General Manager of Hadoop on Windows and Azure:
It’s not just your “Big Data” problems, it’s about your BIG “Data Problems”
To learn more, check out my 24HOP (24 Hours of PASS) session:
Tier-1 BI in the Age of Bees and Elephants
In this age of Big Data, data volumes grow ever larger while the technical problems and business scenarios become more complex. This session provides concrete examples of how these can be solved. Highlighted will be the use of Big Data technologies including Hadoop (elephants) and Hive (bees) with Analysis Services. Customer examples including Klout and Yahoo! (with their 24TB cube) will highlight both the complexities and solutions to these problems.
To make this real, a great case study is the one at Klout, described in their blog post Big Data, Bigger Brains. Below is a link to Bruno Aziza (@brunoaziza) and Dave Mariani’s (@dmariani) YouTube video on how Klout Leverages Hadoop and Microsoft BI Technologies To Manage Big Data.
Disclaimer: This blog post (like other blog posts on dennyglee.com) is written by the author, Denny Lee. I am a Microsoft employee, but the opinions expressed here are my own. I have been working with the Isotope team (code name for Hadoop on Windows and Hadoop on Azure) since its inception while part of the SQL Customer Advisory Team.