Sometimes, you have to roll a hard six.
— Commander Adama in Battlestar Galactica “Revelations”
In May of this year, I noted that I was At a Crossroads … from SSAS to Big Data! After I wrote that post, many people pinged me wondering if I had left the world of BI and/or left Microsoft altogether. [Note: I have called out that I AM a Microsoft employee, but the opinions here are my own.] My subsequent posts ranged from Cloud to a little bit of Analysis Services.
By the way, I did forget to create a separate blog post on the Analysis Services 2008 R2 Performance Guide (fortunately, I did do it on the sqlcat.com site – whew!).
Over time, I became a bit more explicit on the subject of big data, including posts like the potential of Big Data and “Hadoop: A movement, not just a technology”. But all this time, what I was excited about was when we would finally be able to showcase some of our cool stuff, including the embracing of Apache Hadoop™ – yes, this may sound like marketing speak, but there are good reasons why I’m using it (more later). I even got a little cheeky last week with my blog post You know that I’m tired when… – especially with the last two lines:
Every time I think about big data, I conjure up proboscidea
Proboscidea is the taxonomic order that includes elephants – and the icon for Hadoop is a yellow elephant (Doug Cutting named Hadoop after his son’s toy elephant).
I’ve been hearing “go in bar” a lot lately … or perhaps that’s my dyslexia
No, I wasn’t actually thinking about drinking … that much … but the phrase “go in bar”, if you flip it, sounds like “embargo”. That is, until today (10/12, 9:00am PST), our work with Hadoop had to be kept quiet, or “embargoed”.
Get to the point!
Okay! During Ted Kummert’s Day 1 keynote at the SQL Server PASS Summit 2011 today, I had the honor of demonstrating how SQL BI and Hadoop rock together! As you can see from the Port 25 post Microsoft, Hadoop, and Big Data and the Microsoft News Center post for SQL Server 2012, there are a number of cool things happening:
- It started with the Hadoop connectors for SQL Server and PDW. The key callout here is that these connectors are bi-directional, allowing data movement back and forth between SQL Server and Hadoop.
- Windows Server and Windows Azure optimized Hadoop distributions; out of the box (or cloud), the distributions include support for HDFS, Hive, Pig Latin, FTP, etc.
- Our partnership with Hortonworks to help us push forward faster with optimizing Hadoop to run on Windows as noted in their post Bringing Apache Hadoop to Windows.
- As part of the demo today, I showed the integration of the SQL BI stack with Hadoop by having PowerPivot (for Excel and SharePoint) interact with a Hadoop on Windows cluster via Hive and the soon-to-be-released HiveODBC driver.
- Not shown today, but just as cool, will be the release of the Excel Hive Add-in.
More information will be posted at www.microsoft.com/bigdata as it becomes available, eh?!
Cool, so why did I use “embrace Hadoop”?
A key callout from my conversation with Ted during the keynote was that our offering is 100% compatible with Apache Hadoop – if your code works on Apache Hadoop, then it will work on ours, and vice versa. But it’s not just about the code; it’s also about the shift this represents: we are embracing the open source community!
Our VB moment in Big Data
So why is Big Data / Hadoop important for a BI dude or dudette?
I’ll probably have a number of posts on this question alone, but let me give you one answer right now – this is an excerpt from my post “Hadoop: A movement, not just a technology”:
Why am I excited about Hadoop and Big Data even though I’ve been a Microsoft BI person for most of my career? Because first and foremost, BI is all about making sense of the information. And the greatness of Big Data isn’t just about exploring, understanding, and asking even more questions of this information, but doing it in a distributed way (vs. in silos) and putting more emphasis on the data (i.e. this is where the real IP is).
Any other cool information on Big Data at SQLPASS this week?
Both Ted Kummert’s and David DeWitt’s keynotes will cover Big Data. If you cannot attend, check out the SQL Server PASS Summit 2011 Live Streaming. There are also two breakout sessions on Big Data, both on Thursday:
- AD-216-M: Overview of Big Data on Windows and Windows Azure by Saptak Sen
- BIA-408-A: SQLCAT: Tier-1 BI in the world of Big Data by Thomas Kejser and myself – with special guest Kenneth Lieu from Yahoo!
Also, don’t forget that I will be hosting the Big Data table at the Birds of a Feather luncheon, and a bunch of us will be floating around the Big Data Kiosk in the product pavilion.
Whew! I think that’s it for today!