Sometimes, you have to roll a hard six.
— Commander Adama in Battlestar Galactica “Revelations”
In May of this year, I noted that I was At a Crossroads … from SSAS to Big Data! After I wrote that post, many people pinged me wondering if I had left the world of BI and/or left Microsoft altogether. [Note: I have called out that I AM a Microsoft employee, but the opinions here are my own.] My subsequent posts ranged from Cloud to a little bit of Analysis Services.
By the way, I did forget to create a separate blog post on the Analysis Services 2008 R2 Performance Guide (fortunately, I did do it on the sqlcat.com site – whew!).
Over time, I became a bit more explicit on the subject of big data, including posts like the potential of Big Data and “Hadoop: A movement, not just a technology”. But all this time, what I was excited about was when we would finally be able to showcase some of our cool stuff, including the embracing of Apache Hadoop™ – yes, this may sound like marketing speak, but there are good reasons why I’m using it (more later). I even got a little cheeky last week with my recent blog post You know that I’m tired when… – especially with the last two lines:
Every time I think about big data, I conjure up proboscidea
Proboscidea is the taxonomic order that includes elephants – and the icon for Hadoop is a yellow elephant (Doug Cutting named Hadoop after his son’s toy elephant).
I’ve been hearing “go in bar” a lot lately … or perhaps that’s my dyslexia
No, I wasn’t actually thinking about drinking … that much … but the phrase “go in bar”, if you flip it around, sounds like “embargo”. That is, until today, 10/12 at 9:00am PST, our work with Hadoop had to be kept quiet or “embargoed”.
Get to the point!
Okay! During Ted Kummert’s Day 1 keynote today at the SQL Server PASS Summit 2011, I had the honor of demonstrating how SQL BI and Hadoop rock together! As you can see from the Port 25 Microsoft, Hadoop, and Big Data and the Microsoft News Center for SQL Server 2012 posts, there are a number of cool things happening:
- It started with the Hadoop connectors for SQL Server and PDW. Key call out here is that these connectors are bi-directional to allow data movement back and forth between SQL Server and Hadoop.
- Windows Server and Windows Azure optimized Hadoop distributions; out of the box (or cloud), the distributions include support for HDFS, Hive, Pig-Latin, FTP, etc.
- Our partnership with Hortonworks to help us push forward faster with optimizing Hadoop to run on Windows as noted in their post Bringing Apache Hadoop to Windows.
- As part of the demo today, I showed the integration of the SQL BI stack with Hadoop by having PowerPivot (for Excel and SharePoint) interact with a Hadoop for Windows cluster via Hive and the soon-to-be-released HiveODBC driver.
- Not shown today, but just as cool, will be the release of the Excel Hive Add-in.
More information will be posted at www.microsoft.com/bigdata as it becomes available, eh?!
Cool, so why did I use “embrace Hadoop”?
A key call out during my conversation with Ted during the keynote is that our offering is 100% compatible with Apache Hadoop – if your code works on Apache Hadoop then it will work on ours and vice versa. But it’s not just about the code; it’s also about a shift: we are embracing the open source community!
Our VB moment in Big Data
So why is Big Data / Hadoop important for a BI dude or dudette?
I’ll probably have a number of posts on this question alone, but let me give you one answer right now – this is an excerpt from my post “Hadoop: A movement, not just a technology”:
Why am I excited about Hadoop and Big Data even though I’m a Microsoft BI person for most of my career? Because first and foremost, BI is all about making sense of the information. And the greatness of Big Data isn’t just about exploring, understanding, and asking even more questions of this information, but doing it in distribution (vs. silos) and putting more emphasis on the data (i.e. this is where the real IP is)
Any other cool information on Big Data at SQLPASS this week?
Both Ted Kummert’s and David DeWitt’s keynotes will cover Big Data. If you cannot attend, check out the SQL Server PASS Summit 2011 Live Streaming. There are also two breakout sessions on Big Data, both on Thursday:
- AD-216-M: Overview of Big Data on Windows and Windows Azure by Saptak Sen
- BIA-408-A: SQLCAT: Tier-1 BI in the world of Big Data by Thomas Kejser and myself – with special guest Kenneth Lieu from Yahoo!
Also, don’t forget that I will be hosting the Big Data table at the Birds of a Feather luncheon, and a bunch of us will be floating around the Big Data Kiosk in the product pavilion.
Whew! I think that’s it for today!
[…] Microsoft will be offering its own Hadoop distribution on Windows; MS have forged a partnership with HortonWorks to do this. I guess this means the end for Dryad/LINQ to HPC as a product, but it’s a good decision – the market doesn’t want another MS me-too product, it wants Hadoop. There will also be an ODBC driver and addin for Excel for Apache Hive, so you will be able to get data from Hadoop directly into PowerPivot and SSAS Tabular without having to stage it in a relational database. It’ll be available as an on-premises solution and also there’ll be a CTP of an Azure-based solution by the end of the year. This is today’s first big announcement, clearly. I have a few customers with the kind of data volumes that mean they’ll be interested in this, especially now it’s coming in a friendly, MS-packaged format. Denny Lee has more details on all this here. […]
This is great.
Do you feel this will end the Dryad work that has been completed?
That’s a very good question and I hope my post Hadoop vs. and HPC (https://dennyglee.com/2011/10/24/hadoop-vs-and-hpc/) helps to address this. The quick answer is “No” – but of course note that this is my personal opinion :).
[…] Revelations – rolling the hard six to SQL BI and Hadoop post of 11/12/2011 provides more information on Apache Hadoop in SQL Azure and SQL Server: Okay! […]