To Spark … and Beyond!

One of the very exciting things about Spark is the potential to have one ubiquitous tool to solve my aggregate, machine learning, graph, and other statistical / analytics problems.  And while I am proud of my time with the SQL Server team, where we achieved some amazingly lofty goals (e.g. the Yahoo! 24TB Analysis Services cube), I had been drawn back to my statistical roots. Statistical roots? It may surprise you that I had been bouncing between the path of becoming a doctor (…you know, Asian parents) or a statistician (my father was a Statistics professor).  …

Installing Spark 0.6.1 Standalone on OSX Mountain Lion (10.8)

Spark is an in-memory open source cluster computing system allowing for fast iterative and interactive analytics.  Spark utilizes Scala – a type-safe object-oriented language with functional properties that is fully interoperable with Java.  For more information about Spark, please refer to  To test out Spark, you can install the stand-alone version on Mac OSX. Install Scala 2.9.2: The first thing you will need to do is install Scala 2.9.2, as Spark 0.6.1 depends on it.  As of this posting, the current version of Scala is 2.10, but there are some issues with Spark 0.6.1 and Scala…

Big Data, BI, and Compliance in Healthcare

If you’re interested in Big Data, BI, and Compliance in Healthcare, check out the 24 Hours of PASS (Spring 2013) session that Ayad Shammout (@aashammout) and I are presenting: Ensuring Compliance of Patient Data with Big Data and BI. To help meet HIPAA and HealthAct compliance – and to more easily handle larger volumes of unstructured data and gain richer and deeper insight using the latest analytics – a medical center is embarking on a Big Data-to-BI project involving HDInsight, SQL Server 2012, Integration Services, PowerPivot, and Power View. Join this preview of Denny Lee and Ayad Shammout’s PASS Business Analytics Conference session to…

Yahoo! 24TB SSAS Big Data Case Study + Slides

In my post from last year, I had asked the rhetorical question What’s so BIG about “Big Data”.  I had the honor of announcing the largest known Analysis Services cube – at 24TB – whose source is 2PB of data from a huge Hadoop cluster. For those who attended the PASS 2011 session Tier-1 BI in the world of Big Data, Thomas Kejser, Kenneth Lieu, and I were honored to discuss the details surrounding this uber-cube.   At that time, I had promised the case study would only be months away… Alas, it took a little while…

SQL BI Professionals – Let your voice be heard!

As recently announced by T.K. Anand on the Analysis Services and PowerPivot blog, the SQL BI development team is looking for your thoughts and opinions on the tools and capabilities for SQL Server Analysis Services Development. If you want your voice to be heard – now is a great time to do it!  Follow through with the BI Professional Survey for more information and a link to the survey. The voice of the SQL community has a huge impact on how and what the SQL BI team develops.   After all, it was strong SQL BI community feedback that helped push…

SQLPASS: Hadoop and BI are better together–we’ll show you how!

This is going to be yet another exciting SQLPASS Summit!  Lots of great sessions, focus groups, and craziness – definitely one of the funner times of the job! Over the last few years, I have had the honor of speaking at the SQLPASS summits on various themes surrounding large volume / complex Enterprise Business Intelligence. Last year I had the honor of introducing Hadoop on Azure and Windows during the 2011 PASS Summit Keynote with Ted Kummert.  As was recently announced during this year’s Strata Conference, Hortonworks Powers Microsoft HDInsight. For this year’s PASS Summit, Dave Mariani (@dmariani)…

Enterprise Reporting Services Jump Start Guide

While there is a lot of great and deserved interest in Power View, almost forgotten is the IT workhorse SQL Server 2012 Reporting Services for your corporate / managed reporting infrastructure.  For great references on SQL Server 2012 Reporting Services, my suggestions include: What’s New In SQL Server 2012 Reporting Services; the SQL Server Reporting Services Team Blog; and Robert Bruckner’s Reporting Services and Power View Blog (Robert is a Principal Architect within SQL Server Reporting with excellent insight into both Reporting Services and Power View). In my former life, I had worked on some pretty complex Reporting Services customer projects.  While these references…
