One of the very exciting things about Spark is that there is the potential to have one ubiquitous tool to solve my aggregate, machine learning, graph, and other statistical / analytics problems. And while I am proud of my time with the SQL Server team, where we achieved some amazing, lofty goals (e.g. Yahoo! 24TB Analysis Services cube), I had been drawn back to my statistical roots. Statistical Roots? It may surprise you that I had been bouncing between the paths of becoming a doctor (…you know, Asian parents) or a statistician (my father was a Statistics professor). …
Tag: Thoughts
Big Data and Legos
I was recently asked the question – how to explain Big Data to an 8-year-old. So after realizing the 4 Vs of Big Data barely make sense to non-marketers (i.e. most of us), let alone to kids – I realized that the best construct would be to use Legos. When I was her age, the Lego blocks were only squares and rectangles – I could build a lot of buildings and boxes, which was great at that time (in data speak, relational databases). Instead, Big Data is a massive amount (e.g. volume of data) of Lego blocks of different shapes…
Seattle Spark Meetup Kicks Off with DataBricks
I am very excited to announce that Matei Zaharia and Pat McDonough from DataBricks will be speaking at the Seattle Spark Meetup and we’ve increased the room size to accommodate more people! They will come up and join us for pizzas and to talk about Apache Spark! I highly encourage you to join the Seattle Spark Meetup for this and other exciting sessions! Below is the abstract of their session as well as their biographies. Introduction to Apache Spark Apache Spark has quickly grown to be one of the most active projects in big data,…
Why all this interest in Spark?
“Spark … is what you might call a Swiss Army knife of Big Data analytics tools” — Reynold Xin (@rxin), Berkeley AmpLab Shark Development Lead The above quote – from the Wired article “Spark: Open Source Superstar Rewrites Future of Big Data” – encompasses why I am a fan of Spark. If you are an avid hiker or outdoors-person, you already appreciate the flexibility of a Swiss Army Knife (or Leatherman). It is the perfect compact tool to do a variety of simple but necessary tasks – bordering on life saving (below is a picture from my ascent to Mount…
In the context of quantum entanglement and time travel – Stargate may be more correct than Star Trek
Feature Image: Michael Bolognesi’s Diamonds in the Sky. As a follow-up to In the context of quantum entanglement and teleportation – Stargate may be more correct than Star Trek, I’m diving into one of SciFi’s persistent quandaries – time travel. And before anyone gets started, I am a proud Trekkie so this is not meant as a knock on Star Trek. In fact, I’ve already purchased my tickets for Star Trek Into Darkness and as a fan of BBC’s Sherlock, I have to admit I’m sort of rooting for the villain this time around! Image source: Benedict Cumberbatch – Star…
In the context of quantum entanglement and teleportation – Stargate may be more correct than Star Trek
While waiting in line at Salumi (Italian so good, there are regularly hour-long lines in front of the store), I was watching Nova’s Fabric of the Cosmos: Quantum Leap. In the process, I realized that with the concept of teleportation – Stargate is more correct than Star Trek! Please note, I am at best a kindergarten novice quantum physicist – I just read a bunch of quantum physics books and watch PBS and Discovery when I’m not watching SciFi. The fundamentals of quantum entanglement here are best left to Brian Greene in The Fabric of the…
What are thou Big Data? Asked the SQLBI Arbiter
Over the last few days, I’ve been pinged with the question: What is Big Data? Go figure, I actually have an answer of sorts – from a SQL BI perspective (since that’s my perspective, eh?!) Above the cloud: Big Data and BI from Denny Lee There are two blog posts that go with the above slides that provide the details. Concerning the concepts of Scaling Up or Scaling Out, check out Scale Up or Scale Out your Data Problems? A Space Analogy. Concerning the concepts of data movement, check out Moving data to compute or compute to data? That is the…
A Quick HBase Primer from a SQLBI Perspective
One of the questions I’m often asked – especially from a BI perspective – is how a BI person should look at HBase. After all, HBase is often described quickly as an in-memory column store database – isn’t that what SSAS Tabular is? Yet calling HBase an in-memory column store database isn’t quite right because in this case, the terms column, database, tables, and rows do not quite mean the same thing as one would think from a relational database aspect of things. Setting the Context How I usually start off is by providing a completely different context before I…