Data Exploration with Databricks

Today, it was also featured on InsideBigData: Data Exploration with Databricks. Awesome! This Data Exploration on Databricks jump start video will show you how to go from data source to visualization in a few easy steps. Specifically, we will take semi-structured logs, extract and transform them, and then analyze and visualize the data using Spark SQL so we can quickly understand our data. For more information and other Spark notebooks, check out Selected Notebooks > Databricks Jump Start.

Quick Tip for extracting SQL Server data to Hive

While I have documented various techniques to transfer data from Hadoop to SQL Server / Analysis Services (e.g. How Klout changed the landscape of social media with Hadoop and BI Slides Updated, SQL Server Analysis Services to Hive, etc.), this post calls out the reverse – how to quickly extract SQL Server data to Hadoop / Hive.   This is a common scenario where SQL Server is being used as your transactional store and you want to push data to some other repository for analysis where you are mashing together semi-structured and structured data. How to minimize impact on SQL Server…

What are thou Big Data? Asked the SQLBI Arbiter

Over the last few days, I’ve been pinged with the question: What is Big Data? Go figure, I actually have an answer of sorts – from a SQL BI perspective (since that’s my perspective, eh?!) Above the cloud: Big Data and BI from Denny Lee. There are two blog posts that go with the above slides that provide the details. Concerning the concepts of Scaling Up or Scaling Out, check out Scale Up or Scale Out your Data Problems? A Space Analogy. Concerning the concepts of data movement, check out Moving data to compute or compute to data? That is the…

A Quick HBase Primer from a SQLBI Perspective

One of the questions I’m often asked – especially from a BI perspective – is how a BI person should look at HBase.  After all, HBase is often described quickly as an in-memory column store database – isn’t that what SSAS Tabular is?   Yet calling HBase an in-memory column store database isn’t quite right because in this case, the terms column, database, tables, and rows do not quite mean the same thing as one would think from a relational database aspect of things. Setting the Context How I usually start off is by providing a completely different context before I…

SQL BI at Hadoop Summit = Awesomesauce!

For the 2012 Hadoop Summit, I will have the honor to co-present with Dave Mariani (@dmariani) from Klout in our session How Klout is changing the landscape of social media with Hadoop and BI. Our session is currently scheduled for June 13th at 3:35pm but it is subject to change.  Check out our session and some pretty amazing others on the Hadoop Summit Schedule at: http://hadoopsummit.org/schedule/.  Our session info is: — How Klout is changing the landscape of social media with Hadoop and BI. In this age of Big Data, data volumes grow exceedingly larger while the technical problems and…

Moving data to compute or compute to data? That is the Big Data question

Dorky attempts at geek Shakespeare aside; as the volume, complexity, and variability of your data systems increase in … entropy …, this becomes a fundamental question in whether one scales up or scales out their data problem. Apologies for the nerdy chemistry references in advance – which start with this picture of Dr. Arthur Grosser (more later). As noted in the previous post Scale Up or Scale Out your Data Problems? A Space Analogy, the decision to scale up or scale out your data problem is a key facet in your Big Data problem. But just as important as the…

Scale Up or Scale Out your Data Problems? A Space Analogy

As I am writing more about Big Data, I’ve been asked whether we need to have traditional relational or cube systems now that we have Big Data / NoSQL / Hadoop. My response is to note that these are different systems that serve different purposes even though both are used to better understand data. But before we dive into the specifics surrounding relational databases compared to Hadoop / Big Data, we need to first talk about the differences between solving a data problem by scaling up the problem or scaling it out. One way to understand the difference is…
