Yes, you can connect Tableau to SparkSQL (Spark 1.1)

As a data scientist and engineer, I appreciate that Apache Spark has many components that make it easy to analyze my data, gain insight from it, and generate recommendations. However, as noted in my previous presentation, one thing missing is an easy way for analysts to visualize their data. The good news is that there is an easy way to visualize your data: connect Tableau to SparkSQL! As noted in my Tableau Data14 presentation (slides are embedded below), there is an unofficial method to connect Tableau to SparkSQL. For more information, please read on at An Absolutely…

The Future of Hadoop: A deeper look at Apache Spark

Understand why Apache Spark has seen such wide adoption and learn about some of today’s Spark use cases. There is also a technical deep dive into the architecture, along with our vision for the Hadoop ecosystem and why we believe Spark is the successor to MapReduce for Hadoop data processing. Here is the link to The Future of Hadoop: A deeper look at Apache Spark webinar.

Seattle Spark Meetup Roundup: Summit, xPatterns, and Machine Learning – next is Interactive OLAP!

We’ve had some really exciting Spark sessions at the Seattle Spark Meetup, even with all of the great stuff announced during last week’s Spark Summit 2014. This post is a couple of months past due, so here’s the latest, compiled together!

xPatterns on Spark, Shark, Mesos, & Tachyon

Claudiu Barbura showcased Atigeo’s xPatterns – a real-world customer architecture utilizing Spark, Shark, Mesos, and Tachyon! A lot of great demos along with lessons learned and tips & tricks!

xPatterns on Spark, Shark, Mesos, and Tachyon Session
xPatterns on Spark, Shark, Mesos, and Tachyon Slides

Fun Things You Can Do With…

Build your own CDH5 QuickStart VM with Spark on CentOS

A great way to jump into CDH5 and Spark (with the latest version of Hue) is to build your own CDH5 setup on a VM. As of this writing, a CDH5 QuickStart VM is not available (though you can download the Cloudera QuickStart VM for CDH4.5). Below are the steps to build your own CDH5 / Spark setup on CentOS 6.5. Note that the installation of CDH5 through Cloudera Manager is actually quite straightforward. Instead, these instructions focus on the steps prior to installing Cloudera Manager 5 (and the express install of CDH5) to minimize the hiccups you may run…

Why all this interest in Spark?

“Spark … is what you might call a Swiss Army knife of Big Data analytics tools”
– Reynold Xin (@rxin), Berkeley AMPLab Shark Development Lead

The above quote – from the Wired article “Spark: Open Source Superstar Rewrites Future of Big Data” – encompasses why I am a fan of Spark. If you are an avid hiker or outdoors-person, you already appreciate the flexibility of a Swiss Army Knife (or Leatherman). It is the perfect compact tool to do a variety of simple but necessary tasks – bordering on life saving (below is a picture from my ascent to Mount…

Jump Start onto Spark 0.7.2 and Scala 2.9.3 on Mac OSX

Spark is an in-memory, open source cluster computing system allowing for fast iterative and interactive analytics. Spark utilizes Scala – a type-safe, object-oriented language with functional properties that is fully interoperable with Java. For more information about Spark, please refer to http://spark-project.org. To test out Spark, you can install the stand-alone version on Mac OSX.

This is a follow-up to my previous blog post on the topic – Installing Spark 0.6.1 Standalone on OSX Mountain Lion (10.8). Since that blog post, Spark has added some interesting features including:

– Spark Streaming as part of Spark 0.7
– An associated…

Installing Spark 0.6.1 Standalone on OSX Mountain Lion (10.8)

Spark is an in-memory, open source cluster computing system allowing for fast iterative and interactive analytics. Spark utilizes Scala – a type-safe, object-oriented language with functional properties that is fully interoperable with Java. For more information about Spark, please refer to http://spark-project.org. To test out Spark, you can install the stand-alone version on Mac OSX.

Install Scala 2.9.2

The first thing you will need to do is install Scala 2.9.2, as Spark 0.6.1 is dependent on it. As of this posting, the current version of Scala is 2.10, but there are some issues with Spark 0.6.1 and Scala…
