Presentation: Concur Discovers the True Value of Data

Concur, the leading provider of spend management solutions and services, will be joining us to discuss how they implemented Cloudera for data discovery and analytics. Using an enterprise data hub, Concur was able to provide their data scientists with a centralized environment that allowed for faster and smarter analytic development.

Feb Spark Events: Data Discovery, Dato & Spark, and Spark Camp

An awesomely busy February is coming up for those who are interested in all things Apache Spark! Concur Discovers the True Value of Data: a joint Cloudera and Concur webinar on February 2nd, 2015, where we will discuss the benefits of utilizing CDH5 within Concur’s modern Big Data architecture (including Spark, of course!). Better Together: Dato + GraphLab: we’ve got a great Seattle Spark Meetup event featuring three speakers highlighting the integration between Dato’s GraphLab Create and Apache Spark. Come join us at Concur’s new Bellevue Training Center to learn more about GraphLab Create and Apache Spark, and to network. Strata +…

Quick Tip: Dropping Phantom Hive Databases (e.g. CDH5 Canary test dB)

While I’m a big fan of CDH5 and Hue, sometimes I will see some funkiness that’s a tad irritating. Specifically, there is a database with a name similar to cloudera_manager_metastore_canary_test_db_hive_hivemetastore_$guid$_2014_10_06_11_20_41. Even more irritating, there is a table called cm_test_table which cannot be deleted (or renamed or even described):

    hive> describe cm_test_table;
    FAILED: SemanticException [Error 10001]: Table not found cm_test_table
    hive> alter table cm_test_table RENAME to cm_test_table2;
    FAILED: SemanticException [Error 10001]: Table not found cm_test_table
    hive> drop table cm_test_table;
    FAILED: SemanticException [Error 10001]: Table not found cm_test_table

To work around this problem, it’s a matter of using the CASCADE reference to…
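
As a rough sketch of the CASCADE idea: HiveQL's DROP DATABASE statement accepts a CASCADE keyword, which drops the database together with any tables it still contains. The Python helpers below are hypothetical (none of the function names come from the original post) and only build the statement text; they do not connect to Hive, and the database name in the usage line is a made-up stand-in rather than a real canary name.

```python
import re

# Prefix of the canary database name quoted in the post above.
CANARY_PREFIX = "cloudera_manager_metastore_canary_test_db"


def is_canary_database(db_name: str) -> bool:
    """Return True if the name looks like a Cloudera Manager canary test database."""
    return db_name.startswith(CANARY_PREFIX)


def drop_database_statement(db_name: str) -> str:
    """Build a HiveQL statement that drops a database and every table in it.

    Only letters, digits, and underscores are accepted, so the name can be
    placed in the statement without quoting.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", db_name):
        raise ValueError(f"unexpected characters in database name: {db_name!r}")
    return f"DROP DATABASE IF EXISTS {db_name} CASCADE;"


def drop_statements_for_canaries(db_names):
    """Build one DROP ... CASCADE statement per canary database in db_names."""
    return [drop_database_statement(n) for n in db_names if is_canary_database(n)]


if __name__ == "__main__":
    # Hypothetical list, e.g. copied by hand from the output of SHOW DATABASES.
    names = ["default", CANARY_PREFIX + "_example"]
    for statement in drop_statements_for_canaries(names):
        print(statement)
```

The printed statements can then be reviewed and pasted into the hive> prompt by hand, which keeps a human in the loop before anything is actually dropped.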

Spark atop Mesos on Google Cloud Platform querying Google Cloud Storage

A great reason to jump into Spark on Mesos on Google Cloud Platform is that you can quickly spin up a development environment to work with Spark, Mesos, Google Cloud, and Marathon together. A good way to set this up is to follow the steps in Paco Nathan’s (@pacoid) blog post Spark atop Mesos on Google Cloud Platform. What’s missing from this configuration is the ability to connect to Google Cloud Storage (GCS) so you can run your Spark queries against persistent, elastic storage. As noted in the diagram below, you will first install Spark…

Yes, you can connect Tableau to SparkSQL (Spark 1.1)

As a data scientist and engineer, I appreciate that Apache Spark has many components that make it easy to analyze data, gain insight, and generate recommendations. However, as noted in my previous presentation, one of the things missing is an easy way for analysts to visualize their data. The good news is that there is an easy way to visualize your data: connect Tableau to SparkSQL! As noted in my Tableau Data14 presentation (slides are embedded below), there is an unofficial method to connect Tableau to SparkSQL. For more information, please read on at An Absolutely…

The Future of Hadoop: A deeper look at Apache Spark

Understand why Apache Spark has experienced such wide adoption and learn about some Spark use cases today. There is also a technical deep dive into the architecture, along with our vision for the Hadoop ecosystem and why we believe Spark is the successor to MapReduce for Hadoop data processing. Here’s the link to The Future of Hadoop: A deeper look at Apache Spark webinar.

Simplifying Big Data: An Introductory Hadoop Primer

Back in July, I had the honor of speaking at the July 2014 AM webinar on Big Data, moderated by Michael Zeller. If you are interested in learning more about Big Data from a business / analyst perspective, here is our webinar on YouTube. Abstract: What’s the Big Deal with Big Data? And, more importantly, what is the business case for Big Data? In this session, we will focus on the fundamentals of Hadoop, as it is the foundation for Big Data. We’ll talk about the technology but also the business cases for what you can and cannot do with…
