Quick Tip: Dropping Phantom Hive Databases (e.g. CDH5 Canary Test DB)

While I’m a big fan of CDH5 and Hue, sometimes I see some funkiness that’s a tad irritating. Specifically, there is a database with a name similar to cloudera_manager_metastore_canary_test_db_hive_hivemetastore_$guid$_2014_10_06_11_20_41. Even more irritating, there is a table called cm_test_table that cannot be deleted (or renamed, or even described):

hive> describe cm_test_table;
FAILED: SemanticException [Error 10001]:…

Yes, you can connect Tableau to Spark SQL (Spark 1.1)

As a data scientist and engineer, I appreciate that Apache Spark has many components that make it easy to analyze my data, gain insight from it, and generate recommendations. However, as noted in my previous presentation, one thing still missing is an easy way for analysts to visualize their data. The good news is…

Simplifying Big Data: An Introductory Hadoop Primer

Back in July, I had the honor of speaking at the July 2014 AM webinar on Big Data, moderated by Michael Zeller. If you are interested in learning more about Big Data from a business/analyst perspective, here is our webinar on YouTube. Abstract: What’s the Big Deal with Big Data? And, more importantly,…

To Spark … and Beyond!

One of the very exciting things about Spark is its potential to be one ubiquitous tool for solving my aggregation, machine learning, graph, and other statistical/analytics problems. And while I am proud of my time with the SQL Server team, where we achieved some amazing, lofty goals (e.g. Yahoo!…

