Feb Spark Events: Data Discovery, Dato & Spark, and Spark Camp

An awesomely busy February is coming up for those who are interested in all things Apache Spark!

Concur Discovers the True Value of Data
A joint Cloudera and Concur webinar on February 2nd, 2015, where we will discuss the benefits of using CDH5 within Concur's modern Big Data architecture (including Spark, of course!).

Better Together: Dato + GraphLab
We've got a great Seattle Spark Meetup event featuring three speakers highlighting the integration between Dato's GraphLab Create and Apache Spark. Come join us at Concur's new Bellevue Training Center to learn more about GraphLab Create and Apache Spark, and to network.

Strata +…

Build your own CDH5 QuickStart VM with Spark on CentOS

A great way to jump into CDH5 and Spark (with the latest version of Hue) is to build your own CDH5 setup on a VM. As of this writing, a CDH5 QuickStart VM is not available (though you can download the Cloudera QuickStart VM for CDH4.5). Below are the steps to build your own CDH5/Spark setup on CentOS 6.5. Note that installing CDH5 through Cloudera Manager is actually quite straightforward, so these instructions instead focus on the steps prior to installing Cloudera Manager 5 (and the express install of CDH5) to minimize the hiccups you may run…

Quick Tech Tip: SETting Cloudera Hue Beeswax to create a compressed Hive table

I'm currently playing with CDH 4.1 and having fun with Hue, specifically using Beeswax to execute Hive queries from a nice web UI. As noted in Hadoop compression codecs and optimizing Hive joins (and using compression to do it), using compression saves storage space and in many cases can improve query performance. Yet to my dismay, when I tried to execute a bunch of SET statements, I ended up getting the OK FAILED parse exception. Of course, this is what happens when you haven't played with a particular technology in a while and don't bother to do the tutorials! On the…
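The excerpt stops before the post's resolution, so the sketch below does not show how the parse error was fixed. It only illustrates the kind of SET statements involved in creating a compressed Hive table: it assembles a Hive script that turns on output compression and then runs a CREATE TABLE ... AS SELECT. The property names are standard MR1-era Hadoop/Hive settings (as used with CDH 4); the helper name `compressed_ctas`, the table names, and the choice of Snappy are hypothetical, for illustration only. Whether Beeswax accepts a multi-statement script like this depends on your Hue version.

```python
# Hypothetical helper (not from the original post): builds a Hive script that
# enables output compression before a CREATE TABLE ... AS SELECT, so the new
# table's files are written compressed. Property names are standard MR1-era
# Hadoop/Hive settings; table names and codec choice are illustrative only.

SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec"


def compressed_ctas(new_table: str, select_sql: str, codec: str = SNAPPY) -> str:
    """Return a semicolon-terminated Hive script: SET statements, then a CTAS."""
    settings = {
        # Compress the final output files written by the query.
        "hive.exec.compress.output": "true",
        # Codec used for the compressed output (MR1 property name).
        "mapred.output.compression.codec": codec,
        # For SequenceFile output, compress whole blocks rather than single records.
        "mapred.output.compression.type": "BLOCK",
    }
    statements = [f"SET {key}={value}" for key, value in settings.items()]
    statements.append(f"CREATE TABLE {new_table} AS {select_sql}")
    # One statement per line, each terminated by a semicolon.
    return ";\n".join(statements) + ";"


if __name__ == "__main__":
    print(compressed_ctas("sales_compressed", "SELECT * FROM sales"))
```

Generating the script keeps the SET statements next to the query that depends on them; you could paste the output into the Hive CLI, or enter the same key/value pairs through whatever settings mechanism your Hue/Beeswax version provides.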