A Primer on Spark 2.0 Fundamentals and Architecture
I’m proud to announce that my O’Reilly video series, Introduction to Apache Spark 2.0: A Primer on Spark 2.0 Fundamentals and Architecture, is now available on O’Reilly Safari (start your ten-day free trial), or you can purchase the video series directly. The series highlights what’s new in Apache Spark 2.0 and reviews its core concepts. The course starts with a high-level overview of Spark’s components and then dives into Spark 2.0’s three main themes: simplicity, speed, and intelligence. The simplicity section describes how Spark 2.0 unifies the Spark APIs and Spark session,…
Tag: Machine Learning
Data Engineering Reading Materials: Spark, Machine Learning, and Distributed Systems Resources
Over the last few weeks, a question I’ve been asked regularly is where to find resources about Spark, Machine Learning, and Distributed Systems. While they seem to be disparate problems, the fact is that as a Data Engineer (or someone in Data Sciences Engineering, or a Data Scientist who loves scalability and performance) you need to get your feet wet in all three disciplines to truly excel.
Apache Spark
Let’s start with Apache Spark (disclosure: I am with Databricks, the company founded by the creators of Apache Spark). I am a big fan of Apache Spark because of its…
Simplify Machine Learning on Spark with Databricks
As many data scientists and engineers can attest, most of their time is spent not on the models themselves but on the supporting infrastructure. Key issues include the ability to easily visualize, share, deploy, and schedule jobs. More disconcerting is the need for data engineers to re-implement the models developed by data scientists for production. With Databricks, data scientists and engineers can simplify these logistical issues and spend more of their time focusing on their data problems.
Simplify Visualization
An important perspective for data scientists and engineers is the ability to quickly visualize the data and the model…
Seattle Spark Meetup Roundup: Summit, xPatterns, and Machine Learning – next is Interactive OLAP!
We’ve had some really exciting Spark sessions at the Seattle Spark Meetup, even with all of the great stuff announced during last week’s Spark Summit 2014. This post is a couple of months past due, so here is the latest, compiled together!
xPatterns on Spark, Shark, Mesos, & Tachyon
Claudiu Barbura showcased Atigeo’s xPatterns, a real-world customer architecture utilizing Spark, Shark, Mesos, and Tachyon! A lot of great demos, along with lessons learned and tips & tricks!
xPatterns on Spark, Shark, Mesos, and Tachyon Session
xPatterns on Spark, Shark, Mesos, and Tachyon Slides
Fun Things You Can Do With…