Introduction to Apache Spark 2.0

A Primer on Spark 2.0 Fundamentals and Architecture

I’m proud to announce that my O’Reilly video series, Introduction to Apache Spark 2.0: A Primer on Spark 2.0 Fundamentals and Architecture, is now available on O’Reilly Safari (start your ten-day free trial), or you can purchase the video series directly. This video series highlights what’s new in Apache Spark 2.0 and reviews its core concepts. The course starts with a high-level overview of Spark’s components and then dives into Spark 2.0’s three main themes: simplicity, speed, and intelligence. The simplicity section describes how Spark 2.0 unifies the Spark APIs and Spark session,…

Jump Start into Python and Apache Spark with Learning PySpark

For the last few years, I have had the opportunity to work with some of the coolest Apache Spark committers, contributors, and projects. As luck would have it, I got the opportunity to meet my co-author Tomasz Drabas (author of the awesome Practical Data Analysis Cookbook) while we were working on some other cool Apache Spark projects. In the process, we joined forces to share our lessons learned, which will hopefully help you jump start your Python and Apache Spark projects, in our book: Learning PySpark. And just to make sure, this book was reviewed by the incomparable Holden Karau, author of the…

Apache Spark is the Smartphone of Big Data

Similar to the way the smartphone changed the way we communicate – far beyond its original goal of mobile voice telephony – Apache Spark is revolutionizing Big Data. While portability may have been the catalyst of the mobile revolution, it was the ability of one device to perform multiple tasks very well, combined with the ability to easily build and use a diverse range of applications, that was the key to its ubiquity. Ultimately, with the smartphone we have a general platform that has changed the way we communicate, socialize, work, and play. The smartphone has not only replaced older technologies…

Notebook Gallery

Here are some of the notebooks created to showcase various Apache Spark use cases. They all use Databricks Community Edition, which you can get at Try Databricks. You can also access the source at https://github.com/dennyglee/databricks. The notebooks are: JSON Support, GLM in SparkR, Window Functions, Random Forests, DataFrame API, ML Operations, Decision Trees, Statistical Functions, Data Import, Data Exploration, Quick Start Python, Quick Start Scala, Ad-Tech Example, Flight Delays, Genomics, Mobile Sample, Pop vs. Price LR, Pop vs. Price DF, Salesforce Leads, Spark 1.6 (Multiple), and Spark 1.6.

Presentation: Concur Discovers the True Value of Data

Concur, the leading provider of spend management solutions and services, will be joining us to discuss how it implemented Cloudera for data discovery and analytics. Using an enterprise data hub, Concur was able to give its data scientists a centralized environment that allowed for faster and smarter analytic development.
