Portfolio

Blog

Why does altering a Delta Lake table schema not show up in the Spark DataFrame?

Recently there was a great Delta Lake Stack Overflow question, DeltaTable schema not updating when using `ALTER TABLE ADD COLUMNS`, by wtfzambo. This is a great question as succinctly: What is happening here is actually as expected: To better showcase this, allow me to provide context via the file system. To recreate this exact scenario, please use the Docker image at https://go.delta.io/docker and set DELTA_PACKAGE_VERSION to delta-core_2.12:2.1.0. That is, run the Docker container and use the PySpark steps: 1. To start PySpark, run the command: 2. Run the basic commands to create a simple table 3. Run the following command to see the table structure As…

Deep Learning Fundamentals Series

This webinar series covers deep learning fundamentals with a focus on Keras and TensorFlow. Deep learning has shown tremendous success, but what makes it so special? What are neural networks, and how do they work? What are the differences between popular deep learning frameworks like Keras or TensorFlow, and where should you start?

Webinar | Presentation | Notebook | Q&A

Relaxing in Encinitas

Just north of San Diego, there is the wonderful town of Encinitas. It is a nice, relaxing surfer (or beach bum) town with excellent dining in the heart of the town itself.

Juanitas Taco Shop

A great little taco shop right off of North Coast Highway 101 that is not your normal tourist cafe, which is perfect! While normally a corn tortilla fan, they have the best wheat tortillas. This is a picture of the to-go carnitas plate where the beans are creamy, the rice is delicious yet light, the meat is flavorful, and the avocado and salsa were…

Introduction to Apache Spark 2.0

A Primer on Spark 2.0 Fundamentals and Architecture

I’m proud to post that my O’Reilly video series Introduction to Apache Spark 2.0: A Primer on Spark 2.0 Fundamentals and Architecture is now available on O’Reilly Safari (start your ten-day free trial), or you can purchase the video series directly. This video series highlights what’s new in Apache Spark 2.0 and reviews its core concepts. The course starts with a high-level overview of Spark’s components and then dives into Spark 2.0’s three main themes: simplicity, speed, and intelligence. The simplicity section describes how Spark 2.0 unifies the Spark APIs and Spark session,…

Jump Start into Python and Apache Spark with Learning PySpark

For the last few years, I have had the opportunity to work with some of the coolest Apache Spark committers, contributors, and projects. As luck would have it, I got the opportunity to meet my co-author Tomasz Drabas (author of the awesome Practical Data Analysis Cookbook) while we were working on some other cool Apache Spark projects. In the process, we joined forces to share our lessons learned, which will hopefully help you jump start your Python and Apache Spark projects, in our book: Learning PySpark. And just to make sure, this book was reviewed by the incomparable Holden Karau, author of the…

On-Time Flight Performance with GraphFrames for Apache Spark

Feature Image: NASA Goddard Space Flight Center: City Lights of the United States 2012

This is an abridged version of the full blog post On-Time Flight Performance with GraphFrames. You can also reference the webinar GraphFrames: DataFrame-based graphs for Apache Spark and the On-Time Flight Performance with GraphFrames for Apache Spark notebook. An intuitive approach to understanding flight departure delays is to use graph structures.

Why Graph?

Graph structures offer a more intuitive approach to many classes of data problems: social networks, restaurant recommendations, or flight paths. It is easier to understand these data problems…

Presentation: Jump Start into Apache® Spark™ and Databricks

These are the slides from the Jump Start into Apache Spark and Databricks webinar on February 10th, 2016.

Apache Spark is a fast, easy-to-use, and unified engine that allows you to solve many Data Science and Big Data (and many not-so-Big Data) scenarios easily. Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning, and graph processing. We will leverage Databricks to quickly and easily demonstrate, visualize, and debug our code samples; the notebooks will be available for you to download. You can view the on-demand webinar Jump Start into Apache® Spark™…

Data Exploration with Databricks

Today, it was also featured on InsideBigData: Data Exploration with Databricks. Awesome! This Data Exploration on Databricks jump start video will show you how to go from data source to visualization in a few easy steps. Specifically, we will take semi-structured logs, extract and transform them, and analyze and visualize the data using Spark SQL so we can quickly understand our data. For more information and other Spark notebooks, check out Selected Notebooks > Databricks Jump Start.
