Spark atop Mesos on Google Cloud Platform querying Google Cloud Storage

A good reason to jump into Spark on Mesos on Google Cloud Platform is that you can quickly spin up a development environment to work with Spark, Mesos, Google Cloud, and Marathon together. One way to set this up is to follow the steps in Paco Nathan’s (@pacoid) great blog post Spark atop Mesos on Google Cloud Platform.

But what’s missing from this configuration is the ability to connect to Google Cloud Storage (GCS) so you can run your Spark queries against persistent, elastic storage. As shown in the diagram below, you first install Spark onto the development Mesos cluster, which contains a master node and three slave nodes. Once the GCS connector is installed, Spark can communicate with GCS.

[Diagram: Spark Mesos Google Cloud]
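
To make the connector step a little more concrete, here is a minimal sketch of the kind of Spark properties that typically register the gs:// scheme and point Spark at a Mesos master. It is not taken from the full post. The helper names (gcs_spark_conf, render_spark_defaults), the project ID, the master address, and the jar path are all hypothetical placeholders. The property keys and connector class names are the ones commonly used with the GCS connector for Hadoop, but check them against the connector version you install.

```python
# Sketch only: builds spark-defaults.conf style properties for Spark on Mesos with GCS.
# All argument values used below are placeholders, not values from the original post.

def gcs_spark_conf(project_id, mesos_master, connector_jar):
    """Return Spark properties that target a Mesos master and register gs:// paths."""
    return {
        # Mesos master URL, e.g. mesos://<host>:5050
        "spark.master": mesos_master,
        # Put the GCS connector jar on the driver and executor classpaths
        "spark.driver.extraClassPath": connector_jar,
        "spark.executor.extraClassPath": connector_jar,
        # Properties prefixed with spark.hadoop. are passed to the Hadoop configuration
        "spark.hadoop.fs.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.fs.AbstractFileSystem.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
        "spark.hadoop.fs.gs.project.id": project_id,
    }


def render_spark_defaults(conf):
    """Render properties as 'key value' lines, the layout spark-defaults.conf uses."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    # Hypothetical values for illustration
    example = gcs_spark_conf(
        project_id="my-gcp-project",
        mesos_master="mesos://10.0.0.1:5050",
        connector_jar="/opt/spark/lib/gcs-connector.jar",
    )
    print(render_spark_defaults(example))
```

With properties like these in place, and the connector jar present on every node, a Spark job could read from a path such as gs://your-bucket/your-file in the same way it reads from HDFS.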

For more information, continue reading at Spark atop Mesos on Google Cloud Platform querying Google Cloud Storage.

3 Comments

  1. Beg your pardon, but what does this have to do with dim sum? 🙂

    1. Doing this allows me to save money – that I need for dim sum 🙂

  2. Was researching dim sum in SF recently, thought about you and that great place you took me to in Seattle for dim sum. Hope things are going well for you and yours. I trust that life away from the mothership has been ok?
