Apache Spark is the Smartphone of Big Data

Just as the smartphone changed the way we communicate, going far beyond its original goal of mobile voice telephony, Apache Spark is revolutionizing Big Data. While portability may have been the catalyst of the mobile revolution, the keys to the smartphone's ubiquity were its ability to perform multiple tasks very well on one device and the ease of building and using a diverse range of applications. Ultimately, with the smartphone we have a general platform that has changed the way we communicate, socialize, work, and play. The smartphone has not only replaced older technologies but also combined them in a way that led to new types of user experiences. Applying this analogy to the Big Data space, Apache Spark seeks to be the general platform that changes the way we work with and understand data.

The need for speed

No, this is not a Fast & Furious reference

Apache Spark originated from the goal of providing the flexibility and extensibility of Hadoop MapReduce at significantly faster speeds. The problem dates back to the inefficiencies of running machine learning algorithms at UC Berkeley. But this was not just about time lost (due to long-running executions) or inefficiencies (e.g., errors in long-running jobs that only revealed themselves at late stages, requiring a fix and a re-run of the entire job); a crucial component is that long-running queries interrupt your train of thought.

To continue reading, please see the InsideBigData article: Apache Spark is the Smartphone of Big Data.
