Much as the smartphone changed the way we communicate – far beyond its original goal of mobile voice telephony – Apache Spark is revolutionizing Big Data. Portability may have been the catalyst of the mobile revolution, but the keys to the smartphone's ubiquity were its ability to perform multiple tasks well on one device and to make it easy to build and use a diverse range of applications. Ultimately, the smartphone gave us a general platform that changed the way we communicate, socialize, work, and play. It did not merely replace older technologies; it combined them in ways that produced entirely new user experiences. Applying this analogy to the Big Data space, Apache Spark seeks to be the general platform that changes the way we work with and understand data.
The need for speed
No, this is not a Fast & Furious reference
The genesis of Apache Spark was the desire to provide the flexibility and extensibility of Hadoop MapReduce at significantly faster speeds. The problem dates back to the inefficiency of running machine learning algorithms at UC Berkeley. The cost was not only time lost to long-running executions, or the frustration of errors in long-running jobs revealing themselves only at late stages and forcing a fix and a complete re-run; crucially, long-running queries also interrupt your train of thought.
To continue reading, please see the InsideBigData article: Apache Spark is the Smartphone of Big Data.