Over the last few weeks, a regular question that I’ve been asked are where I can find resources about Spark, Machine Learning, and Distributed Systems. While they seem to be disparate problems, the fact is that as a Data Engineer (or someone in Data Sciences Engineering or a Data Scientist that loves scalability and performance) you need to have your feet wet in all three disciplines to truly excel. Apache Spark Let’s start with Apache Spark (disclosure, I am with Databricks – the company was founded by the creators of Apache Spark). I am a big fan of Apache Spark because of its…