Over the last few years, a common question I’ve been asked is: what does it take to become a data scientist? Often my answers centered on the technology (learn Spark, Python, and/or R; take courses in Data Sciences; play with data sets; and so on). Yet I was never fully satisfied with that answer, because I had always felt that the heart of Data Sciences (and Big Data in more general terms) is the data – or more specifically, the ability to understand the data. Recently, I re-read “Freakonomics: A Rogue Economist Explores the Hidden Side of Everything” and it dawned…
Tag: Data Engineering
Data Engineering Reading Materials: Spark, Machine Learning, and Distributed Systems Resources
Over the last few weeks, a regular question that I’ve been asked is where to find resources about Spark, Machine Learning, and Distributed Systems. While they seem to be disparate topics, the fact is that as a Data Engineer (or someone in Data Sciences Engineering, or a Data Scientist who loves scalability and performance) you need to get your feet wet in all three disciplines to truly excel.

Apache Spark

Let’s start with Apache Spark (disclosure: I am with Databricks, the company founded by the creators of Apache Spark). I am a big fan of Apache Spark because of its…