Correlation, drowning, and Nicolas Cage films

Interested in a career in Data Sciences? Read Freakonomics first!

Over the last few years, a common question I’ve been asked is what it takes to become a data scientist. Often my answers centered on the technology: learn Spark, Python, and/or R; take courses in Data Sciences; play with data sets; etc. Yet I was never fully satisfied with that answer, because I had always felt that the heart of Data Sciences (and Big Data in more generic terms) is the data or, more specifically, the ability to understand the data. Recently, I re-read “Freakonomics: A Rogue Economist Explores the Hidden Side of Everything” and it dawned…

Data Engineering Reading Materials: Spark, Machine Learning, and Distributed Systems Resources

Over the last few weeks, a question I’ve regularly been asked is where to find resources about Spark, Machine Learning, and Distributed Systems. While they seem to be disparate topics, the fact is that as a Data Engineer (or someone in Data Sciences Engineering, or a Data Scientist who loves scalability and performance) you need to get your feet wet in all three disciplines to truly excel.

Apache Spark

Let’s start with Apache Spark (disclosure: I am with Databricks, the company founded by the creators of Apache Spark). I am a big fan of Apache Spark because of its…
