Correlation drowning and Nicolas Cage films

Interested in a career in Data Sciences? Read Freakonomics first!

Over the last few years, a common question I’ve been asked is: what does it take to become a data scientist? Often my answers centered on the technology, i.e. learn Spark, Python, and/or R; take courses in Data Sciences; play with data sets; etc. Yet I was never fully satisfied with that answer, because I had always felt that the heart of Data Sciences (and Big Data in more generic terms) is the data, or more specifically, the ability to understand the data. Recently, I re-read “Freakonomics: A Rogue Economist Explores the Hidden Side of Everything” and it dawned…


Data Engineering Reading Materials: Spark, Machine Learning, and Distributed Systems Resources

Over the last few weeks, a regular question that I’ve been asked is where to find resources about Spark, Machine Learning, and Distributed Systems. While they seem to be disparate topics, the fact is that as a Data Engineer (or someone in Data Sciences Engineering, or a Data Scientist who loves scalability and performance) you need to get your feet wet in all three disciplines to truly excel. Apache Spark: Let’s start with Apache Spark (disclosure: I am with Databricks, the company founded by the creators of Apache Spark). I am a big fan of Apache Spark because of its…

Presentation: Concur Discovers the True Value of Data

Concur, the leading provider of spend management solutions and services, will be joining us to discuss how it implemented Cloudera for data discovery and analytics. Using an enterprise data hub, Concur was able to provide its data scientists with a centralized environment that allowed for faster and smarter analytic development.
