Before Big Data mining, we need to ensure privacy, but how?



First things first, but not necessarily in that order!

— words of wisdom from Doctor Who





The tweet below got me thinking about the importance of privacy.

Data mining demands sound privacy policies in age of ‘big data’ #datamining #BI #bigdata


After all, in this day and age, with access to all sorts of disparate information, it becomes easier and easier to uniquely identify a person by their behaviors and external patterns. In Dr. Latanya Sweeney’s groundbreaking paper k-Anonymity: a model for protecting privacy, she noted the following startling observations:

  • Based on the 1990 census, over 80% of the US population was uniquely identifiable from just three attributes: 5-digit ZIP code, birth date, and gender
  • By combining the Massachusetts voter list with supposedly de-identified healthcare records, she was able to identify the medical records of then-Governor William Weld
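
To make the first observation concrete, here is a minimal sketch of how one might measure how identifying a set of attributes is. The records are made up for illustration, and the helper names (`equivalence_class_sizes`, `fraction_unique`, `k_anonymity`) are my own, not something taken from Dr. Sweeney's paper. A table is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def fraction_unique(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears only once."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    unique = sum(
        1 for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    )
    return unique / len(records)

def k_anonymity(records, quasi_identifiers):
    """Largest k for which the table is k-anonymous (smallest group size)."""
    return min(equivalence_class_sizes(records, quasi_identifiers).values())

# Entirely fictional toy data (assumption: not drawn from any real data set).
records = [
    {"zip": "02138", "birth_date": "1970-03-02", "gender": "M"},
    {"zip": "02138", "birth_date": "1970-03-02", "gender": "F"},
    {"zip": "02139", "birth_date": "1960-01-15", "gender": "F"},
    {"zip": "02139", "birth_date": "1960-01-15", "gender": "F"},
]
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")

print(fraction_unique(records, QUASI_IDENTIFIERS))  # half of the toy records are unique
print(k_anonymity(records, QUASI_IDENTIFIERS))      # the full triple gives k = 1
print(k_anonymity(records, ("zip",)))               # ZIP code alone gives k = 2
```

In this toy table, releasing only the ZIP code keeps every person hidden in a group of two, while releasing all three attributes singles out half of the records.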


With the power of Big Data, it is easy to forget that the more we dig and mine, the more we are potentially invading privacy. So now more than ever, we need to make use of privacy mechanisms such as k-anonymity or privacy-preserving histograms that add epsilon-calibrated noise, as described in Analyzing Data while Protecting Privacy – A Case Study.
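
As a rough illustration of the noisy-histogram idea, the sketch below adds Laplace noise with scale 1/epsilon to each bin count. This is the standard Laplace mechanism from the differential privacy literature, not necessarily the exact method used in the case study. It assumes each person contributes one record, that neighboring data sets differ by adding or removing one record (so the histogram has sensitivity 1), and that the bins are fixed in advance rather than derived from the data. The function names and the age-bracket data are hypothetical.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_histogram(values, bins, epsilon, rng=None):
    """Return {bin: true count + Laplace noise}, one noisy count per predefined bin.

    Assumes sensitivity 1: adding or removing one record changes one count by 1.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    counts = Counter(values)
    scale = 1.0 / epsilon
    # Iterate over the fixed bin list so that empty bins also receive noise.
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}

# Hypothetical age brackets for five fictional people.
ages = ["20-29", "20-29", "30-39", "40-49", "40-49"]
bins = ["20-29", "30-39", "40-49", "50-59"]

print(noisy_histogram(ages, bins, epsilon=0.5, rng=random.Random(42)))
```

The released counts are no longer integers and can even be negative. That is the price of the guarantee, and any rounding or clamping applied afterwards does not weaken it.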
