As some of you may have noticed in my more recent blog postings, I have started writing about my interest in the field of privacy. With tonnes of information floating around the web these days, and the ability to join disparate data sources together to reveal interesting patterns (or individuals), it has become imperative that we find ways to guarantee the privacy of the individual if we are ever to find interesting patterns within our data.
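To make that "or individuals" risk concrete, here is a minimal sketch in Python of the classic linkage attack; all the names, tables, and values below are made up purely for illustration. Two tables that look harmless on their own can, once joined on a few quasi-identifiers (ZIP code, birth date, sex), put a name next to a diagnosis:

```python
# Hypothetical "de-identified" medical records -- no names, but the
# quasi-identifiers (zip, birth_date, sex) remain.
anonymized_medical = [
    {"zip": "02138", "birth_date": "1965-07-21", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1971-03-02", "sex": "M", "diagnosis": "diabetes"},
]

# A hypothetical public dataset (e.g. a voter roll) with names and the
# same quasi-identifiers.
public_voter_roll = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1965-07-21", "sex": "F"},
    {"name": "John Roe", "zip": "02140", "birth_date": "1980-11-15", "sex": "M"},
]

quasi_ids = ("zip", "birth_date", "sex")

# Index the public data by its quasi-identifier tuple.
voters_by_qid = {tuple(v[k] for k in quasi_ids): v["name"]
                 for v in public_voter_roll}

# The "join": any medical record whose quasi-identifiers match a voter
# record is re-identified, even though neither table alone linked a
# name to a diagnosis.
for record in anonymized_medical:
    qid = tuple(record[k] for k in quasi_ids)
    if qid in voters_by_qid:
        print(voters_by_qid[qid], "->", record["diagnosis"])
```

The point of the sketch is that neither dataset is sensitive by itself; it is the join on quasi-identifiers that does the damage.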
The example I used in my most recent blog on the subject of privacy, and in the Channel 8 interview, was that of data mining in medical research. In that example, I noted that I had performed data mining (using Analysis Services, of course) against phenotypic (i.e. not genetic) information about asthmatic patients. An interesting tidbit I like quoting from this research: if an asthmatic patient misses their regularly scheduled appointments, s/he may have twice the chance of suffering an asthmatic adverse event (death, hospitalization, or an ER visit). For context, the number of adverse events was quite small to begin with, and the subsequent analysis suggested that this most likely reflects the fact that people who did not skip their visits were typically proactive about their health rather than reactive (i.e. they did not wait until they were really sick before getting a checkup or visit to keep their asthma from getting worse).
One of the challenges I faced while doing this analysis revolved around privacy. I had to jump through a lot of hoops (rightly so) to make sure that the data was kept in a safe environment and that the analysis I was doing would not reveal who an individual was from the medical records. On that note, Analyzing Data while Protecting Privacy – A Case Study is one of the various postings that I have written (and will continue to write) on the subject.
But what it comes down to is this theme: it will not be possible for us to perform interesting analytics (e.g. data mining, machine learning, etc.) to help patients (e.g. build a concept of prognostics) unless we can guarantee the privacy of the patient.
So what does this have to do with the CMU/MSR Mindswap on Privacy, as per the title of this blog? Well, if you refer to the linked blogs, you'll notice that the concepts of differential privacy (also termed privacy preserving data analysis in the blog) are based on research from MSR (Microsoft Research). As well, you may notice that I reference some examples of privacy work from Latanya Sweeney, who is from CMU (Carnegie Mellon University). Last week, the Center for Computational Thinking at CMU set up a mindswap event that brought together some of the top researchers from both institutions to swap information on the advancing field of privacy research. You can find more information, including PowerPoint slide references, at: CMU/MSR Mindswap on Privacy.
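For readers curious what differential privacy looks like mechanically, here is a minimal sketch in Python of the Laplace mechanism (the records, field names, and epsilon value below are all hypothetical, chosen just for illustration). Instead of releasing an exact count from a patient table, you release the count plus calibrated noise, so the published answer is nearly the same whether or not any one patient is in the data:

```python
import numpy as np

def noisy_count(records, predicate, epsilon):
    # A counting query has sensitivity 1: adding or removing a single
    # patient changes the true answer by at most 1. Laplace noise with
    # scale 1/epsilon therefore yields epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical patient records purely for illustration -- not real data.
rng = np.random.default_rng(0)
patients = [{"missed_appts": int(rng.integers(0, 6)),
             "adverse_event": bool(rng.random() < 0.05)}
            for _ in range(1000)]

# Roughly how many patients who missed appointments had an adverse event?
print(noisy_count(patients,
                  lambda p: p["missed_appts"] > 0 and p["adverse_event"],
                  epsilon=0.5))
```

The design choice worth noticing is that the noise scale depends only on the query's sensitivity and on epsilon, never on the data itself: a smaller epsilon means stronger privacy and noisier answers, which is exactly the trade-off the mindswap research agenda is about.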