The linked article is for SIAM News, the magazine for members of the Society for Industrial and Applied Mathematics (SIAM). The audience for this magazine, in other words, is professional mathematicians and related researchers working in a wide variety of fields. While the article contains equations, I wrote it to be understandable even if you skip over the math.
[ This blog is dedicated to tracking my most recent publications. Subscribe to the feed to keep up with all the science stories I write! ]
Using Differential Privacy to Protect the United States Census
In 2006, Netflix launched a competition to improve its algorithm for recommending movies to customers based on their past choices. The DVD rental and video streaming service shared anonymized movie ratings from real subscribers, assuming that its efforts to remove identifying information sufficiently protected user identities. That assumption was wrong: external researchers quickly showed that they could pinpoint individual subscribers by cross-referencing other public data, such as IMDb reviews, with the Netflix dataset, potentially exposing private information.
This fatal flaw in the Netflix Prize challenge highlights a central tension of privacy in the information age: the need to perform statistical analyses on data while protecting the identities of the people in it. Merely hiding personal details is not enough, so many statisticians are turning to differential privacy, a method that allows researchers to extract useful aggregate information from data while preserving the privacy of individuals within the sample.
“Even though researchers are just trying to learn facts about the world, their analyses might incidentally reveal sensitive information about particular people in their datasets,” said Aaron Roth, a computer scientist at the University of Pennsylvania. “Differential privacy is a mathematical constraint you impose on an algorithm for performing data analysis that provides a formal guarantee of privacy.”
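For readers who want a concrete feel for how such a constraint works, here is a minimal sketch (my own illustration, not from the article) of the Laplace mechanism, one standard way to achieve differential privacy for a counting query. The idea: because adding or removing one person changes a count by at most 1, adding noise drawn from a Laplace distribution with scale 1/ε to the true count satisfies ε-differential privacy. The function and data below are hypothetical.

```python
import numpy as np

def private_count(records, predicate, epsilon):
    """Differentially private count of records satisfying `predicate`.

    A counting query has sensitivity 1 (one person's data changes the
    count by at most 1), so Laplace noise with scale 1/epsilon yields
    epsilon-differential privacy for this query.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical survey: how many respondents are over 40?
ages = [23, 45, 31, 67, 52, 38, 41]
print(private_count(ages, lambda age: age > 40, epsilon=0.5))
```

Smaller values of ε inject more noise, buying a stronger privacy guarantee at the cost of accuracy; balancing that tradeoff is the kind of decision that looms large when the method is applied to something as consequential as the census.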