Protecting privacy with mathematics

The linked article is for SIAM News, the magazine for members of the Society for Industrial and Applied Mathematics (SIAM). The audience for this magazine, in other words, is professional mathematicians and related researchers working in a wide variety of fields. While the article contains equations, I wrote it to be understandable even if you skip over the math.

[ This blog is dedicated to tracking my most recent publications. Subscribe to the feed to keep up with all the science stories I write! ]

Using Differential Privacy to Protect the United States Census

Census data must simultaneously be publicly available and protect the privacy of the people it describes. Differential privacy is a method that injects noise into the data to hide the presence of individual responses, while preserving the general statistical structure of the data. [Credit: moi, which is why I’m not a professional graphic artist]

For SIAM News:

In 2006, Netflix hosted a competition to improve its algorithm for providing movie recommendations to customers based on their past choices. The DVD rental and video streaming service shared anonymized rental records from real subscribers, assuming that its efforts to remove identifying information sufficiently protected user identities. This assumption was wrong; external researchers quickly proved that they could pinpoint personal details by correlating other public data with the Netflix database, potentially exposing private information.

This fatal flaw in the Netflix Prize challenge highlights multiple issues concerning privacy in the information age, including the simultaneous need to perform statistical analyses while protecting the identities of people in the dataset. Merely hiding personal data is not enough, so many statisticians are turning to differential privacy. This method allows researchers to extract useful aggregate information from data while preserving the privacy of individuals within the sample.

“Even though researchers are just trying to learn facts about the world, their analyses might incidentally reveal sensitive information about particular people in their datasets,” Aaron Roth, a statistician at the University of Pennsylvania, said. “Differential privacy is a mathematical constraint you impose on an algorithm for performing data analysis that provides a formal guarantee of privacy.”
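
To make the idea of "injecting noise" concrete, here is a minimal sketch of one standard way to do it, the Laplace mechanism applied to a simple counting query. It is an illustration only, not the method the Census Bureau uses or anything taken from the article; the function names, the `epsilon` parameter, and the sample ages are all hypothetical choices made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so
    Laplace noise with scale 1/epsilon is the usual choice for this
    kind of query. Smaller epsilon means more noise and more privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: ages of respondents in one small census block.
    ages = [34, 71, 8, 45, 67, 23, 90, 52, 66, 19]
    noisy = private_count(ages, lambda age: age >= 65, epsilon=0.5)
    print(f"Noisy count of residents 65 or older: {noisy:.2f}")
```

Any single answer is off by a random amount, which is what hides whether a particular person is in the data, while the average over many such answers stays close to the true count. That is the trade-off described above: individual responses are obscured, but the overall statistical picture survives.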

[read the rest at SIAM News…]