Sometimes the Best Big Data Questions Raise The Biggest Privacy Concerns

One useful way to define the unstructured data behind most existing and theoretical big data projects is that it was often collected for some purpose other than the one researchers are now using it for.

This definition points to the potential of big data analysis as more and more information is gathered online and elsewhere, but it also points to some challenges, as outlined by Duncan Watts, a principal researcher at Microsoft’s research division.

First, a large portion of the data that might be valuable to social scientists, policymakers, urban planners and others is held by private companies that release only portions of it to researchers. In other words, Facebook, Amazon, Google, email providers and ratings companies all know certain things about you and about society, but there’s no way to aggregate that data to draw global insights.

“Many of the questions that are of interest to social science really require us being able to join these different modes of data and to see who are your friends, what are they thinking, and what does that mean about what you end up doing,” Watts said. “You cannot answer these questions in any but the most limited way with the data that’s currently assembled.”

Second, even if social scientists were able to draw on that aggregated data, it would raise significant privacy concerns among the public.

Finally, because much of the data that’s useful to social scientists was gathered for other purposes, there’s often some bias in the data itself, Watts said.

“When you go to Facebook, you’re not seeing some kind of unfiltered representation of what your friends are interested in,” he said. “What you’re seeing is what Facebook’s news ranking algorithm thinks that you’ll find interesting. So when you click on something and the social scientist sees you do that and makes some inference about what you’re sharing and why, it’s hopelessly confounded.”

