Microsoft researchers say anonymized data isn't so anonymous


Data routinely gathered in Web logs -- IP address, cookie ID, operating system, browser type, user-agent strings -- can threaten online privacy because they can be used to identify the activity of individual machines, Microsoft researchers say.

At the same time, analyzing such data in anonymized form can help detect malicious activity and so improve overall Internet security, they add.

The researchers say they set out to determine whether a single piece of log data can uniquely reveal a particular host. Their analysis of this seemingly benign information was based on one month (August 2010) of anonymized Hotmail and Bing data on hundreds of millions of users.

They found that HTTP user-agent information alone accurately tags a host 62 percent of the time. Combining that information with the IP address raises the accuracy to 80.6 percent, and combining it with just the IP prefix still yields 79.3 percent, they say. The highest accuracy, 92.8 percent, came when more than one user ID was linked to a single host, as would be the case in a family that shares a single computer. In such cases, the multiple IDs together accurately represent that one host computer.

