Netflix' second data challenge on revealing customers' DVD rental habits has privacy experts hopping mad

29 September 2009

Privacy advocates are furious at plans by DVD rental service Netflix to unveil more data about the rental habits of its customers. Experts argue that the data could easily be used to identify customers and draw inferences about their lifestyles.

Last week, Netflix announced that it had awarded the prize for its first US$1 million Netflix challenge. The company had made rental data available to the public, asking for algorithms that would help Netflix make its movie recommendation system more accurate. The original challenge focused on improving the recommendation system for those rental customers who had already rated large numbers of films using the Netflix website.

The company simultaneously announced a second challenge, with the same prize, this time focusing on improving the recommendation system for those customers who don't rate movies often, or at all. To do this, it said that it would take advantage of demographic and behavioural data "carrying implicit signals about the individuals' taste profiles".

The new data set includes information about customer age, gender, zip code, genre ratings, and previously chosen movies. "As with the first Netflix prize, all data provided is anonymous and cannot be associated with a specific Netflix member", it said.

However, experts argue that the ability to identify customers using the anonymous data provided by Netflix has already been proven. Paul Ohm, associate professor of law and telecommunications at the University of Colorado law school, argued in a paper published this August that Netflix' attempt to anonymize the data in its first challenge was fatally flawed. He pointed to work by Arvind Narayanan and Professor Vitaly Shmatikov, researchers at the University of Texas, who found that individuals within the data set could be identified with a high degree of probability using just a little outside knowledge of their movie-watching preferences.

Ohm praised Netflix for at least trying to consult with experts when releasing the data for its first challenge, but expressed concern over the second one. "Netflix should cancel this new, irresponsible contest", he warned. "Researchers have known for more than a decade that gender plus zip code plus birthdate uniquely identifies a significant percentage of Americans."
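To illustrate the kind of check Ohm's remark implies, the following is a minimal sketch that counts how many records in a data set are unique on gender, zip code and birthdate. The records and the `unique_fraction` helper are hypothetical, invented for illustration only; they are not drawn from the Netflix data or from any method described by the researchers.

```python
from collections import Counter

# Hypothetical toy records: (gender, zip_code, birthdate). Not real data.
records = [
    ("F", "80302", "1975-03-14"),
    ("M", "80302", "1975-03-14"),
    ("F", "80302", "1975-03-14"),
    ("M", "78701", "1982-11-02"),
    ("F", "78701", "1990-06-21"),
]


def unique_fraction(rows):
    """Return the share of rows whose attribute combination appears exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


fraction = unique_fraction(records)
print(f"{fraction:.0%} of the toy records are unique on gender + zip code + birthdate")
```

Any record that is unique on these three fields could, in principle, be matched to a named individual by someone holding another list containing the same fields, even though the record itself carries no name or address.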

Despite being sent Ohm's comments, Netflix staff stuck to the company line. "The information we’re giving in the Netflix Prize 2 dataset is completely anonymous. It contains no personally identifiable information. It does not contain anyone’s name, address, or any means to connect a particular record with a specific Netflix member", said the company in a statement to Infosecurity magazine.

"As in Netflix Prize 1, the dataset contains some movie ratings from select anonymous members. It also includes some queue adds and taste preferences, broad age ranges, gender and zip codes but, again, completely anonymous. All that data is modified – our scientists call it perturbed – to make it anonymous."


 
