Over One Billion Consumers Exposed in Data Leak


Personal information on over one billion individuals harvested by two data enrichment firms has been exposed online, according to security researchers.

Data enrichment or aggregation providers effectively sell access to large stores of data merged from multiple third-party sources, primarily for companies to gain deeper insights into current and prospective customers.

However, there are inevitable privacy risks attached to such practices, despite the efforts of the aggregator firms themselves to keep their own data stores secure.

In mid-October, researchers Bob Diachenko and Vinny Troia discovered a wide-open Elasticsearch server containing four billion user accounts across more than 4TB of data.

“A total count of unique people across all data sets reached more than 1.2 billion people, making this one of the largest data leaks from a single source organization in history. The leaked data contained names, email addresses, phone numbers, LinkedIn and Facebook profile information,” explained Vinny Troia, chief of threat intelligence at Data Viper.

“The discovered Elasticsearch server containing all of the information was unprotected and accessible via web browser at http://35.199.58.125:9200. No password or authentication of any kind was needed to access or download all of the data.”

The privacy snafu exposed around 622 million unique email addresses, mainly those associated with a data enrichment firm known as People Data Labs (PDL). The second firm was identified by Troia as OxyData, whose data is an almost complete scrape of LinkedIn data.

However, it’s unclear who left the data exposed on the Elasticsearch server.

Troy Hunt, who runs the HaveIBeenPwned? breach notification site, said the case highlights a real challenge at the heart of the data enrichment industry.

“Regardless of how well these data enrichment companies secure their own system, once they pass the data downstream to customers it's completely out of their control. My data — almost certainly your data too — is replicated, mishandled and exposed and there's absolutely nothing we can do about it. Well, almost nothing,” he said.

“[PDL’s] privacy policy states that people may ‘access any information we have on them’ and that they will ‘reply to a person’s request within five business days’ or delete it outright. It'll be interesting to see how that scales if even a very small slice of the 622M impacted individuals takes them up on that offer.”
