Big Data Poses Many Data Protection Questions

Big data has been a much-used buzz-phrase for several years, but only recently has big data analytics entered the corporate mainstream. More and more companies now say that they are using, or looking to use, big data analytics in their business. The concept of big data, however, raises a number of issues for data protection and data security, and while there has been no major breach of a big data dataset yet, it is arguably only a matter of time.

There is no single, generally accepted definition of big data, but one of the most common is that given by Gartner: "high volume, high velocity and high variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making." Increases in processing power and declining storage costs mean that we are creating more data than ever before, with IBM estimating in September 2013 that 90% of the world's data was created in the previous two years, and Google processing 24 petabytes of data every single day. Businesses and governments are looking at ways to harness this volume of data, and the ability to process unstructured data, to find correlations that they would otherwise have been unable to detect.

However, the very nature of big data means that careful consideration must be given to how data is handled. Big data often involves combining datasets to create as large a dataset as possible, and holding and analyzing as much data as possible; the term 'N=all' is often used, meaning the dataset to use is 'all available data'. This runs counter to traditional data protection principles, which treat data minimization as a key requirement. It also means that, even where data is theoretically anonymized, big datasets will be a target for cyber-criminals simply because the aggregation of so much data in one place makes them attractive. On anonymization itself, the jury is still out, but published studies make clear that big data does afford some ability to re-identify individuals from theoretically anonymized datasets. At the moment, this appears to be an issue arising from poor anonymization techniques rather than an inherent flaw.
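The re-identification risk described above can be illustrated with a simple linkage attack: joining an "anonymized" dataset to a public one on shared quasi-identifiers such as postcode, birth year and gender. The sketch below is purely illustrative; all records, field names and datasets are hypothetical, and real attacks of this kind (as in the published studies mentioned) typically operate at far larger scale.

```python
# Illustrative linkage attack: re-identifying individuals in a poorly
# anonymized dataset by joining on quasi-identifiers. All records and
# field names here are hypothetical examples.

# "Anonymized" records: direct identifiers (names) removed, but
# quasi-identifiers (zip code, birth year, gender) retained.
anonymized_records = [
    {"zip": "90210", "birth_year": 1975, "gender": "F", "diagnosis": "diabetes"},
    {"zip": "90210", "birth_year": 1982, "gender": "M", "diagnosis": "asthma"},
    {"zip": "10001", "birth_year": 1975, "gender": "F", "diagnosis": "hypertension"},
]

# A separate, public dataset (e.g. a voter roll) linking names to the
# same quasi-identifiers.
public_records = [
    {"name": "Alice Example", "zip": "90210", "birth_year": 1975, "gender": "F"},
    {"name": "Bob Example", "zip": "90210", "birth_year": 1982, "gender": "M"},
]

def reidentify(anon, public):
    """Join the two datasets on the shared quasi-identifiers."""
    matches = []
    for a in anon:
        for p in public:
            if (a["zip"], a["birth_year"], a["gender"]) == \
               (p["zip"], p["birth_year"], p["gender"]):
                matches.append({"name": p["name"], "diagnosis": a["diagnosis"]})
    return matches

print(reidentify(anonymized_records, public_records))
```

Where the combination of quasi-identifiers is unique, removing names alone provides no real anonymity, which is why robust anonymization techniques generalize or suppress such fields.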

"The very nature of big data [means] that careful consideration needs to be given to how data is handled"

Another issue with big data processing is that it is often unclear what use data will be put to in the future, so big data seeks to retain datasets for as long as possible. Big data analytics is about finding unexpected correlations and taking advantage of them; while to some extent one can predict what datasets may be used in future, this ability is limited in scope with big data. This means that some of the protection afforded by the requirements to (a) only use data for the purpose for which it was collected and to (b) hold it for the minimum time to achieve that purpose may be lost with big data processing.

Allied to this is the risk that organizations take when transferring datasets to third parties for analysis. As organizations do not necessarily hold, or have access to, all of the data that may be needed for an analytics exercise, they may transfer their own data to a third party to be combined with the third party's data. Alternatively, an organization may transfer data internally to a business analyst team. In each case, the organization needs to ensure that it is not moving data outside its secure perimeter and thereby putting that data at risk. This is particularly the case where businesses are moving to cloud storage solutions, and where big data analytics tools are specifically designed to pull data from multiple sources across a network for analysis. Organizations need to be particularly alert to the risk of rendering carefully designed security protections irrelevant by inadvertently moving data outside those perimeters.

It is not, however, all bad news. Big data is also potentially a powerful tool for detecting security breaches. A Verizon paper estimated that information relating to around 80% of breaches was available in logs, but was not identified and acted upon. A major recent example of this was the Target security breach, where security tools detected the intrusion some time before the breach was publicly identified, but the alerts were not acted upon by the information security team.

Big data analytics can assist organizations in sorting through false positives and analyzing the mass of data produced by security tools. Also helpful are larger scale projects such as SOLTRA and the FCAS, where financial institutions, critical infrastructure and regulatory bodies are sharing data and information to try to better protect against and respond to cyber-threats. The issue remains that, while big data analytics should provide security teams with a tool to identify intrusions more effectively, the human element is still the weakest link. While big data presents new risks, and new tools to combat them, organizations will still need to ensure that they have adequate systems in place to ensure compliance with legal and regulatory requirements, and that they continually review their systems to keep up to date with latest developments.
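As a trivial illustration of the kind of log aggregation the article describes (not a method attributed to any of the organizations named), the sketch below counts failed logins per source address and flags sources that exceed a threshold. The log format, field names and threshold are all assumptions for the sake of the example; real security analytics platforms apply far richer correlation across many data sources.

```python
# Illustrative sketch: aggregating security logs to surface signals
# that might otherwise be lost in the noise. The log format and the
# threshold below are assumptions, not taken from any real product.
from collections import Counter

log_lines = [
    "2015-03-01T10:00:01 LOGIN_FAIL user=admin src=203.0.113.5",
    "2015-03-01T10:00:02 LOGIN_FAIL user=admin src=203.0.113.5",
    "2015-03-01T10:00:03 LOGIN_FAIL user=admin src=203.0.113.5",
    "2015-03-01T10:00:04 LOGIN_OK user=alice src=198.51.100.7",
    "2015-03-01T10:00:05 LOGIN_FAIL user=bob src=198.51.100.7",
]

def flag_suspicious_sources(lines, threshold=3):
    """Count failed logins per source IP; flag sources at or over threshold."""
    fails = Counter()
    for line in lines:
        if "LOGIN_FAIL" in line:
            src = line.split("src=")[1]
            fails[src] += 1
    return [ip for ip, count in fails.items() if count >= threshold]

print(flag_suspicious_sources(log_lines))
```

Even this crude aggregation shows the point made above: the raw evidence of an attack is often already in the logs, and the challenge is surfacing it to a human in time to act.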


About the Author

Paul Glass is a senior associate in the Disputes and Investigations Group at Taylor Wessing. Paul's practice includes advising on a range of general commercial litigation and arbitration (under LCIA, ICC and AAA rules), and advising in specialist areas such as financial and IT disputes, as well as cyber-security and data protection. Paul graduated from Oxford University with a BA in jurisprudence.

