Big Data, Big Cloud, Big Problem

By Todd Thiemann

Big Data presents a big opportunity for businesses to mine large volumes of data from a variety of sources and make better, faster decisions. Because big data implementations are practically always deployed in a cloud environment, be it a private cloud or public cloud, they pose a major security challenge: some of that “Big Data” will inevitably be sensitive, in the form of intellectual property covered by corporate security mandates, cardholder data affected by PCI DSS, or Personally Identifiable Information (PII) affected by state or national data breach laws.

For the purposes of this post, Big Data refers to non-relational storage and processing technologies, including Hadoop and NoSQL tools such as MongoDB, Cassandra and CouchDB. These offerings comprise the bulk of “Big Data” deployments and share similar security challenges. For example, the Hadoop Distributed File System (HDFS) is used to store data that needs to be analyzed. Software frameworks such as MapReduce or Scribe process large amounts of data in parallel on large clusters of commodity computer nodes. In a MapReduce job, the input is split into independent chunks, and map tasks process those chunks in a completely parallel manner across the cluster. The framework sorts the output of the map tasks, which then becomes the input to the reduce tasks. Typically both the input and the output of the job are stored across the cluster of compute nodes.
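
The map, sort and reduce flow described above can be sketched in a few lines of plain Python. This is an illustrative single-process word count, not Hadoop's API: the function names map_task, reduce_task and run_job are made up for this example, and a real cluster would run the map tasks on separate nodes.

```python
from itertools import groupby
from operator import itemgetter


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_task(key, values):
    """Reduce: combine all values that were emitted for a single key."""
    return key, sum(values)


def run_job(chunks):
    """Run map, sort and reduce in one process (hypothetical helper)."""
    # Map phase: every chunk is independent, which is what lets a real
    # framework process the chunks in parallel on different nodes.
    mapped = [pair for chunk in chunks for pair in map_task(chunk)]
    # Sort phase: bring all pairs with the same key next to each other.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: one reduce call per distinct key.
    return dict(
        reduce_task(key, [value for _, value in group])
        for key, group in groupby(mapped, key=itemgetter(0))
    )


word_counts = run_job(["big data big cloud", "big problem"])
print(word_counts)
```

Note that both the intermediate pairs and the final counts contain the raw data values, so in a real deployment any sensitive content passes through, and may be written to, many nodes.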

The ability to perform complex ad-hoc queries against massive disparate datasets can unlock tremendous value for enterprises. To tap this intelligence, companies are using distributed platforms such as Hadoop, primarily because the volume of data has grown beyond the performance capabilities of relational database systems.

While traditional relational databases use the concept of a data container, this is absent in the Big Data world. Instead of a datafile associated with a database, NoSQL implementations scatter files across hundreds or thousands of nodes. As a result, sensitive data that requires protection is no longer in one compact tablespace on a single system, but can be scattered among a multitude of nodes in the cloud.
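
To make the scattering concrete, here is a minimal sketch that assigns records to nodes by hashing their keys. The placement rule and the names node_for and placement are assumptions for illustration only; actual products use their own placement strategies, but the security consequence is the same: related sensitive records end up on many machines.

```python
import hashlib


def node_for(record_key, node_count):
    """Pick a node for a record by hashing its key (hypothetical rule)."""
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def placement(record_keys, node_count):
    """Group record keys by the node they would be stored on."""
    nodes = {}
    for key in record_keys:
        nodes.setdefault(node_for(key, node_count), []).append(key)
    return nodes


# Hypothetical customer records that all contain sensitive data.
customers = [f"customer-{n}" for n in range(20)]
for node, keys in sorted(placement(customers, node_count=5).items()):
    print(f"node {node}: {len(keys)} sensitive records")
```

Every node that receives even one of these records becomes part of the environment that has to be protected and audited.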

One of the key challenges posed by NoSQL tools is that while they are great at crunching massive volumes of data, they have virtually zero built-in security or access control capabilities. If a Big Data deployment includes or will include sensitive data, it’s imperative to put data security and access controls in place. Operating a Big Data infrastructure without some form of security is a very high-risk endeavor.

The following threats and how to mitigate them are important considerations in Big Data environments:

Privileged User Abuse – Keeping system administrators from accessing or copying sensitive data.
Unauthorized Applications – Preventing rogue application processes from touching your Big Data.
Managing Administrative Access – While system administrators should not be allowed to access data, they may need access to the directory structure for maintenance operations and performing backups.
Monitoring Access – Understanding who is accessing what data in a Big Data repository allows for necessary auditing and reporting.
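
The four considerations above can be illustrated with a small default-deny policy check that records every decision. This is a sketch under stated assumptions, not any product's access control model: the Rule and AccessGuard classes, the role names, the operation names and the paths are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Rule:
    role: str          # e.g. "analyst" or "sysadmin" (hypothetical roles)
    operation: str     # "read_data" or "list_directory" (hypothetical)
    path_prefix: str   # part of the repository the rule applies to


@dataclass
class AccessGuard:
    rules: list
    audit_log: list = field(default_factory=list)

    def check(self, user, role, operation, path):
        """Allow only what a rule explicitly grants, and audit every attempt."""
        allowed = any(
            rule.role == role
            and rule.operation == operation
            and path.startswith(rule.path_prefix)
            for rule in self.rules
        )
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "operation": operation,
            "path": path,
            "allowed": allowed,
        })
        return allowed


guard = AccessGuard(rules=[
    Rule("analyst", "read_data", "/data/sales/"),
    # Administrators may see the directory structure for maintenance and
    # backups, but no rule grants them read access to the data itself.
    Rule("sysadmin", "list_directory", "/data/"),
])

print(guard.check("alice", "analyst", "read_data", "/data/sales/q1.csv"))
print(guard.check("root", "sysadmin", "list_directory", "/data/sales/"))
print(guard.check("root", "sysadmin", "read_data", "/data/sales/q1.csv"))
print(len(guard.audit_log), "audit entries")
```

Denied attempts are logged as well as granted ones, since a privileged user or rogue process probing for data is exactly what the audit trail needs to show.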

When it comes to protecting and controlling access to Big Data, encryption and key management are central elements of a layered security approach. Here are some important considerations when securing Big Data environments:

Classify Data & Threats – This is one of the biggest challenges for any data security project: knowing what is sensitive, where it is located, and what the potential threats are. If no sensitive data is in scope, data protection may not be necessary. If sensitive data is stored in the Big Data environment, it needs to be protected. Talking to the Big Data development team about the nature of the data is a first step.
Encryption & Key Management – Taping the key to the front door just above the door knob is not a security best practice. In the same vein, storing encryption keys within the data environment they are protecting is also not a best practice.
Separation of Duties – This has many implications, but one is that encryption keys should never be under the control of IT administrators.
Costs – Minimizing silos of encryption and key management typically reduces costs and eases scalability and audit issues.
Performance – Enterprises are embracing Big Data for its potential to enable faster decision making. By the same token, encryption and key management should not significantly slow down Big Data system performance.
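
As a starting point for the classification step in the list above, the following sketch scans free text for candidate payment card numbers and keeps only those that pass the Luhn checksum. It is a rough first pass under stated assumptions: the function names and the sample text are made up for this example, and pattern matching alone cannot classify intellectual property or most PII.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:   # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings in text that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


sample = "order paid with 4111 1111 1111 1111, reference 1234567890123456"
print(find_card_numbers(sample))
```

The first number in the sample is a widely used test card number that passes the checksum; the second has the right length but fails it, so it is not reported. A scan like this only indicates where cardholder data may be present, which is the input to deciding what needs encryption and access controls.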

Big Data promises to be the proverbial goose that lays golden eggs. Understanding the data security and privacy risks associated with a Big Data environment early in the development process, and taking appropriate steps to protect sensitive information, will prevent that goose from getting cooked.

Todd Thiemann is senior director of product marketing at Vormetric and co-chair of the Cloud Security Alliance (CSA) Solution Provider Advisory Council.
