“As these systems become more common, the repositories are increasingly likely to be stuffed with sensitive data,” writes Securosis, the research firm that wrote the paper on behalf of security vendor Vormetric. “Only after companies find themselves reliant on ‘Big Data’ do they ask how to secure it.”
The firm said that two factors became abundantly clear during the research project. “First, Big Data projects are common – almost the norm – within the enterprises we spoke with,” researchers wrote. “They have embraced the technology, and they've pushed vast amounts of data into these clusters.
“Second, most have implemented virtually zero security measures,” they added.
The firm’s examination of different Big Data implementations shows that security features are “sparse and aftermarket offerings are not fully tailored to these clusters.” In the rush to implement highly scalable, low-cost clusters for data analysis, security has fallen by the wayside as cost-efficiency wins out on the corporate to-do list. Most deployments are largely insecure, and “wholly reliant on network and perimeter security support,” i.e., password protection, Securosis said.
The good news is that several critical security concerns can be addressed without a Herculean effort – or investment: Big Data clusters share most of the same vulnerabilities as web applications and traditional data warehouses. The top items to examine include how nodes and client applications are vetted before joining the cluster, how data at rest is protected from unwanted inspection, how network communications are kept private, and how nodes are managed.
Securosis’ initial recommendations include using SSL or TLS network security in NoSQL environments to authenticate nodes and ensure the privacy of communications between nodes, name servers and applications. In addition, file- or OS-layer encryption can protect data at rest, prevent administrators or other applications from gaining direct access to files, and keep leaked files from exposing their contents.
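To illustrate the first recommendation, the sketch below builds a mutual-TLS context for node-to-node traffic using Python's standard `ssl` module. This is not taken from the Securosis paper; it is a minimal, hypothetical example, and the file-path parameters (cluster CA, node certificate and key) are placeholders for whatever a real deployment would provision.

```python
import ssl

def make_node_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS context suitable for authenticated inter-node connections.

    Illustrative sketch only: in a real cluster, ca_file would point to the
    cluster's certificate authority and cert_file/key_file to this node's
    own identity, so both ends of every connection can be authenticated.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols
    ctx.verify_mode = ssl.CERT_REQUIRED           # always authenticate the peer
    ctx.check_hostname = True
    if ca_file:
        ctx.load_verify_locations(ca_file)        # trust only the cluster CA
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)  # present this node's identity
    return ctx
```

Requiring a certificate from the peer (rather than merely encrypting the channel) is what turns TLS into a node-authentication measure, not just a privacy one.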
The researchers also recommend key/certificate management. “You can’t store keys and certificates on disk and expect them to be safe,” they noted. “Use a central key management server to protect encryption keys and manage different keys for different files.”
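The idea of "different keys for different files" held by a central server can be sketched in a few lines. The toy class below is a stand-in for a real key-management server (which in practice would be a KMS or HSM product, not hand-rolled code); it shows one protected master key deriving a distinct key per file, so no encryption key ever needs to sit on disk beside the data it protects. The class name and its method are hypothetical.

```python
import hashlib
import hmac
import secrets

class KeyServer:
    """Toy stand-in for a central key-management server (illustrative only).

    A production deployment would use a dedicated KMS or HSM; this sketch
    merely demonstrates per-file key derivation from a single master key.
    """
    def __init__(self):
        self._master = secrets.token_bytes(32)  # held only by the key server

    def key_for(self, file_id: str) -> bytes:
        # Derive a stable, unique 256-bit key for each file identifier;
        # the master key itself is never handed out.
        return hmac.new(self._master, file_id.encode(), hashlib.sha256).digest()
```

Because keys are derived on demand, a compromised data node yields ciphertext but no key material, and revoking access means cutting the node off from the key server rather than re-encrypting the cluster.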
It should go without saying, but companies should also validate nodes during deployment – through virtualization management, cloud provider facilities, or third-party products such as Chef and Puppet. And, they should log transactions, anomalies and administrative activity – through logging tools that leverage the big data cluster itself – to validate usage and provide forensic system logs.
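The two measures above – vetting nodes before admission and logging the decision – can be combined in one small sketch. The allowlist, fingerprint scheme, and function name below are assumptions for illustration; in practice the approved identities would come from the deployment tooling (Chef, Puppet, or a cloud provider's facilities) rather than a hard-coded set.

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cluster-admission")

# Hypothetical allowlist of approved node fingerprints, e.g. published
# by configuration-management tooling at deployment time.
APPROVED = {hashlib.sha256(b"node-cert-alpha").hexdigest()}

def admit_node(cert_bytes: bytes) -> bool:
    """Admit a node only if its certificate fingerprint is on the allowlist."""
    fp = hashlib.sha256(cert_bytes).hexdigest()
    allowed = fp in APPROVED
    # Record every admission decision so forensic review is possible later.
    log.info("node admission fingerprint=%s allowed=%s", fp[:12], allowed)
    return allowed
```

Logging each decision, successful or not, is what provides the forensic trail the researchers call for; an attacker probing the cluster with rogue nodes leaves a visible record.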
“While these measures cannot provide fail-proof security, a reasonable amount of effort can make it considerably more difficult to subvert systems or steal information,” Securosis noted.
“Based on Securosis’ findings, security in typical big data implementations is largely an afterthought,” said Derek Tumulak, vice president of product management for Vormetric, an encryption specialist. “The good news is that several critical security concerns can be addressed by a handful of security measures, including the use of file layer encryption to protect data at rest and ensure sensitive information cannot be accessed.”