How to Avoid Drowning in your Security ‘Data Lakes’

Security monitoring is a ubiquitous task in every enterprise that is trying not only to thwart malicious activity but also to understand and optimize legitimate traffic to its information systems. The problem is that these monitoring activities generate huge ‘data lakes’. The lakes contain extremely valuable raw data, but they mostly sit unused, both because large, complex and diverse data sets are difficult to work with and because of a lack of knowledge about what analytics truly means.

The good news is that today’s big data technology and state-of-the-art analytics can dramatically improve the ability to quickly and effectively mine this data, producing the reports, insights and visualizations needed to optimize an enterprise’s data security environment. The bad news is that most security organizations are enamored with data lakes for the wrong reasons – they have read about data lakes, so they must have one. So they spend a few years (and a few million dollars) building the lakes with very little thought about what they will do with the raw data, how it can improve their security posture, or even how they will measure the benefit they reap. They very often end up in the middle of a large data lake with no way of making it safely to shore.

However, avoiding this fate is not that hard – the industry has more than ten years’ worth of good and bad experience – which means you can learn from the best (and the worst). So skip the mumbo-jumbo two-pagers that regurgitate the usual herd-feed and do a little thinking of your own before you jump into the deep end. And yes, this article is a two-pager too – so use it only to get you thinking again.

Five DOs and DON’Ts

Your first step should not be to figure out how to dump all the data into one place – even if that place is now a Big Data platform. Dumping data, and even indexing it, is easy and cheap these days – but garbage in, garbage out. Instead, start by defining the analytics and the REQUIREMENTS of why you are doing this project (and please don’t define the requirements as “the ability to store all security events for three years in one place”). Build a solution for that ONE thing that has eluded you and your SIEM implementations. Then add another; then another; then generalize. The problem today is not the storage, the compute or the cost of Big Data – it’s the development and the productivity. So run projects that can show you how productive each option makes you.
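To make “that ONE thing” concrete, here is a minimal sketch in Python of what a single, narrowly scoped analytic might look like before any generalization. The event data, field names and threshold are entirely hypothetical – the point is that a well-defined question (“which accounts show repeated failed logins?”) needs only a few lines, not a filled lake:

```python
from collections import Counter

# Hypothetical parsed authentication events: (user, outcome).
# In practice these would come from your existing log pipeline.
events = [
    ("alice", "success"), ("bob", "failure"), ("bob", "failure"),
    ("bob", "failure"), ("bob", "failure"), ("bob", "failure"),
    ("carol", "success"), ("alice", "failure"),
]

THRESHOLD = 5  # assumed policy: 5 or more failures flags an account


def flag_accounts(events, threshold=THRESHOLD):
    """Return accounts whose failed-login count meets the threshold."""
    failures = Counter(user for user, outcome in events if outcome == "failure")
    return sorted(u for u, n in failures.items() if n >= threshold)


print(flag_accounts(events))  # → ['bob']
```

Once an analytic like this proves its worth, the next one is added, and only then is the plumbing generalized – the opposite of building the lake first.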

This may be uncomfortable at first, but the way you tackle hard problems these days is different. Machine learning techniques – clustering, decision trees and the like – work. They work really well, especially on large data sets and when history is available. In fact, security is a fabulous space for these techniques. The problem is that people are often more comfortable with what they can understand or what they already know, so they say, “show me first how the machine decided what it decided and then, when I understand the logic, I will accept it and off we go”. It doesn’t work that way. Someone can explain the general algorithm to you and show you a sample analysis – but don’t ask them to show you a precise decision path. Learn to trust the machine.
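As a toy illustration of detection driven by a learned baseline rather than hand-written rules, here is the simplest statistical analogue – a z-score outlier check over hypothetical per-host connection counts (the host names, counts and threshold are all invented for the sketch). Real ML techniques like clustering work on the same principle, just in many dimensions:

```python
import statistics

# Hypothetical daily outbound-connection counts observed historically.
baseline = [102, 98, 110, 95, 105, 99, 101, 97, 103, 100]

# Today's counts per host; db-01 behaves very differently.
today = {"web-01": 104, "web-02": 99, "db-01": 340}

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)


def is_anomalous(value, z_threshold=3.0):
    """Flag values more than z_threshold standard deviations from the baseline."""
    return abs(value - mean) / stdev > z_threshold


anomalies = sorted(h for h, v in today.items() if is_anomalous(v))
print(anomalies)  # → ['db-01']
```

Note that nobody wrote a rule saying “340 connections is bad” – the threshold comes from history, which is exactly why demanding a human-readable decision path up front misses the point.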

The highest cost is the development, the functionality and the analytics – not the platforms. So look for technologies that make this easy – not technologies where everything needs an army of people. Many Big Data platforms are so complex and so fragmented that you will never manage to get anything done. Look for flexibility, agility, simplicity and rapid development – technologies like NoSQL 2.0 built for Big Data are an excellent example. Look at areas close to what you’re doing – such as analytics on Internet of Things (IoT) and Machine-to-Machine (M2M) data. Start with a concrete project and get to it quickly (rather than wait a year or two for the lake to fill).

It’s true there is safety in numbers, but a lot of what you are thinking about may be five years old and already abandoned by the leaders. For example, while everyone was rushing to do “classic Hadoop” (MapReduce, for example), Google had abandoned it more than eight years earlier. Skate to where the puck is going – not to where it has been.

Most companies are not Google, Facebook or Netflix. Don’t assume you have the resources, talent and drive that these companies, or some of the innovative startups, have. Your data sets are often not “Google size” and your requirements are not at that level. Choose based on your needs – not on others’ needs. Analyze your own needs and only then pick – or you will end up with something optimized for someone else.

If I had to pick one guiding principle, it’s this: focus on what you want to do first and define it in terms of functionality and outcomes (with the data lake helping you do it). Don’t set out to just build a large data lake; do think about how to pragmatically utilize this valuable data – without drowning.
