Why I’m Not Sold on Autonomous Security

Maybe you’ve heard about a new security product with super-smart AI and Machine Learning capabilities that can root out both known and unknown intrusions. The Intrusion Detection System (IDS) on this product is so smart it learns your network and does not need to inform you every time it sees a new anomaly, and maybe its Intrusion Prevention System (IPS) will block all malicious traffic without human intervention. The AI system boasts 99% accuracy at detecting new attacks.

Amazing pitch, but, having worked on the team that built the first-place winning computer reasoning system at the DARPA Cyber Grand Challenge, I am not sold yet. Here’s why this pitch isn’t working for me:

  1. The above pitch confuses detecting attacks with detecting intrusions. An attack may not be successful, whereas an intrusion is. Suppose you detect five new attacks, but only one resulted in a real intrusion where data was leaked. Wouldn’t you want to focus on the one successful intrusion, not the four failed attacks?
  2. ML systems may not be all that robust, meaning they work well with one set of data (the vendor’s) but not with another (yours). In a nutshell, an attacker’s job is to evade detection, and so far research has shown it’s often not hard to evade ML detection, especially when it is trained to look for X and not Y or Z. AI and ML products don’t understand how easy they are to evade. Sure, deploy one today, great, but what about tomorrow?
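To make the evasion point concrete, here is a toy sketch. Everything in it is hypothetical and deliberately simplified: a detector that only knows the patterns it was trained on (X) waves through a trivially re-encoded version of the same attack (the Y it never saw).

```python
import re

# A caricature of a trained detector, reduced to its essence:
# it flags the patterns it learned (X) and nothing else.
TRAINED_PATTERNS = [re.compile(rb"/etc/passwd"), re.compile(rb"<script>")]

def detector(payload: bytes) -> bool:
    """Return True if the payload matches a known-bad pattern."""
    return any(p.search(payload) for p in TRAINED_PATTERNS)

attack  = b"GET /../../etc/passwd HTTP/1.1"
# The attacker's job is to evade: same effect on the target,
# new encoding the detector was never trained on.
evasion = b"GET /../../%65tc/passwd HTTP/1.1"  # '%65' URL-decodes to 'e'

assert detector(attack) is True    # the known form (X) is caught
assert detector(evasion) is False  # the unseen variant sails through
```

Real ML detectors are far more sophisticated than a pattern list, but the structural problem is the same: the model can only be robust against the variations it has been trained or designed to anticipate.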

Machine learning isn’t robust
In general, ML is not designed to defeat an active adversary, who can quickly change attack methods. Academic research in adversarial ML is still very new, so products promising these capabilities are likely still immature. I don’t think there’s a system ready for full autonomy, which is a vision very similar to the DARPA Cyber Grand Challenge -- for an autonomous security system (a cyber reasoning system) to detect, react, and defend. The two problems above are exactly why we aren’t at full autonomy yet.

Also, designing a fully autonomous system that goes around and chases attacks (the vast majority of which are unsuccessful) would be useless; it’s really about preventing real intrusions.

There’s confusion in the market today between detecting attacks and detecting intrusions. An attack is a possible indication of an intrusion, but it may or may not succeed. An intrusion is an attack that succeeded.

While you can plan for whole categories of attacks, you can’t always plan for what constitutes a successful intrusion. And because attacks vastly outnumber intrusions, almost any false-positive rate, even a low one, buries the handful of real intrusions in noise.

My Recommendations
First, think about whether you’re looking for a short-term fix or a long-term fix that requires robustness against evasion. If you only want to stop internet chaff, new machine learning products may help. But there is no scientific consensus that they will remain hard to evade once attackers learn about them.

Second, think hard about what you want to detect: Do you want to study the attacker or are you in charge of responding to real problems? For example, the base rate fallacy teaches us that if your organization sees relatively few intrusions per attack (ask your team if you do not know!), the iron-clad math of conditional probability puts hard limits on any approach -- ML or not -- and those limits may not be on your side.
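You can run the base-rate numbers yourself. The figures below are hypothetical, chosen only for illustration: a detector that catches 99% of intrusions with just a 1% false-positive rate, on a network where real intrusions are rare.

```python
# Hypothetical numbers for illustration only.
events = 1_000_000      # events inspected per day
intrusions = 100        # actual intrusions among them (rare!)
tpr = 0.99              # true-positive (detection) rate
fpr = 0.01              # false-positive rate

true_alerts  = tpr * intrusions             # ~99 real alerts
false_alerts = fpr * (events - intrusions)  # ~9,999 false alarms

# Fraction of alerts that are real intrusions (precision).
precision = true_alerts / (true_alerts + false_alerts)
print(f"{precision:.1%} of alerts are real intrusions")  # ~1.0%
```

Even with a "99% accurate" detector, roughly 99 out of every 100 alerts here are false alarms, because benign events outnumber intrusions ten-thousand-fold. That is the base rate fallacy in one division.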

Where can ML really help? The jury is still out, but the general principle is that ML is statistical, and it applies best where you are trying to marginally boost your statistical accuracy. I use the word “statistical” on purpose: you have to accept that there is risk. For example, Google has had tremendous success boosting ad click rates with machine learning, because a five percent improvement means millions (billions?) more in revenue, but is a five percent boost enough for you?

A second place ML can help is getting rid of unsophisticated attackers, for example the script kiddie using a well-known exploit. In that setting we’ve removed the need for robustness, since we’ve defined away anyone really trying to fool the algorithm.

Eventually ...
Finally, I want to applaud the amazing research being done in ML today and the researchers doing it. We need more of it. My opinion is that we are not “there” yet, at least not in the way an average user would expect.

I do have more confidence that we can make parts of application security fully autonomous. Why? Application security testing techniques, such as fuzzing, have zero false positives: every finding comes with an input that reproduces the failure. And attackers don’t have control over which apps you deploy, so the idea of evasion doesn’t even come into play.
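Here is a toy sketch of why fuzzing reports are so trustworthy. The parser and its bug are invented for illustration; real fuzzers are far smarter about generating inputs, but the zero-false-positive property is the same: every "finding" is an input that demonstrably crashes the target.

```python
import random

def parse_record(data: bytes) -> int:
    """Hypothetical buggy parser: crashes when the first two bytes match."""
    if len(data) >= 2:
        return 256 // (data[0] - data[1])  # ZeroDivisionError if they're equal
    return 0

def fuzz(target, trials=10_000, seed=1234):
    """Throw random byte strings at the target and keep every crash."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 8)))
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, exc))  # a concrete, replayable failure
    return crashes

crashes = fuzz(parse_record)
# Every entry reproduces a real bug on demand -- nothing to second-guess.
```

Contrast that with an ML alert: a crash is its own proof, while an alert is a probability that someone still has to triage.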

Do I think there is a chance ML-powered IDS will become a fully autonomous network defense? Not today. Not without even more research.
