How Traditional Machine Learning Is Holding Cybersecurity Back

While global cybersecurity spending now surpasses $100 billion annually, 64 percent of enterprises were compromised in 2018, according to a study by the Ponemon Institute. What explains this less-than-impressive ROI? The standard answer is that wily cyber-criminals are employing ever-evolving, increasingly sophisticated attack methods, part of a never-ending game of cat-and-mouse in which they all too often outsmart the good guys.

This is undoubtedly true, but the root of the problem is that traditional machine learning-based cybersecurity solutions fail to keep up with the growing sophistication of today’s cyber threats, whether those threats are created by hackers or by AI.

Why does machine learning so often come up short – and how should cybersecurity evolve to meet the scale and complexity of the challenge?

Fighting Yesterday’s War

There’s no question that machine learning has driven significant improvements in cybersecurity. Harnessing massive datasets on prevalent malware and prior attacks, machine learning-based solutions are capable of rapidly identifying and thwarting threats.

The problem? To paraphrase former U.S. Defense Secretary Donald Rumsfeld, it’s not enough to go after the known knowns; it’s the known unknowns and the unknown unknowns that cause the most grief.

As the threat landscape has evolved, machine learning has struggled to stay resilient against advanced new malware created by both hackers and artificial intelligence. Fighting yesterday’s war may work when today’s threats match yesterday’s, but not when novel threats are constantly emerging.

Not only does machine learning regularly fail to identify new malicious threats, it also routinely misclassifies benign files as malicious, yielding a high rate of false positives and creating unnecessary work for enterprises’ security teams.

Machine Learning’s Limitations

Traditional machine learning has several limitations that impede its ability to prevent complex and never-before-seen attacks.

Chief among these is data. Only a fraction of the available data is fed into the algorithm that trains a machine learning model. A computer scientist with cybersecurity expertise curates a set of features they consider meaningful, such as file size or the presence of certain headers, and only those features are used for training. Because the system can learn only from this predefined feature vector, most of the information in each file never reaches the model.
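To make the feature-vector bottleneck concrete, here is a minimal Python sketch. The four features (file size, byte entropy, an `MZ` header flag, and a count of `http` substrings) are hypothetical choices for illustration, not the feature set of any particular product. Two files with different contents collapse to the same vector, so a model trained only on these features cannot tell them apart.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    # Sorting the counts makes the floating-point sum order-independent.
    counts = sorted(Counter(data).values())
    return -sum((c / n) * math.log2(c / n) for c in counts)


def extract_features(data: bytes) -> list[float]:
    """Hypothetical hand-picked feature vector: the only view of the file
    that a traditional model trained on these features would ever see."""
    return [
        float(len(data)),                    # file size in bytes
        shannon_entropy(data),               # high entropy can hint at packing or encryption
        1.0 if data[:2] == b"MZ" else 0.0,   # starts with a DOS "MZ" header
        float(data.count(b"http")),          # crude count of embedded URL fragments
    ]


# Two different files built from the same bytes in a different order.
file_a = b"MZ" + b"ABCD" + b"http"
file_b = b"MZ" + b"DCBA" + b"http"

print(file_a == file_b)                                       # False
print(extract_features(file_a) == extract_features(file_b))   # True
```

Any difference that falls outside the chosen features, like the reordered bytes here, is invisible to the model. That is the gap an attacker who understands the feature set can try to exploit.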

The brains behind those models are invariably brilliant, but nevertheless fallible. Because raw data can’t be fed directly into traditional machine learning systems, feature extraction, which depends on human experts’ knowledge, unavoidably limits what the system can learn. What’s more, hackers understand this and build malware designed to trick machine learning systems into classifying it as benign.

Because organizations face resource constraints, they can’t hire an unlimited number of computer scientists with cybersecurity expertise to perform the labor-intensive work of continuously updating and developing data sets. Even if they could, adding training data helps only up to a point: eventually the model reaches learning-curve saturation, the threshold beyond which additional data no longer improves its accuracy.

Additionally, most machine learning models support only portable executable (PE) files, so attacks that use other file types, or file-less malware that never writes a file to disk, slip past these cybersecurity solutions.
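A minimal Python sketch of that blind spot, assuming a hypothetical scanner that hands input to its model only after a rough PE check (an `MZ` DOS header whose `e_lfanew` field at offset `0x3C` points to a `PE\0\0` signature). The `scan` function and the `model` callable are illustrative stand-ins, not any real product’s API, and `is_pe_file` is not a full PE parser.

```python
from typing import Callable


def is_pe_file(data: bytes) -> bool:
    """Rough PE check: an "MZ" DOS header whose e_lfanew field (offset 0x3C)
    points at a "PE\\0\\0" signature. Not a full PE parser."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    pe_offset = int.from_bytes(data[0x3C:0x40], "little")
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def scan(data: bytes, model: Callable[[bytes], bool]) -> str:
    """Hypothetical scanner whose model was trained only on PE files."""
    if not is_pe_file(data):
        # Blind spot: anything that is not a PE file is never scored.
        return "not scanned"
    return "malicious" if model(data) else "benign"


def always_flag(data: bytes) -> bool:
    """Stand-in model that flags every input it receives."""
    return True


# A minimal synthetic PE-style header (not a working executable).
header = bytearray(0x80)
header[0:2] = b"MZ"
header[0x3C:0x40] = (0x40).to_bytes(4, "little")
header[0x40:0x44] = b"PE\x00\x00"

print(scan(bytes(header), always_flag))                 # malicious
print(scan(b"powershell -enc <payload>", always_flag))  # not scanned
```

Even a model that flags everything never sees the script, because the input is rejected before scoring. File-less attacks go further still: there is no file to hand to `scan` at all.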

Not only do machine learning-based cybersecurity solutions fall short on prevention, but according to the Ponemon Institute, it takes organizations 196 days on average to detect a breach. This lag time – more than half a year – is far longer than organizations can afford, particularly when their assets and their reputation are on the line.

The Next Age of Cybersecurity: AI vs. AI

As with all market evolutions, sheer demand will create enough pressure for providers to improve their cybersecurity services and move beyond outmoded machine learning solutions. Given the massive investments enterprises are making in cybersecurity – and the hefty costs associated with breaches – companies are clamoring for preventative capabilities that both hold down costs and provide dynamic security protection.

The result will be advanced AI-based technologies, such as deep learning, that leverage much more of the available data and files, enabling analyses that yield higher levels of detection and prevention, lower false positive rates, more autonomous features, and smaller staff requirements.

To be sure, there’s no magic-bullet solution that will provide 100 percent protection, once and for all. Cyber threats are constantly changing – and it’s time cybersecurity solutions better reflected that.
