#INFOSEC17 Machine Learning is Positive, but not a Security Solution

Technologies such as machine learning and anomaly detection are not a security solution in themselves, because using them requires more than the technology alone to identify bad instances.

Speaking at Infosecurity Europe’s Intelligent Defence conference, Giovanni Vigna, CTO of Lastline, covered the theme “Adversarial machine learning: pitfalls of AI-based security” and the emerging defensive technologies of artificial intelligence, machine learning, deep learning and anomaly detection.

He opened by describing AI as an "attempt to recreate behavior by a machine", machine learning as based on statistical analysis of data, and deep learning as a subset of machine learning. However, he said that these technologies work only "with large data sets": machine learning in particular needs a lot of data to analyze in order "to characterize and say one is a class of things" and so classify a sample into a group.

“The biggest learning of machine learning is the ability to learn from things you know, and classify what you didn’t know before,” he said. “You can cluster samples in a certain way and create classifiers, but you take humans out of the loop, as they are incredibly expensive, and you always need analysts. But, if we can reduce the need for analysts and focus on the need for important stuff, we’re in good shape, and that is why machine learning and security is so.”

In terms of how machine learning can be used for negative purposes, Vigna said that what your machines have learned can be taken and used against you. Using cat pictures as an example, he said that if someone dresses as a cat and your machine learning determines that to be acceptable, an adversary could use your machine learning against you.

Vigna explained that in adversarial machine learning, an adversary can steal your learned model and, once they know how you learn, modify samples so that they are misclassified. “It is important on what you have based your data on,” he said. “With malware, you do not base your analysis on the static appearance of the sample and elicit from the sample what sample actually does, you ask does it modify? This is a blind spot of a different model.”
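The evasion Vigna describes can be illustrated with a minimal sketch. It assumes a hypothetical linear classifier over static file features; the feature names, weights and the `evade` helper are invented for illustration and are not taken from the talk or from any real product. An adversary who has obtained the weights changes only the features they control until the sample is scored as benign, while its actual behavior is untouched.

```python
# Hypothetical linear malware classifier: a score above 0 means "malicious".
# Feature names and weights are invented for illustration only.
WEIGHTS = {
    "packed": 2.0,
    "imports_crypto_api": 1.5,
    "has_valid_signature": -3.0,
}
BIAS = -1.0


def score(sample):
    """Weighted sum of the sample's static features plus the bias."""
    return BIAS + sum(w * sample.get(name, 0.0) for name, w in WEIGHTS.items())


def is_malicious(sample):
    return score(sample) > 0


def evade(sample, mutable):
    """Adversary who knows WEIGHTS edits only the features listed in `mutable`."""
    modified = dict(sample)
    # Try the most influential features first.
    for name in sorted(mutable, key=lambda n: abs(WEIGHTS[n]), reverse=True):
        if not is_malicious(modified):
            break
        # Push the feature toward whichever value lowers the score.
        modified[name] = 0.0 if WEIGHTS[name] > 0 else 1.0
    return modified


original = {"packed": 1.0, "imports_crypto_api": 1.0, "has_valid_signature": 0.0}
evaded = evade(original, mutable=["packed", "has_valid_signature"])

print(is_malicious(original), is_malicious(evaded))
```

In this sketch the sample keeps its crypto API imports, so what it does is unchanged; only its static appearance is altered, which is the blind spot of a model built on static features.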

Looking at anomaly detection, Vigna acknowledged a renewed interest in it despite it being ‘around forever’. Compared to signature-based anti-virus, where you can only detect what you have seen before, anomaly detection is modelled on what is good: anything not determined to be good must be bad, and you then need to find everything that must be bad.

“Modelling good behavior is time consuming and people don’t want to do it,” Vigna said. He explained that it involves listing each server and what it is doing, which takes time; learning good behavior requires automation with no human intervention, and a continuously updated, comprehensive picture of what is good. “What is bad may not be anomalous, so it could be wrong,” he said.
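The contrast between the two approaches can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a description of any real product: the hash value, host names and port baseline below are all hypothetical, and a real baseline would be learned automatically rather than written by hand.

```python
# Signature-based detection: flags only what has been seen before.
# The hash value is a hypothetical placeholder.
KNOWN_BAD_HASHES = {"hash-of-old-malware"}


def signature_match(file_hash):
    return file_hash in KNOWN_BAD_HASHES


# Anomaly detection: a model of what is good, here a hand-written list of
# the ports each server is expected to use (hypothetical hosts and ports).
BASELINE = {
    "web01": {80, 443},
    "db01": {5432},
}


def is_anomalous(host, port):
    """Anything not known to be good is treated as bad."""
    return port not in BASELINE.get(host, set())


# New malware has no signature yet, so the signature check misses it.
print(signature_match("hash-of-new-malware"))

# Unexpected traffic is flagged, whether or not it is actually malicious.
print(is_anomalous("web01", 8080))

# Malicious traffic that looks like normal behavior is not flagged.
print(is_anomalous("web01", 443))
```

The last call reflects Vigna's caveat: activity that stays within the baseline is never reported, however bad it is, and the baseline is only as good as the effort spent keeping it complete and current.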

He concluded by saying that the solution lies in admitting that you cannot use machine learning in a simple way, and that breach detection events are a good starting point for analysis.

“This gives you confidence in detection as you use a known compromised host and look for evidence of anomalous behavior, so you are threat hunting as you know a machine has been compromised. You can see what the hosts have done and see similar patterns, so you can support machine learning for new threats.”
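One way to picture hunting from a known compromise is to compare what the compromised host did with what other hosts have done. The sketch below assumes behavior is recorded as a set of observed indicators and uses Jaccard similarity with an arbitrary threshold; the indicator strings, host names, threshold and `hunt` function are hypothetical and are not drawn from the talk.

```python
def jaccard(a, b):
    """Share of indicators two hosts have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hunt(compromised_behavior, hosts, threshold=0.5):
    """Return hosts whose behavior resembles the known compromised host."""
    return sorted(
        name
        for name, behavior in hosts.items()
        if jaccard(compromised_behavior, behavior) >= threshold
    )


# Behavior observed on a host already confirmed as compromised (hypothetical).
compromised = {"dns:evil.example", "port:4444", "proc:powershell-encoded"}

# Behavior observed on other hosts in the network (hypothetical).
hosts = {
    "hr-laptop": {"dns:evil.example", "port:4444", "port:443"},
    "build-server": {"port:443", "port:22"},
    "finance-pc": {
        "dns:evil.example",
        "port:4444",
        "proc:powershell-encoded",
        "port:443",
    },
}

suspects = hunt(compromised, hosts)
print(suspects)
```

Starting from a confirmed breach means the patterns being matched are known to be bad, which is what gives the confidence in detection that Vigna refers to.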

Vigna said that ‘innovation is very important’ as the threat is evolving, and cybersecurity comes up with new ideas and new technology to tackle it. Composition has allowed the move from tracking to hunting, he said: we are finding more instances in the network that are potentially bad, and new technologies allow us to learn additional things, conduct a better analysis of a network and find things that are bad.
