Using Machines to Understand When Normal Isn't Normal

It seems like only yesterday that the average cybersecurity analyst’s day consisted of battling SQL injections and viruses, but so much has changed in our industry over the last decade that those problems bear little resemblance to the threats we face today.

While we’re certainly not pining for those days to return, those unwanted, unpleasant threats were at least easy to see. The days of the transactional attack are far behind us.

The threat we now face is undeniably more difficult to evaluate. Attacks might run for months on end and cross any number of machines, identities and accounts. When we see a potential incident, we need to quickly understand who the user is, where they went next and, critically, whether this is normal behavior for that user.

Understanding normal

Despite what you might read, companies have long attempted to build a picture of normal user behavior to help uncover threats. Most companies have thrown people at the problem – employing more incident response (IR) analysts to sift through data and make judgements about events. 

These incident responders normally start by looking at domain controller records to see who was using a particular IP address at the time of an incident. Next, they might search through logs to see what the IP did before and after. If they’re experienced, they might notice a connection between the user’s workstation and a remote server made with an unrelated set of account credentials. After considerable time and effort, the IR analyst will have created something resembling a timeline of the events around the incident.
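To make those first triage steps concrete, here is a minimal Python sketch of the process, assuming log records have already been parsed into dictionaries; the field names (user, src_ip, timestamp) are illustrative assumptions, not a real log schema:

    from datetime import timedelta

    # Illustrative sketch: field names are assumed, not a real log schema.
    def user_for_ip(dc_logons, ip, incident_time, window=timedelta(hours=1)):
        """Find which user held a given IP address around the incident time."""
        for event in dc_logons:
            if event["src_ip"] == ip and abs(event["timestamp"] - incident_time) <= window:
                return event["user"]
        return None

    def build_timeline(all_logs, ip, incident_time, window=timedelta(hours=4)):
        """Collect everything the IP did before and after the incident, in order."""
        events = [e for e in all_logs
                  if e["src_ip"] == ip
                  and abs(e["timestamp"] - incident_time) <= window]
        return sorted(events, key=lambda e: e["timestamp"])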

Understanding whether or not user behaviors are normal from this timeline is another challenge altogether. Activities that might be perfectly normal for a database administrator might be very unusual for an employee in the HR team. To determine whether the activities are normal for the user, the IR analyst will likely run many more searches and queries on historical data, put the findings in a reporting system and look for any trends that indicate potential risk. It’s safe to say this process can take days or weeks.
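In spirit, those historical queries boil down to a frequency question: has this user done this before? A toy version of that check, with an invented threshold and field names:

    from collections import Counter

    # Toy example: the threshold and field names are invented for illustration.
    def is_unusual(history, user, action, min_seen=5):
        """Flag an action the user has rarely or never performed before."""
        counts = Counter((e["user"], e["action"]) for e in history)
        return counts[(user, action)] < min_seen

    # e.g. a database export is routine for a DBA but not for HR:
    # is_unusual(history, "hr_employee", "database_export")  ->  True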

Automation was introduced to help analysts speed up this process. It takes various forms, but the typical use cases are scripts that automate data collection and signatures that detect certain types of attacks. In more recent years, we’ve started to see event correlation being used to help uncover well-defined, network-based attacks. The common example is an employee who logs on from home over the VPN shortly after badging into the building. Event correlation can notify an analyst that those two events shouldn’t be happening at once.
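A correlation rule like that can be expressed in a few lines. This is a generic sketch, assuming two hypothetical event feeds keyed by user and timestamp:

    from datetime import timedelta

    # Generic sketch: the feed structure and window are assumptions.
    def impossible_presence(vpn_logins, badge_events, window=timedelta(minutes=30)):
        """Yield users who logged in over the VPN while recently badged into the building."""
        for vpn in vpn_logins:
            for badge in badge_events:
                if (vpn["user"] == badge["user"]
                        and abs(vpn["timestamp"] - badge["timestamp"]) <= window):
                    yield vpn["user"], vpn["timestamp"]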

The new normal – reshaping the playing field

The existing model for security analytics and intelligence has not kept pace with the threat landscape and so its effectiveness has been questionable. Why? The biggest factor without doubt is data. 

The volume of data that could be used in a security investigation has been growing so quickly that it is now not uncommon for a large business to collect as much as 300 terabytes of data every day. This data deluge comes from systems generating more data and SIEMs collecting data sets that simply didn’t exist before (e.g. IoT devices). This means it’s often too expensive to store enough historical data to effectively support a security investigation. In many cases, only 30 days’ worth is kept at any time. The thinking behind this is that if any more is kept, the sheer volume could overwhelm the reporting system. 

The IR analysts facing this sea of data are also likely to be overwhelmed and miss important trends. The obvious answer to the challenge might once again be to throw more bodies at the problem, but the simple truth is most companies, even the big ones, don’t have the funds to hire enough expert threat hunters. And that’s presuming that this threat hunting expertise even exists at such scale – it doesn’t.

At the same time, businesses are in a much greater state of flux, with employees being replaced by outsourced capabilities or temporary workers that turn over on a more regular basis. This makes it harder to identify who is an actual employee or user, let alone build a clear picture of that user’s normal. In short, IR teams are simply unable to process enough useful security data to understand whether or not there is an imminent threat.

Machines aren’t magic

When security tasks were simple and static, IR analysts could rely on machines, in the form of automation, to help streamline tasks. This worked well when there wasn’t too much data, when the data was in a common format, when the threat techniques didn’t change too often and when attacks were solely network-focused. Needless to say, those days are firmly behind us. But, while the threat landscape has become more challenging, thankfully, the machines have become a lot smarter.

Recent developments in AI and machine learning have been met with jubilation and a fair amount of hype within the industry. The problem is that vendors have been lax in describing these add-on technologies, creating confusion in the market.

When customers hear a vendor urging them to “pour data” into a machine learning based analytics engine, they expect wonderful things to simply pop out the other end. In reality, it doesn’t work like that. Too many organizations believe that machine learning and AI are magic.

That’s not to deny the usefulness of these technologies. Understanding normal behavior is one area where developments in artificial intelligence and machine learning can be applied with great success.

There are now algorithms that can create context by connecting events into coherent user sessions. The combination of algorithms and statistical analysis can answer a huge range of questions incredibly quickly: is this a real user or a service account? Is this person an admin? Does this activity deviate from this user’s peer group’s activity? Is the user of account A also logged in under account B? 
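One common way such sessionization can work is to split a user’s time-ordered events wherever the gap between consecutive events exceeds an idle timeout. The sketch below is a generic illustration of that idea, not any particular vendor’s algorithm:

    from datetime import timedelta

    # Generic illustration: the 30-minute idle timeout is an assumption.
    def sessionize(events, idle=timedelta(minutes=30)):
        """Group one user's time-ordered events into coherent sessions."""
        sessions, current = [], []
        for event in sorted(events, key=lambda e: e["timestamp"]):
            if current and event["timestamp"] - current[-1]["timestamp"] > idle:
                sessions.append(current)
                current = []
            current.append(event)
        if current:
            sessions.append(current)
        return sessions

Once events are grouped this way, each of the questions above becomes a property computed over a session rather than a raw search across logs.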

Finding the middle ground

Putting the pieces together, the best way to cope with the huge volume of data and more complex threats on our networks is to augment, not replace, human intelligence with machine intelligence. A good machine-based analytics system should continually ingest new data, understand any alterations in a user’s normal behavior, stitch individual activities into timelines and then analyze the timelines to see if there are any risky behaviors – a task that could take an IR analyst a week or more per user. 

The analyst could review a machine-built user session to more quickly notice a deviation from that user’s normal behavior. Taking it one step further, a machine could automatically score anomalies and assign a risk score to every user, which helps reduce false positives and alert fatigue.
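As a simplified illustration of such scoring, each triggered rule could add weighted points to a session, with only sessions above a threshold surfacing to the analyst; the rule names, weights and threshold here are invented for the example:

    # Invented rules and weights, purely for illustration.
    RULE_WEIGHTS = {
        "first_access_to_asset": 10,
        "deviates_from_peer_group": 15,
        "activity_at_unusual_hour": 5,
    }

    def score_session(triggered_rules):
        """Sum the weights of every rule a session triggered."""
        return sum(RULE_WEIGHTS.get(rule, 0) for rule in triggered_rules)

    def notable(sessions, threshold=25):
        """Surface only the sessions risky enough to be worth an analyst's time."""
        return [s for s in sessions if score_session(s["rules"]) >= threshold]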

These developments in machine automation don’t mean that organizations can get rid of their entire security team, or even spend less time hunting for threats. We hear a lot of talk about a ‘security silver bullet’, and machine learning isn’t one. Rather, it’s a means to make the incident responder’s job a lot easier, but only if we can strike a middle ground between man and machine.
