Interview: Etienne Greeff, SecureData

Is there a way to determine what data sits where on your network, and which documents contain which keywords? Worse still, what if an attacker could pinpoint those documents with an automated search?

Speaking to Infosecurity ahead of his talk at RSA Conference, Etienne Greeff, CTO of SecureData, discussed how machine learning can scan a network to find keywords, and how the same technique can be run in reverse as an attack.

Explaining how the concept works, Greeff said it is based on “topic modelling,” a family of unsupervised learning algorithms that examine a large body of text to work out which words occur most often, and which words most often occur together.
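
The raw signal Greeff describes (which words occur most, and which occur together) can be illustrated with plain counting. The sketch below is not a topic model itself, only the frequency and co-occurrence statistics that topic models such as LDA build on; the function names and the simple tokenizer are hypothetical choices for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_statistics(docs: list[str]) -> tuple[Counter, Counter]:
    """Count how often each word occurs across all documents, and how many
    documents contain each unordered pair of distinct words."""
    freq: Counter = Counter()
    cooc: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        freq.update(tokens)
        # Each pair is counted once per document that contains both words;
        # sorting makes ("a", "b") and ("b", "a") the same key.
        for pair in combinations(sorted(set(tokens)), 2):
            cooc[pair] += 1
    return freq, cooc
```

On the toy corpus `["invoice payment ledger", "ledger audit invoice", "patent filing claims"]`, the pair `("invoice", "ledger")` gets a co-occurrence count of 2 while `("invoice", "patent")` gets 0, the kind of clustering signal a topic model turns into topics.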

“So what that allows you to do, is if I have a bunch of documents I can quickly and accurately discern what topics are being discussed in those documents,” he said. “Let’s say on a network I can figure out what's discussed on a network or on an endpoint, and determine the user to be a financial guy or an accountant, and with a high degree of accuracy, figure out what it is about.”
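
Greeff does not say which topic-modelling algorithm SecureData's tool uses. As an assumption, the sketch below uses Latent Dirichlet Allocation (LDA), the most widely known topic model, fitted with collapsed Gibbs sampling in pure Python. It returns each document's estimated topic mixture and the top words per topic, the kind of output that would let an analyst judge that an endpoint belongs to an accountant. The function name, hyperparameters, and iteration count are illustrative, not tuned values.

```python
import random
from collections import defaultdict


def lda_gibbs(docs: list[list[str]], n_topics: int, iterations: int = 200,
              alpha: float = 0.1, beta: float = 0.01, seed: int = 0,
              top_n: int = 3) -> tuple[list[list[float]], list[list[str]]]:
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.
    Returns (per-document topic mixtures, top words per topic)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assignments: list[list[int]] = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Posterior-mean estimate of each document's topic proportions.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    # Most frequent words currently assigned to each topic.
    top_words = []
    for k in range(n_topics):
        counts = topic_word[k]
        ranked = sorted((w for w in counts if counts[w] > 0), key=counts.get, reverse=True)
        top_words.append(ranked[:top_n])
    return theta, top_words
```

On a toy corpus with one accounting-flavoured and one patent-flavoured document, you would typically expect each document's mixture to concentrate on a different topic; real corpora need far more documents and proper preprocessing.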

To weaponize this, Greeff said, an attacker would set up a watering hole attack, entice the target to visit that website, and have them download a topic modelling payload. “On my command and control I can determine the words I care about, and only extract documents that only talk about passwords, patents, copyright and filing documents. So I don’t need to know anything about the user or the documents, just get what I care about in an automated fashion.”

He explained that the watering hole would deploy the payload onto the network, where it categorizes the documents that relate to the chosen keywords. This lets an attacker set up the watering hole and then extract only the relevant documents.

“So you can quite accurately extract documents, without any knowledge of the network.”

Greeff described it as a form of “reverse DLP”: rather than stopping sensitive data from leaving, machine learning is used to sift quickly through documents and find it. Asked by Infosecurity whether this amounted to a form of Big Data analysis, Greeff agreed, noting that topic analysis is commonly used by libraries and news aggregators, but that weaponizing it lets an attacker turn it to nefarious ends.

Greeff argued that machine learning is better suited to attacking a network than to defending an organization, “but there are not that many offensive uses of machine learning.” There are examples of manipulating images to fool machine learning models and of crafting text to confuse AI, but many of these are very academic and do not scale to real business environments.

“So we decided to create a machine learning tool that you could use as a hacking tool in scalable, repeatable and automated fashion, and I feel we succeeded as it is a totally different form of attack as it is a new way to think about how to attack a network.”

So what about defending against this? Greeff doubted it would be easy, saying that every DLP vendor uses “regular expressions” or hashes to determine what is leaving the network. “Everybody uses very elementary ways of determining what is leaving the network, but if you do topic modelling you can quite accurately discern what topic is in what document.”
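
To make the contrast concrete, here is a hypothetical sketch of the “elementary” checks Greeff describes: an exact-hash match against known sensitive files plus regular-expression rules. The two patterns (a 16-digit card-like number and a CONFIDENTIAL label) and the function names are assumptions for illustration; real vendor rule sets differ.

```python
import hashlib
import re

# Hypothetical rules for illustration only: a 16-digit card-like number
# (optionally separated by spaces or dashes) and a "CONFIDENTIAL" label.
PATTERNS = [
    re.compile(r"\b(?:\d[ -]?){15}\d\b"),
    re.compile(r"\bCONFIDENTIAL\b"),
]


def fingerprint(text: str) -> str:
    """SHA-256 digest of the exact document text encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def elementary_dlp_blocks(text: str, known_hashes: set[str]) -> bool:
    """Return True if an outbound document exactly matches a known-sensitive
    hash or triggers any regular-expression rule."""
    if fingerprint(text) in known_hashes:
        return True
    return any(p.search(text) for p in PATTERNS)
```

A reworded copy of a flagged document, with no label and a different hash, passes both checks even though its subject is unchanged. That is the gap Greeff argues topic-based detection can close.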

He said that you can do topic modelling “on the fly” to understand where sensitive documents are on the network, and figure out what topics are leaving your network “without having to rely on pattern matching, certain words or snippets of stuff.”
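
A full topic model is too long for a short example, so the sketch below substitutes a simpler stand-in: it builds a word-frequency profile from documents an organization has marked as sensitive, then flags outbound text whose cosine similarity to that profile reaches a threshold. This is bag-of-words similarity, not topic modelling proper; a real deployment along the lines Greeff suggests would compare topic mixtures from a model such as LDA instead. The stop-word list, the 0.3 threshold, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "for", "to", "in", "on", "is", "are", "with"}


def term_vector(text: str) -> Counter:
    """Bag-of-words counts of lower-cased alphabetic tokens, minus stop words."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profile(sensitive_docs: list[str]) -> Counter:
    """Aggregate word counts over documents already known to be sensitive."""
    profile: Counter = Counter()
    for doc in sensitive_docs:
        profile.update(term_vector(doc))
    return profile


def flag_outbound(text: str, profile: Counter, threshold: float = 0.3) -> bool:
    """Flag outbound text whose vocabulary resembles the sensitive profile."""
    return cosine(term_vector(text), profile) >= threshold
```

Unlike the hash and regex checks, this flags a reworded document about the same subject, because it compares what the text is about rather than exact strings.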

Every DLP vendor should be using topic modelling, he claimed, adding that it offers a much better way of detecting what is leaving the network, along with more granular visibility.

So why do this now? Greeff said the research grew out of his own interest, and that he believed it to be the first scalable attack that uses machine learning to reach data efficiently inside a victim’s network.

Asked whether this could be a way to build better DLP, Greeff said it could be the future of DLP: if DLP used the same techniques that attack networks, “DLP would be much more efficient, and you would not have to configure as much as you do with current DLP.”

He concluded that this is best thought of as a “data-driven attack” rather than a technical one: the attacker decides what sort of data they are interested in, and machine learning extracts that data efficiently.

“In the past it is a very manual job, as the attacker has to get on the network and get a bunch of documents and manually review each of the documents to see which ones they care about, and that just doesn’t scale. With this, it is scalable.”

Dan Raywood