Using Graph Search Engines and High Performance Servers to Find Malware Patterns


As the cost of undetected malware increases, organizations, particularly those in the financial sector, are under growing pressure to identify cyber threats as quickly as possible.

Damages caused by malware are estimated to reach $6 trillion by 2021, as cybercriminals place malicious software inside computer networks to gain access to bank accounts, exfiltrate account information, transfer funds, or extort money through ransomware.

There have already been around 3,500 successful cyber-attacks against financial institutions this year, according to reports filed with the Treasury Department's Financial Crimes Enforcement Network. A single hack into Capital One yielded the personal data of over 100 million people.

All of this illustrates just how mission-critical it has become for security specialists to identify and neutralize threats instantly. Yet today’s threat detection tools face formidable challenges, particularly when analyzing massive datasets.

In response, banks and other large organizations are exploring new search and compute technologies to find malware faster. New in-memory computing and graph search tools can identify cyber risks in near real-time, condensing what typically takes weeks down to just minutes. 

For security experts who work with particularly large datasets, and who are vexed by the time it takes to find and neutralize undetected malware on their networks, the following steps can help.

Use tools with greater scale
There is too much data for conventional tools to scan in a reasonable amount of time. Organizations must regularly scan their network log data to identify lateral movement, the way attackers hop from one compromised host to the next. Yet banks can generate multiple terabytes of network log data per day, a volume that conventional tools cannot scan quickly enough to find threats in a meaningful timeframe.
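To illustrate why lateral movement is naturally a graph question, here is a minimal sketch that turns connection log records into a host-to-host graph and asks which hosts are reachable from a suspect machine within a few hops. The host names and the plain `(source, destination)` record format are hypothetical; a real detector would also respect timestamps, authentication events, and protocols, which this sketch ignores.

```python
from collections import defaultdict, deque


def build_graph(log_records):
    """Build a directed host-to-host adjacency map from (src, dst) connection records."""
    graph = defaultdict(set)
    for src, dst in log_records:
        graph[src].add(dst)
    return graph


def reachable_within(graph, start, max_hops):
    """Breadth-first search: return {host: hop_count} for hosts reachable in <= max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        if hops[host] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(host, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[host] + 1
                queue.append(neighbor)
    return hops


# Hypothetical connection records extracted from network logs.
logs = [
    ("workstation-17", "fileserver-2"),
    ("fileserver-2", "db-core-1"),
    ("db-core-1", "payments-gw"),
    ("workstation-03", "printer-1"),
]
graph = build_graph(logs)
print(reachable_within(graph, "workstation-17", 2))
```

At terabytes of logs per day, the graph holds billions of edges, and each hop of this search touches memory scattered across it, which is what makes the workload hard for conventional tools.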

Conventional tools will simply never catch up with the amount of data being generated and the number of incoming threats hitting the network. That is one reason why the mean dwell time for malware is 71 days, an interval that exposes organizations to considerable potential damage.

You need to look beyond conventional graph databases to find malware. Graph databases can scale both vertically and horizontally without introducing data integrity or consistency issues, and they work very well for smaller datasets. The challenge is that their performance does not hold up at terabyte scale: they begin to lose steam in the low terabytes and decline dramatically on larger datasets. This happens for two reasons:

  1. Scaling horizontally means that nearly every memory fetch (edge traversal) requires a message to be sent across the network to another node.
  2. Keeping data on disk and working with only a small part of it in memory causes data to thrash between RAM and disk as edges are traversed.
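The first reason can be made concrete with a small simulation. The sketch below spreads a synthetic random graph across cluster nodes by vertex ID (ID modulo node count, an assumed and deliberately simple partitioning scheme) and measures how many edges connect vertices on different nodes; each such edge traversal would need a network message. Real graphs and smarter partitioners cut fewer edges than a random graph does, but the trend is the same.

```python
import random


def cut_edge_fraction(edges, num_nodes):
    """Fraction of edges whose two endpoints live on different cluster nodes,
    assuming integer vertex IDs are assigned to nodes by ID modulo num_nodes."""
    crossing = sum(1 for u, v in edges if u % num_nodes != v % num_nodes)
    return crossing / len(edges)


# Synthetic random graph (not cyber data): 100,000 vertices, 200,000 edges.
rng = random.Random(42)
num_vertices = 100_000
edges = [(rng.randrange(num_vertices), rng.randrange(num_vertices)) for _ in range(200_000)]

for k in (1, 4, 16, 64):
    print(f"{k:>3} nodes: {cut_edge_fraction(edges, k):.3f} of traversals cross the network "
          f"(random-graph expectation {1 - 1 / k:.3f})")
```

For a random graph the crossing fraction approaches 1 - 1/k as nodes are added, so adding machines increases the share of traversals that pay network latency.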

Graph search tools, on the other hand, are built for very large datasets. The Department of Defense helped develop the Trovares graph search tool, which is now commercially available. The technology adopts supercomputing techniques such as extreme multithreading and fine-grained locking to improve speed and scale by orders of magnitude.
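Fine-grained locking is easiest to see in a parallel breadth-first search. The sketch below is a generic illustration of the idea, not the Trovares implementation: instead of one global lock around a shared "visited" set, the set is split into many stripes, each with its own lock, so threads claiming different vertices rarely wait on one another. The class and function names are hypothetical, and because of CPython's global interpreter lock this code demonstrates the locking pattern rather than a real speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StripedVisited:
    """Visited set split into stripes, each guarded by its own small lock.

    Threads claiming vertices in different stripes never wait on each other,
    unlike a single global lock around one shared set.
    """

    def __init__(self, num_stripes=64):
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._sets = [set() for _ in range(num_stripes)]

    def try_claim(self, vertex):
        """Mark vertex visited; return True only for the first thread to do so."""
        stripe = hash(vertex) % len(self._locks)
        with self._locks[stripe]:
            if vertex in self._sets[stripe]:
                return False
            self._sets[stripe].add(vertex)
            return True


def parallel_bfs(graph, start, workers=8):
    """Level-synchronous BFS; frontier vertices are expanded concurrently by worker threads."""
    visited = StripedVisited()
    visited.try_claim(start)

    def expand(vertex):
        # Keep only the neighbors this thread managed to claim first.
        return [n for n in graph.get(vertex, ()) if visited.try_claim(n)]

    levels = {start: 0}
    frontier = [start]
    depth = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            depth += 1
            next_frontier = []
            for claimed in pool.map(expand, frontier):
                next_frontier.extend(claimed)
            for vertex in next_frontier:
                levels[vertex] = depth
            frontier = next_frontier
    return levels


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(parallel_bfs(graph, 0))
```

Because each vertex is claimed exactly once under its stripe's lock, no vertex appears twice in a frontier even when two threads reach it at the same moment (vertex 3 above is reachable from both 1 and 2).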

A team of data scientists applied analytics and supercomputing expertise to deliver a graph search tool that differs significantly from conventional graph tools and returns query results hundreds of times faster. It supports very large in-memory graphs for fast queries, and it ingests data directly into the system, avoiding the performance issues of a database layer.
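One common way to hold a very large graph in memory for fast traversal is the compressed sparse row (CSR) layout: one flat array of neighbor IDs plus an offsets array marking where each vertex's neighbors begin. The article does not say which layout the Trovares tool uses, so this is a generic sketch of building CSR directly from an ingested edge list, with Python lists standing in for the flat arrays a production system would use.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays from an unsorted (src, dst) edge list.

    Neighbors of vertex v are targets[offsets[v]:offsets[v + 1]].
    """
    # Count out-degree of each vertex, shifted by one slot for the prefix sum.
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum: offsets[v] becomes the number of edges with source < v.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Scatter each edge's destination into its source's slot range.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing makes a copy of the start positions
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, v):
    """Return the out-neighbors of v as a contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]


offsets, targets = build_csr(3, [(2, 0), (0, 1), (0, 2), (1, 2)])
print(offsets, targets, neighbors(offsets, targets, 0))
```

The appeal of this layout is that a vertex's neighbors sit in one contiguous block of memory, so an edge traversal is an array read rather than a disk seek or a network request.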

Look beyond clusters
Second, you need to consider computing platforms designed for extreme performance. Server clusters are not ideal for graph search: the typical graph computation, which hops from edge to edge across scattered memory locations, is among the worst workloads for a cluster, because each hop can become a network message. Symmetric multiprocessor (SMP) systems, however, are excellent for graph search, because every processor shares one large memory and an edge traversal remains a memory fetch. Implementations from the team commercializing the DoD technology were built on SMP systems such as HPE's Superdome Flex.

Today a single SMP system can range from three to 48 terabytes of memory with more than a thousand threads of execution, providing the balance of memory capacity and processing capability needed to scale graph search performance. These platforms are built on industry-standard x86 processors and PCIe-based I/O, which enable high-performance data ingest and support the full range of software needed to complete a workflow around the graph search tool.
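Whether a dataset fits in a single SMP system's memory can be estimated before buying hardware. The sketch below assumes a compressed sparse row (CSR) layout (one offset per vertex plus one target ID per edge) with fixed 8-byte offsets, IDs, and edge properties, and reserves 25% headroom for query working state; all of these sizes are assumptions, not vendor figures. The edge and property counts are the benchmark figures this article reports (20 billion edges, 212 billion edge properties); the vertex count is a hypothetical placeholder because the article does not give one.

```python
TB = 10**12  # decimal terabyte


def csr_footprint_bytes(num_vertices, num_edges, num_edge_properties,
                        offset_bytes=8, vertex_id_bytes=8, property_bytes=8):
    """Rough in-memory size of a CSR graph with fixed-width edge properties.

    Topology: one offset per vertex (plus a sentinel) and one target ID per edge.
    Properties: every edge property is assumed to occupy property_bytes.
    """
    topology = (num_vertices + 1) * offset_bytes + num_edges * vertex_id_bytes
    properties = num_edge_properties * property_bytes
    return topology + properties


def fits_in_memory(footprint_bytes, capacity_tb, headroom=0.25):
    """True if the graph fits with `headroom` spare (fraction of footprint) for query state."""
    return footprint_bytes * (1 + headroom) <= capacity_tb * TB


# Benchmark edge/property counts; the vertex count is a hypothetical placeholder.
footprint = csr_footprint_bytes(
    num_vertices=1_000_000_000,
    num_edges=20_000_000_000,
    num_edge_properties=212_000_000_000,
)
print(f"~{footprint / TB:.2f} TB; fits in 3 TB: {fits_in_memory(footprint, 3)}")
```

Under these assumptions the edge properties, not the graph topology, dominate the footprint, which is why memory capacity matters as much as thread count.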

The performance of graph search on an SMP system is impressive. Benchmark data show near-linear scalability when querying three terabytes of cyber data with 20 billion graph edges and 212 billion edge properties. The combination of the graph search tool and an SMP system demonstrates improvements of orders of magnitude in speed, reducing query time from 179 hours to 12 minutes. You should expect these search tools on SMP systems to outperform conventional tools on datasets of all sizes, and to excel when data exceeds a billion records.
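A quick check of the benchmark figures quoted above. The speedup follows directly from the two query times; "near-linear scalability" is read here in the standard sense that speedup $S(p)$ on $p$ threads stays close to $p$, so parallel efficiency $E(p)$ stays close to 1. The per-edge figures assume decimal terabytes and describe the raw dataset, not the in-memory layout.

```latex
\begin{aligned}
\text{speedup} &= \frac{179\ \text{h}}{12\ \text{min}}
  = \frac{179 \times 60\ \text{min}}{12\ \text{min}}
  = \frac{10{,}740}{12} = 895 \approx 10^{2.95} \\[4pt]
S(p) &= \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p} \approx 1 \quad \text{(near-linear scaling)} \\[4pt]
\frac{212 \times 10^{9}\ \text{properties}}{20 \times 10^{9}\ \text{edges}} &= 10.6\ \text{properties per edge}, \qquad
\frac{3 \times 10^{12}\ \text{bytes}}{20 \times 10^{9}\ \text{edges}} = 150\ \text{bytes per edge}
\end{aligned}
```

A reduction from 179 hours to 12 minutes is therefore roughly a 900-fold speedup, nearly three orders of magnitude.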

Search all the data
Rather than taking a small slice of the data and examining it, graph search tools can ingest far more of the data and answer complex queries across it. The performance boost of graph search on SMP systems lets it leapfrog conventional search tools, so it can quickly find intrusions that have gone undetected and still reside in the data. More importantly, speed and scale allow organizations to approach zero malware risk, with no unscanned data and no multi-hour or multi-day scans.
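The cost of examining only a slice of the data can be quantified. If each log record is kept independently with probability $f$ and an intrusion leaves $m$ records, the chance that the sample contains none of them is $(1 - f)^m$. The sketch below evaluates this; the count of 50 malicious records is a hypothetical example, and real attackers often leave far fewer traces, which makes sampling even riskier.

```python
def miss_probability(sample_fraction, malicious_records):
    """Probability that a random sample of log records contains none of the malicious ones.

    Assumes each record is kept independently with probability sample_fraction.
    """
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be between 0 and 1")
    return (1.0 - sample_fraction) ** malicious_records


# Hypothetical intrusion that leaves 50 records in the logs.
for fraction in (0.01, 0.10, 0.50, 1.00):
    print(f"scan {fraction:>4.0%} of logs -> miss chance {miss_probability(fraction, 50):.4f}")
```

Only scanning all of the data (a fraction of 1.0) drives the miss probability to zero, which is the case for graph search over the full dataset.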

The malware challenge has grown: the costs are higher and the technical challenges are greater. New graph search tools combined with SMP systems are showing how companies can win the battle against malware. Large enterprises in banking, telecommunications, the biosciences, and other industries are adding graph search and SMP systems to their cybersecurity roadmaps, having found in them the performance needed to overcome the speed and scale challenges of finding malware.

What’s hot on Infosecurity Magazine?