Questioning the Machine

Machine learning and artificial intelligence are being bandied about the security industry with reckless abandon. The two terms seem interchangeable in marketing literature, but if you ask ten people to define them, you may well get 20 different answers, so broad is the spectrum of possible interpretations.

For a buyer of security technology, the situation is confusing, and this article cannot define a “right answer” because, like the concept itself, the whole question is shrouded in shades of grey. An approach that can help demystify the situation is to define what you want to achieve and then ask the right questions to see whether a technology can help you meet that goal.

This sounds disarmingly simple but, as in life, the devil is in the detail. Let’s start with a very broad set of boundaries that we can perhaps all agree on. 

All software, from smartphone applet to mainframe monster, follows a set of rules that process some form of input to deliver some kind of output. The rules the software follows tend to be hard-coded around the expected input and desired output.

In a security context, an appliance like a firewall has a relatively simple set of rules that allow packets to flow based on adherence to a defined set of parameters such as source, destination, contents and other factors. 
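To ground the distinction, here is a minimal, purely illustrative sketch of a hard-coded, firewall-style rule in Python; the field names, ports and networks are hypothetical:

```python
# Purely illustrative: a hard-coded packet-filtering rule.
# Field names, ports and networks are hypothetical.
BLOCKED_PORTS = {23, 445}           # e.g. telnet, SMB
ALLOWED_SOURCES = {"10.0.0.0/8"}    # trusted internal range (simplified)

def allow_packet(packet: dict) -> bool:
    """Return True only if the packet satisfies every hard-coded condition."""
    if packet["dst_port"] in BLOCKED_PORTS:
        return False
    if packet["src_net"] not in ALLOWED_SOURCES:
        return False
    return True

print(allow_packet({"src_net": "10.0.0.0/8", "dst_port": 443}))       # True
print(allow_packet({"src_net": "203.0.113.0/24", "dst_port": 445}))   # False
```

The rule never changes unless someone rewrites it, which is exactly what separates this style of software from a learned model.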

However, when it comes to machine learning and AI, the rules should be thought of as models, and it is wise to ask how models are applied within any AI/ML platform. Can the models be applied to different data sets?

For example, if the content inspection technology is touted as being able to spot suspicious activity, does this only apply to a particular type of web traffic, or can the system also inspect application log data, audio data such as phone recordings, video data from security cameras, and other transactional data? You may need none of the last three use cases, but understanding the limitations is vital. Applying AI/ML to data can be great, but an organization’s data stretches across silos, and if the AI/ML can only work on certain silos, something is likely missing.
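As a rough illustration of the “same model, different silos” question, here is a minimal sketch using scikit-learn’s IsolationForest; the feature columns and values are invented and only stand in for whatever each silo actually produces:

```python
# Minimal sketch: one anomaly-detection model applied to numeric features
# drawn from different data silos. Features and values are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

def fit_and_score(features: np.ndarray) -> np.ndarray:
    """Fit an IsolationForest and return anomaly scores (lower = more anomalous)."""
    model = IsolationForest(random_state=0).fit(features)
    return model.score_samples(features)

# Made-up features extracted from two different silos:
web_traffic = np.array([[120, 0.2], [118, 0.3], [9500, 0.9]])  # bytes, error rate
app_logs    = np.array([[3, 1], [2, 0], [48, 7]])               # logins/hr, failures/hr

print(fit_and_score(web_traffic))
print(fit_and_score(app_logs))
```

The point of the sketch is the question it raises: if a vendor’s model can only ever see one of these tables, the other silos go uninspected.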

Computer systems are very good at processing large amounts of data, very quickly, to spot specific patterns or to make correlations. In a security context, the humble malware scanner largely follows these steps, matching code samples against virus signatures.
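Signature matching of this kind can be sketched in a few lines; the hash below is simply the SHA-256 of a test string, standing in for a real signature:

```python
# Minimal sketch of signature-style matching: hash a sample and look it up
# in a set of known-bad signatures. The stored value is a placeholder.
import hashlib

KNOWN_BAD_SHA256 = {
    # SHA-256 of b"hello world", used here as a stand-in signature
    "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9",
}

def is_known_malware(sample: bytes) -> bool:
    """Return True if the sample's SHA-256 matches a stored signature."""
    return hashlib.sha256(sample).hexdigest() in KNOWN_BAD_SHA256

print(is_known_malware(b"hello world"))  # True: the hash is in the set above
print(is_known_malware(b"hello there"))  # False: no matching signature
```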

However, one of the often-claimed benefits of AI is its ability to adapt to the scenario. As such, buyers need to ask about the flexibility of the AI/ML models that underpin any system. Does the vendor claim to use a proprietary model that will solve “all the problems,” and can this model be altered by the customer? Can different models all work on the same data, or can your data only be worked on by the models bundled with the security product? This distinction is crucial: the threats of a decade ago are not the challenges of today, and every enterprise is different, including its security needs. At present there is no one size fits all.
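One way to picture the “different models, same data” question is the common-interface pattern used by libraries such as scikit-learn, where swapping the model is a one-line change; the features and labels here are hypothetical:

```python
# Minimal sketch: two interchangeable models fitted to the same (made-up) data,
# illustrating the "different models, same data" question.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X = np.array([[0.1, 200], [0.9, 15000], [0.2, 300], [0.8, 12000]])  # hypothetical features
y = np.array([0, 1, 0, 1])                                           # 0 = benign, 1 = suspicious

for model in (LogisticRegression(), RandomForestClassifier(random_state=0)):
    model.fit(X, y)
    print(type(model).__name__, model.predict([[0.85, 14000]]))
```

A product whose data pipeline allows this kind of substitution answers the flexibility question very differently from one whose bundled model is the only thing that can ever touch your data.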

Although IT security has often favored “black box” solutions with non-inspectable code bases and closed application interfaces, it is worth pressing for detail on the technical components of ML/AI within the product. Sometimes a product applies simple classification algorithms to a single type of data and, on that basis, makes sweeping claims about the inclusion of ML/AI.

Getting the vendor talking about the implementation allows you to assess whether it is a point ML/AI solution or a more comprehensive way to bring ML/AI to security data.

This line of questioning should also extend to how new AI/ML approaches will be incorporated into the solution. This is critical, as the current state of the art is still relatively embryonic, and new models and methods are continually emerging that offer both incremental improvements and radical new approaches that can make older models obsolete overnight. Vendors need to describe how this evolution process works and, in a best-case scenario, provide examples of when past AI/ML was incorporated into the solution and how that development, testing, implementation and licensing played out.

The last component, licensing, is critical: was an organization’s data held hostage and kept away from new AI/ML until a fee was paid to apply the algorithm? This isn’t necessarily bad: if the vendor developed the new AI/ML itself, a fee makes sense. But if the vendor simply applied someone else’s algorithm to the data once the licensing fee was paid, that is something an InfoSec practitioner will want to know.

For all of the benefits of AI/ML, it is still only one tool in a security practitioner’s arsenal, and the ultimate arbiter of value will be human. Does the tool help practitioners learn how their data works and grow their understanding of data engineering and data science as it pertains to the organization’s data? Or is the solution a black box that forces the organization to rely on the vendor’s expertise to solve security problems? A balance must be struck between working with vendors and growing an internal talent pool. In many cases, a product that allows that growth will serve the organization better for longer and help build a skills base that is not hostage to the whim of the vendor.
