AI Must Prove Its Trustworthiness

Feed the wrong data into a large language model (LLM) and you get data poisoning – yet another term that has suddenly entered the common vernacular at tech companies. Right now, LLMs aren’t built to ‘unremember’ information.

Within the giant infrastructure of OpenAI and other platforms, it’s near impossible to remove inaccurate data, so it’s up to security leaders to make sure that LLMs are only getting fed the right information in the first place.

How do we know what our AI engine knows, and how do we trust that what it’s learning is actually coming from a source of truth?

Amid all the excitement over GenAI and how it makes machine learning accessible, we must turn our attention to the security and veracity of the data we are feeding into the models, and what they’re outputting.

Feeding AI Security Source Code is a Bad Idea

For every security tool built to detect a cyber-attack, cybercriminals eventually devise a workaround. With AI, the biggest cybersecurity risk lies in the training.

An AI tool can automate repetitive tasks or optimize known patterns, so feeding sanitized security code into one can help make that code more efficient. However, as tempting as it is to feed an AI model the source code of your security tools, doing so carries an unwelcome risk: you’re essentially handing the model the knowledge it needs to circumvent your security system and, eventually, to generate malicious output.

This is particularly true when embarking upon creative or innovative use cases that require sensitive data to be fed into the model. For obvious privacy and security reasons, this is not a good idea. Depending on the GenAI learning mode, it could become very difficult – even impossible – for the model to unlearn the data.

Urgent Need to Solidify Guardrails

There’s an enormously valuable use case for using AI to write low-level code, and it’s already happening in most organizations: engineers regularly use ChatGPT, GitHub Copilot and other GenAI tools to write their everyday code. But broadly speaking, engineering organizations need to create guardrails around AI-generated code. Relying on the ‘knowledge’ of GenAI can become a security issue if the model has been fed data with malicious intent (back to that idea of data poisoning).

The yellow flag here, for me, is the inability to differentiate between machine-written and human-written code down the line. If a problem turns up in machine-written code, it needs to be flagged so it can be swiftly retired or corrected. Right now, AI-written code is inserted seamlessly into the mix, making it impossible to tell who, or what, wrote which snippets. The lines are blurred, and we need them to be clear through some sort of delineation or tagging mechanism.
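No standard tagging mechanism exists yet, but one could imagine something as simple as marker comments around AI-generated snippets so they can be located and retired later. The sketch below illustrates the idea; the `# ai-gen:` marker format and function names are hypothetical, not an established convention.

```python
# Illustrative sketch of a tagging convention for AI-written code.
# The "# ai-gen: begin/end" marker format is an assumption for this example.

AI_BEGIN = "# ai-gen: begin"
AI_END = "# ai-gen: end"

def tag_snippet(code: str, tool: str) -> str:
    """Wrap an AI-generated snippet in markers recording which tool produced it."""
    return f"{AI_BEGIN} ({tool})\n{code}\n{AI_END}"

def find_ai_regions(source: str) -> list[tuple[int, int]]:
    """Return (start_line, end_line) pairs, 1-indexed, for tagged regions."""
    regions: list[tuple[int, int]] = []
    start = None
    for i, line in enumerate(source.splitlines(), 1):
        stripped = line.strip()
        if stripped.startswith(AI_BEGIN):
            start = i
        elif stripped.startswith(AI_END) and start is not None:
            regions.append((start, i))
            start = None
    return regions
```

A scanner like `find_ai_regions` could run in code review or CI, so that if a machine-written snippet later proves faulty, every tagged region can be found and corrected rather than hunted down by guesswork.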

Another thing to consider: code written to perform sensitive tasks and handle sensitive data must be weighed very carefully before being optimized by AI. Once that code is submitted and learned, it becomes public knowledge of a sort and, again, it’s impossible for the model to unlearn what you teach it. Think twice about whether you want to share your secrets with strangers.

AI Must Prove Its Trustworthiness

Not only does AI need time and data to grow, but we – real live people – need time and information to adjust to trusting AI. When the cloud was introduced more than a decade ago, people were skeptical about putting their content there. Now it’s considered the status quo and, in fact, best practice to store content and even applications in the cloud. That comfort level took years to build, and the same will happen with AI.

For security professionals, zero trust is a mantra. Starting with strong security infrastructure ensures secure products, and one of the main goals of CISOs in the coming months and years will be to ensure that AI is being used properly and that features are secure. This means applying all foundational aspects of security to AI – including identity and access management, vulnerability patching and more. 

Google’s Secure AI Framework (SAIF) and the NIST AI Risk Management Framework are both attempts to create security standards around the building and deploying of AI. They are conceptual frameworks that address top-of-mind concerns for security professionals, mirroring the idea of zero trust specifically in the AI space.

With guardrails in place, and some time, trust around AI as a concept will grow. 
