#HowTo: Secure AI and its Software Supply Chains

Written by Dan Conn

The concept of artificial intelligence (AI) has now been with us for 75 years, beginning with Alan Turing’s unpublished 1948 paper, Intelligent Machinery.

This, and his later papers, posited the conditions that would indicate a computer had reached intelligence levels comparable to humans. From those beginnings to our current fascination with large language models (LLMs), AI has always been a field in which philosophical debate is as important as the technical details of building the systems.

While we have made progress over those 75 years, producing generative adversarial networks (GANs) and LLMs, we still struggle with the field’s most fundamental question: How can one universally define intelligence? Humankind has wrestled with that question, without consensus, since at least Descartes, if not earlier.

Cybersecurity also struggles with its most fundamental question: How can one universally define good security?

Both questions suffer from shifting landscapes, evolving techniques and the recurring realization that previous answers rested on flawed assumptions. From these unanswered questions follows another: How do we secure AI systems?

Could the tools we commonly use for code analysis, dependency management and risk assessment be enough? Perhaps, but the famous pioneer of software engineering, Grace Hopper, was never a fan of blindly doing things the way they had always been done, and I believe there is more to securing AI systems too.

AI systems such as Megatron, at a very simple level, converge input values to a set of outputs through weighted probabilities. The architectural complexity lies in the sheer number of units doing this, combined with the stochastic behavior of feedback loops that adjust and reinforce the weight and bias values, and so ‘train’ the system. The code that makes up these systems is fairly straightforward, so it may be assumed that the attack surface is limited. However, Rob van der Veer recently engrossed an audience at OWASP Global AppSec Dublin, explaining the myriad threats to these systems, particularly to the data of such models and to their supply chain.
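To make that description a little more concrete, the toy Python sketch below is my own illustration, not code from any of the systems mentioned. It trains a single ‘neuron’: a weighted sum of inputs plus a bias, adjusted by a feedback loop until its outputs match a handful of labelled examples.

```python
import random
from math import exp

# A minimal sketch of the idea above: a single unit converges inputs to an
# output via weighted values, and "training" is a feedback loop that nudges
# the weights and bias to reduce error. Real models stack millions of these.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def predict(weights, bias, inputs):
    # Weighted sum of inputs, squashed into a probability-like output.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(samples, epochs=1000, lr=0.5):
    weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
    bias = random.uniform(-1, 1)
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Feedback loop: adjust weights and bias in proportion to the error.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

if __name__ == "__main__":
    # Toy task: learn logical OR from four labelled examples.
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
    w, b = train(data)
    for inputs, _ in data:
        print(inputs, round(predict(w, b, inputs), 3))
```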

Data is hugely valuable in most systems, and AI is no different. It is here that Rob suggested we should pay the most attention. We need to consider how bad actors could attack systems like DALL-E, Stable Diffusion or image recognition systems by adding carefully crafted image noise that produces incorrect results. Text systems such as ChatGPT could fall foul of websites created with the intent of presenting fiction as fact. Such attacks will be difficult to guard against.
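As an illustration of the noise attack described above, the sketch below uses a toy linear ‘image’ classifier of my own invention, not any of the models named. Because the model is linear, the perturbation direction is simply the sign of its weights; real attacks, such as the fast gradient sign method, apply the same principle to deep networks.

```python
import numpy as np

# Assumed toy setup: a 64-"pixel" image scored by a linear classifier.
rng = np.random.default_rng(0)
weights = rng.normal(size=64)
bias = 0.0
image = rng.normal(size=64)

def score(x):
    return float(weights @ x + bias)

def classify(x):
    return "cat" if score(x) > 0 else "dog"

print("original prediction:", classify(image), "score:", round(score(image), 3))

# Adversarial noise: just enough per-pixel change to cross the decision
# boundary. For a linear model the gradient w.r.t. the input is `weights`,
# so we step each pixel against the sign of its weight.
margin = 1.0
eps = (abs(score(image)) + margin) / np.abs(weights).sum()
direction = -np.sign(weights) if score(image) > 0 else np.sign(weights)
adversarial = image + eps * direction

print("perturbed prediction:", classify(adversarial), "score:", round(score(adversarial), 3))
print("mean absolute change per pixel:", round(float(np.mean(np.abs(adversarial - image))), 3))
```

The point of the sketch is how small the per-pixel change is relative to the flipped decision; the same asymmetry is what makes these attacks hard to spot in practice.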

Another issue, raised in a British Computer Society (BCS) webinar on LLMs in education and law, is copyright. If a system is trained on copyrighted works, royalties may need to be paid. The Bing chatbot, for example, cites sources in its generated answers, and those sources may be subject to royalties. In that webinar, Richard Self explained this predicament and left open the question of copyright liability for generative AI systems.

Rebecca Herold also raised issues around privacy. If these systems are being trained on sensitive data, which the webinar suggested is the case, then rights such as the right to erasure under the GDPR and UK GDPR may be difficult to comply with. This may leave companies open to legal challenges, particularly under future legislation inspired by the recent US National Cyber Strategy, the US AI Bill of Rights and the EU’s AI Act.

There is a lot of work to be done, and a good starting place is the OWASP AI Security and Privacy Guide by Rob van der Veer, who is closely involved in global and European standardization. ACM and BCS also have special interest groups in AI and cybersecurity, which I hope will look seriously into these areas, help steer policy and support practitioners in these fields.

To conclude, as with all software, there are many threats to AI that cybersecurity professionals will need to help address. AI systems are currently monoliths, but their evolution into portable components can already be seen. As AI models enter our software supply chains through the standardized practice of building software from components, new attack surfaces and risks will emerge. Hopefully, members of organizations like OWASP, BCS and ACM will come together and build safe, secure and resilient AI systems.
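One modest control follows directly from treating models like any other component: pin and verify them before use. The sketch below is my own illustration, with a placeholder file name and digest rather than a real model, showing how a pinned SHA-256 digest can gate which model artifacts an application will load, much as a lockfile does for code dependencies.

```python
import hashlib
from pathlib import Path

# Pin the expected SHA-256 digest of each approved model artifact and refuse
# to load anything that does not match. Names and digests are placeholders.
PINNED_MODELS = {
    "sentiment-model-v1.onnx": "replace-with-pinned-sha256-digest",
}

def verify_model(path: Path) -> bool:
    expected = PINNED_MODELS.get(path.name)
    if expected is None:
        raise ValueError(f"{path.name} is not in the approved model list")
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest == expected

if __name__ == "__main__":
    model_path = Path("sentiment-model-v1.onnx")  # placeholder path
    if model_path.exists() and verify_model(model_path):
        print("Digest matches the pinned value; safe to load.")
    else:
        print("Refusing to load: unknown or tampered model artifact.")
```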

Dan Conn is a Developer Advocate for Sonatype and an OWASP, BCS and ACM member. This article is his personal opinion, and his views may differ from those of the organizations mentioned.
