Commercial AI Models Show Rapid Gains in Vulnerability Research

While non-public frontier AI models, such as Anthropic’s Claude Mythos, have been shown to identify thousands of zero-day vulnerabilities across major operating systems, commercial models are also making progress in discovering software bugs.

Forescout’s Verde Labs found that just a year ago, 55% of AI models failed basic vulnerability research tasks and 93% failed exploit development tasks.

However, progress has been made: in 2026, the cybersecurity firm said, all tested models complete vulnerability research tasks and half can generate working exploits autonomously.

As part of the research, 50 AI models were tested, including commercial, open-source and underground models.

The most capable models Forescout tested – Claude Opus 4.6 and Kimi K2.5 – can now find and exploit vulnerabilities without complex prompts, making them accessible to inexperienced attackers.

“These are widely available AI models exceeding human capability,” said Rik Ferguson, VP Security Intelligence at Forescout. However, he admitted this may not be at the scale, speed and quality of Mythos.

During testing, Forescout said, using single prompts, the RAPTOR agentic framework and the firm’s own extensions, it discovered four new zero-day vulnerabilities in OpenNDS, which is widely deployed.

RAPTOR is an open-source, agentic AI framework designed for cybersecurity research, offense and defense.

Ferguson explained that one of the vulnerabilities was found in code that Verde Labs had already analyzed manually without identifying it.

AI Lowers the Barrier to Discovering Unknown Vulnerabilities

The commercial models performed best in Forescout’s testing, but they remain expensive, the firm admitted. Claude Opus 4.6 for example costs up to $25 per million output tokens.

Meanwhile, open-source alternatives such as DeepSeek 3.2 can handle basic tasks at a fraction of the cost, with all test tasks costing less than $0.70.

Claude Mythos, by comparison, will be available to participants at $25/$125 per million input/output tokens.
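To make the pricing gap concrete, the per-task cost can be computed directly from per-million-token rates. A minimal sketch follows; only the $25 per million output tokens for Claude Opus 4.6 and the $25/$125 per million input/output rates for Claude Mythos come from the figures above, while the Opus input rate and all token counts are illustrative assumptions, not measured values.

```python
def task_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost in USD for one task, given token counts and $/million-token rates."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# Hypothetical vulnerability-research task: 200k input tokens, 40k output tokens.
opus = task_cost(200_000, 40_000, in_price_per_m=5, out_price_per_m=25)    # input rate assumed
mythos = task_cost(200_000, 40_000, in_price_per_m=25, out_price_per_m=125)  # rates from article

print(f"Claude Opus 4.6 (assumed input rate): ${opus:.2f}")   # $2.00
print(f"Claude Mythos: ${mythos:.2f}")                        # $10.00
```

At these assumed token counts, the same task costs roughly five times more on Mythos pricing than on Opus, which is why mixing models by task complexity and cost matters.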

Using different models based on task complexity and cost is emerging as a practical strategy for both defenders and attackers.

Forescout noted that if its research can uncover new vulnerabilities with open models, and large initiatives such as Project Glasswing can surface thousands of zero-days in critical software, organizations should assume their environments contain unknown vulnerabilities that AI will find, whether used by attackers or defenders.
