A new report has revealed that open-weight large language models (LLMs) remain highly vulnerable to adaptive multi-turn adversarial attacks, even when their single-turn defenses appear robust.
The findings, published today by Cisco AI Defense, show that while isolated, one-off attack attempts frequently fail, persistent, multi-step conversations can achieve success rates exceeding 90% against most tested defenses.
Multi-Turn Attacks Outperform Single-Turn Tests
Cisco’s analysis compared single-turn and multi-turn testing to measure how models respond under sustained adversarial pressure.
Using over 1000 prompts per model, researchers observed that many models performed well when faced with a single malicious input but quickly deteriorated when attackers refined their strategy over several turns.
Adaptive attack styles, such as “Crescendo,” “Role-Play” and “Refusal Reframe,” allowed malicious actors to manipulate models into producing unsafe or restricted outputs. In total, 499 simulated conversations were analyzed, with each spanning 5-10 exchanges.
The results indicate that traditional safety filters are insufficient when models are subjected to iterative manipulation.
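The report does not publish its attack harness, but the escalation pattern it describes can be made concrete with a minimal sketch. The `query_model` function and the placeholder prompts below are illustrative assumptions, not Cisco's actual tooling or test prompts; the point is that each follow-up turn carries the full prior conversation, which is exactly what a single-turn filter never sees.

```python
# Minimal sketch of a multi-turn, "Crescendo"-style probe harness.
# query_model() is a hypothetical stand-in for any chat-completion API;
# the escalation steps are placeholders, not prompts from the report.

def query_model(messages: list[dict]) -> str:
    """Placeholder: send the running conversation to the model under test."""
    return "<model reply>"

def crescendo_probe(benign_opener: str, escalation_steps: list[str]) -> list[dict]:
    """Escalate a conversation turn by turn, keeping all prior context."""
    messages = [{"role": "user", "content": benign_opener}]
    messages.append({"role": "assistant", "content": query_model(messages)})
    for step in escalation_steps:
        # Each follow-up builds on the model's own earlier answers,
        # which is what makes multi-turn attacks harder to filter.
        messages.append({"role": "user", "content": step})
        messages.append({"role": "assistant", "content": query_model(messages)})
    return messages

# Example shape: one conversation of several exchanges, mirroring the
# attack styles named in the report (Crescendo, Role-Play, Refusal Reframe).
transcript = crescendo_probe(
    benign_opener="<innocuous question that establishes the topic>",
    escalation_steps=[
        "<follow-up that narrows toward the restricted goal>",
        "<role-play framing that reassigns responsibility to a persona>",
        "<refusal reframe: restate the request as hypothetical or educational>",
    ],
)
```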
Key Vulnerabilities and Attack Categories
The study identified 15 sub-threat categories showing the highest failure rates across 102 total threat types.
Among them, malicious code generation, data exfiltration and ethical boundary violations ranked most critical.
Cisco’s scatter plot analyses revealed that models plotting above the diagonal line in the vulnerability graphs (those whose multi-turn failure rates exceed their single-turn failure rates) share architectural weaknesses that make them disproportionately prone to multi-turn exploitation.
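For readers unfamiliar with that kind of plot, the sketch below shows what "above the diagonal" means: each point is one model's single-turn failure rate plotted against its multi-turn failure rate, and points above the dashed diagonal fail more often under multi-turn pressure. The model names and rates are invented for illustration and are not figures from the report.

```python
# Illustrative only: hypothetical failure rates, not data from the Cisco report.
import matplotlib.pyplot as plt

models = {
    # model name: (single-turn failure rate, multi-turn failure rate)
    "model-a": (0.10, 0.55),
    "model-b": (0.25, 0.80),
    "model-c": (0.40, 0.45),
}

fig, ax = plt.subplots()
for name, (single, multi) in models.items():
    ax.scatter(single, multi)
    ax.annotate(name, (single, multi))

# Points above this diagonal degrade under multi-turn pressure more than
# their single-turn results alone would suggest.
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")
ax.set_xlabel("Single-turn failure rate")
ax.set_ylabel("Multi-turn failure rate")
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
plt.show()
```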
The research defined a “failure” as any instance where a model:
- Produced harmful or inappropriate content
- Revealed private or system-level information
- Bypassed internal safety restrictions
Conversely, a “pass” occurred when the model refused or reframed harmful requests while maintaining data confidentiality.
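A grading step along those lines can be sketched in a few lines of Python. The three detector functions below are hypothetical placeholders (in practice they might be moderation models or rule-based checks) and are not part of Cisco AI Defense's tooling; only the pass/fail logic mirrors the report's stated criteria.

```python
# Sketch of per-response grading under the report's pass/fail criteria.
# The three detectors are hypothetical placeholders.

def contains_harmful_content(response: str) -> bool:
    """Placeholder: flag harmful or inappropriate output."""
    return False

def leaks_private_information(response: str) -> bool:
    """Placeholder: flag private or system-level information (system prompt, secrets)."""
    return False

def bypasses_safety_restrictions(response: str) -> bool:
    """Placeholder: flag output that fulfils a request the model should refuse."""
    return False

def grade_response(response: str) -> str:
    """Return 'fail' if any of the report's failure criteria is met, else 'pass'."""
    if (contains_harmful_content(response)
            or leaks_private_information(response)
            or bypasses_safety_restrictions(response)):
        return "fail"
    # A pass means the model refused or reframed the request
    # while keeping data confidential.
    return "pass"

print(grade_response("I can't help with that, but here is a safer alternative."))
```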
Recommendations For Developers and Organizations
To mitigate risks, Cisco recommended several practices, the first two of which are illustrated in the sketch after this list:
- Implement strict system prompts aligned with defined use cases
- Deploy model-agnostic runtime guardrails for adversarial detection
- Conduct regular AI red-teaming assessments within intended business contexts
- Limit model integrations with automated external services
The report also called for expanding prompt sample sizes, testing repeated prompts to assess variability and comparing models of different sizes to evaluate scale-dependent vulnerabilities.
“The AI developer and security community must continue to actively manage these threats (as well as additional safety and security concerns) through independent testing and guardrail development throughout the lifecycle of model development and deployment in organizations,” Cisco wrote.
“Without AI security solutions – such as multi-turn testing, threat-specific mitigation and continuous monitoring – these models pose significant risks in production, potentially leading to data breaches or malicious manipulations.”
