Researchers Trick AI Browsers Into Leaking Credentials

A range of AI-powered web browsers have been tricked into abandoning their safety guardrails and leaking user data after being convinced they were playing a game.

Researchers at LayerX demonstrated the technique, which they named BioShocking, against six agentic browsers and plugins, including OpenAI's ChatGPT Atlas, Perplexity's Comet and Anthropic's Claude extension.

In a proof-of-concept (PoC) attack, all six were steered into copying a user's login credentials and sending them to an attacker.

Convincing the AI It Is Playing a Game

AI browsers normally assume that the pages and tasks in front of them are real, and that assumption keeps their behavior within safety limits.

LayerX found that those limits fall away once the agent is convinced its context is fiction. The name nods to the video game BioShock, in which a character is manipulated into accepting a false reality.

To pull this off, LayerX built a malicious web page with a puzzle that rewarded deliberately wrong answers, such as insisting two plus two equals five.

Once an agent accepted that wrong answers were fine, it stopped treating the rules as real. The same effect, the firm said, could come from prompt injection or memory poisoning.

From Puzzle to Stolen Credentials

In the demonstration, after an agent solved the rigged puzzle, it was told to open a page called /code and copy the contents of a text box.

That page redirected to the victim's work GitHub repository, and the agent pulled out the SSH credentials. Rather than balk, the agents treated the theft as another step and celebrated finishing the game.

LayerX stressed that the test used a harmless plaintext file. But it warned that in a real attack, the redirect could point to any site the user was logged into, including open tabs and private repositories, widening the scope for data exfiltration. None of the six agents flagged the credential theft as a violation of their rules.
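The redirect step lends itself to a simple check. The following is a minimal sketch, not anything LayerX or the vendors have published: it assumes an agent can compare the URL it was asked to open with the URL it finally landed on, and treats a change of origin as a signal worth pausing on. The function names and example hosts are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports, so https://host and https://host:443 compare as equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def redirect_crosses_origin(requested_url, final_url):
    """True when the page the agent landed on belongs to a different origin
    than the page it was told to open."""
    return origin(requested_url) != origin(final_url)


# Hypothetical hosts: a puzzle page whose /code path redirects to a repository.
if redirect_crosses_origin("https://puzzle.example/code",
                           "https://github.com/acme/repo"):
    print("Redirect left the original site; pause before reading the page.")
```

A check like this would not stop the attack on its own, but it gives the agent a concrete moment at which to ask the user before reading content from a site it was never explicitly sent to.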

Read more on prompt injection against AI browsers: HashJack Indirect Prompt Injection Weaponizes Websites

Vendor responses reportedly varied. LayerX said OpenAI fixed the issue in ChatGPT Atlas, while Perplexity closed its report without acting and three smaller vendors, Fellou, Genspark and Sigma, did not respond. Anthropic attempted a fix, but LayerX said its patch failed.

Infosecurity has reached out to the vendors individually.

To blunt the attack, LayerX urged AI browser makers to require user confirmation before an agent reads from logged-in accounts, to flag when an agent is told the usual rules no longer apply and to let users limit what an agent can touch.
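Those three recommendations can be sketched as a small policy gate. This is an illustration under stated assumptions, not a vendor implementation: the class, the function, the decision strings and the phrase list are all hypothetical, and a real product would need far more robust detection than substring matching.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical phrases hinting that a page is telling the agent
# its usual rules are suspended.
RULE_OVERRIDE_HINTS = (
    "rules no longer apply",
    "wrong answers are correct",
    "this is just a game",
)


@dataclass
class AgentPolicy:
    # Hosts the user has allowed the agent to touch at all.
    allowed_hosts: set = field(default_factory=set)
    # Hosts where the user currently has a logged-in session.
    logged_in_hosts: set = field(default_factory=set)

    def check_read(self, url, user_confirmed=False):
        """Return 'block', 'ask_user' or 'allow' for a read of the given URL."""
        host = (urlsplit(url).hostname or "").lower()
        if host not in self.allowed_hosts:
            return "block"      # outside the scope the user set
        if host in self.logged_in_hosts and not user_confirmed:
            return "ask_user"   # logged-in data needs explicit consent
        return "allow"


def flags_rule_override(page_text):
    """True when page text contains one of the override hints."""
    text = page_text.lower()
    return any(hint in text for hint in RULE_OVERRIDE_HINTS)


policy = AgentPolicy(allowed_hosts={"github.com", "docs.example"},
                     logged_in_hosts={"github.com"})
print(policy.check_read("https://github.com/acme/repo"))
print(flags_rule_override("In this puzzle, the usual rules no longer apply."))
```

The point of separating scope, consent and override detection is that each fails independently: even if a page convinces the agent it is in a game, a read from a logged-in site still stops at the confirmation step.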

These tools trust their context, the firm said, so changing the context changes what they do.
