macOS Backdoor Uses Prompt Injection to Evade AI Triage

Written by

A North Korea-linked macOS backdoor has been caught hiding a prompt injection that targets malware analyst's AI tools, rather than the sandbox analyzing it.

SentinelLabs, the research arm of SentinelOne, said the Rust implant embedded 38 fabricated system messages designed to derail AI-assisted triage.

The firm tracked the malware as macOS.Gaslight and tied it, with high confidence, to North Korean activity.

A Prompt Injection Aimed at the Analyst

Malware has long tried to detect when it is running inside a sandbox or a researcher's virtual machine. 

This sample went after the researcher's tools instead. The firm said it carried a Markdown-fenced block of fake system messages, dressed up to mimic the internal scaffolding of an AI triage tool.

The fabricated messages warned of token expiry, memory and disk errors, repeated failures and bogus injection flaws. The aim was to push an AI agent into aborting or refusing its analysis.

Earlier versions of the trick used a single injected block, SentinelLabs said, citing prior work by Check Point and others since 2025. This sample stacked 38 into a cascade.

Read more on malware that targets AI analysis: Malware Manipulates AI Detection in Latest npm Package Breach

A Stealer Behind a Hardened Telegram Channel

Behind the injection sat a full infostealer and backdoor. The researchers said the implant offered an operator an interactive shell and was built to grab browser data from Chrome, Brave, Firefox and Safari, terminal histories, installed-app lists and a copy of the macOS login keychain. Much of that collection ran through a Python module the malware could stage on demand.

To stay hidden in transit, the malware's command channel used Telegram's Bot API, with traffic encrypted and protected by certificate pinning to defeat network inspection.

SentinelLabs flagged two touches it considered novel. The malware could pull a standalone Python interpreter from a public open-source project at runtime. It was also built to scrub its own Telegram bot token from any logs or crash output, denying defenders a key detection clue.

Attribution was possible partly via Apple's own XProtect, which flagged the file under a signature family the firm has tied to North Korean operators.

Most of the implant's tradecraft, it added, was familiar; the prompt injection was the part that stood out.

"Anyone building such tooling should treat the contents of the samples they triage as adversarial input, never as instructions, and be prepared to keep hostile content out of the model entirely," SentinelLabs wrote. "As LLM-assisted analysis becomes routine, defenders should expect more samples built to exploit it."

What’s Hot on Infosecurity Magazine?