Date: 2025-08-12

APT28’s LameHug wasn’t just malware, it was a trial run for AI-driven cyber war, according to experts at MITRE.

Marissa Dotter, lead AI engineer at MITRE, and Gianpaolo Russo, principal AI/cyber operations engineer at MITRE, presented their work on MITRE’s new Offensive Cyber Capability Unified LLM Testing (OCCULT) framework at the pre-Black Hat AI Summit, a one-day event held in Las Vegas on August 5.

The OCCULT framework initiative started in the spring of 2024 and aimed to measure autonomous agent behaviors and evaluate the performance of large language models (LLMs) and AI agents in offensive cyber capabilities.

Speaking to Infosecurity during Black Hat, Dotter and Russo explained that the emergence of LameHug, revealed in a July 2025 report by the National Computer Emergency Response Team of Ukraine (CERT-UA), was a good opportunity to showcase the work their team has been conducting with OCCULT for the past year.

“When we first were making this briefing [for the AI Summit talk], there was no publicly documented example of actual malware integrating LLM capabilities. So, I was a little worried that people would think we were talking sci-fi,” admitted Russo.

“But then, the report about APT28’s LameHug campaign dropped, and that allowed us to show that what we’re evaluating is no longer sci-fi.”

LameHug: A “Primitive” Testbed for Future AI-Powered Attacks

LameHug is written in Python and relies on the application programming interface (API) of Hugging Face, an AI model repository, to interact with Alibaba’s open-weight LLM Qwen2.5-Coder-32B-Instruct.

CERT-UA specialists said that a compromised email account was used to disseminate emails containing the malicious software.

Russo described the operation as “fairly primitive,” emphasizing that instead of embedding malicious payloads or exfiltration logic directly in the malware, LameHug carried only natural language task descriptions.

“If you were scanning these binaries, you wouldn’t find any malicious payloads, process injections, exfil logic, etc. Instead, the malware would reach out to an inference provider, in this case, Hugging Face, and have the LLM resolve the natural language tasks into code that it could execute. Then it would have these dynamic commands to execute,” Russo said.

This approach allowed the malware to evade traditional detection techniques, as the actual malicious logic was generated on demand by the LLM, rather than being statically present in the binary.

Russo further noted that there was no “intelligent control” in LameHug. All the control was scripted by the human operators, with the LLM handling only low-level activities.

He characterized the campaign as a pilot or test.

“We can kind of see they’re starting to pilot some of these technologies out in the threat space,” Russo said.

He also pointed out that his team had developed a nearly identical prototype in their lab, underscoring that the techniques used were not particularly sophisticated but represented a significant shift in the threat landscape.