WormGPT Clones Persist by Hijacking Mainstream AI Models
Research Shows Attackers Jailbreaking LLMs such as Grok, Mixtral
The phrase "WormGPT," once an actual evil twin of OpenAI's GPT AI model designed for malicious activities, has now become a catch-all phrase for jailbroken large language models used in cybercrime.
Threat actors have adapted mainstream LLMs such as xAI's Grok and Mistral's Mixtral to create new "WormGPT" variants that generate "uncensored" responses to prompts, including those with illegal or unethical intent, researchers at Cato Networks said.
"WormGPT now serves as a recognizable brand for a new class of uncensored LLMs," said Vitaly Simonovich, a Cato researcher. They are not custom-built models, but the result of "threat actors skillfully adapting existing LLMs" through prompt manipulation and, in some cases, fine-tuning on illicit datasets, he said.
WormGPT was originally a single tool, but its leaked code and popularity on cybercrime forums have contributed to its evolution into a broader brand identity (see: WormGPT: How GPT's Evil Twin Could Be Used in BEC Attacks).
New iterations, including xzin0vich-WormGPT and keanu-WormGPT, have appeared on forums such as BreachForums, often advertised with features that emphasize freedom from ethical constraints (see: Hackers Developing Malicious LLMs After WormGPT Falls Flat).
Dave Tyson, CIO at Apollo Information Systems, likens the name's stickiness to genericized trademarks. "Many of them are labeled WormGPT as a means of convenience, just like Americans say Kleenex for a facial tissue," Tyson told Information Security Media Group. Though some variants have distinct names, such as EvilGPT, "most criminal AI are glommed under the word 'WormGPT' as a catch-all term."
These tools are typically deployed through intermediary chat services or platforms, isolating the AI model from end users. "That creates a barrier of isolation between the AI and the actual user; it allows a criminal to provide a service to customers, but behind the scenes use any variety of models to meet the request," he said. Local deployments using tools such as LMStudio or model chaining via services like FlowGPT make it easy for attackers to run and jailbreak uncensored models on demand.
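The isolation pattern Tyson describes is, at its core, a thin relay: the chat front end forwards a user's message to whatever model the operator happens to be hosting, and the user never learns which backend answered. The following is a minimal, illustrative sketch of that generic pattern, assuming a locally hosted model served through LM Studio's OpenAI-compatible API at its default local address; the endpoint, model name and relay_to_backend function are assumptions for illustration, not details from Cato's research.

# Minimal sketch of the "intermediary" relay pattern described above:
# a thin layer sits between the end user and whatever model the operator
# runs locally, so the user never interacts with the model directly.
# Assumes a local model served via LM Studio's OpenAI-compatible API
# (default http://localhost:1234/v1); endpoint and model name are
# illustrative assumptions only.
import json
import urllib.request

BACKEND_URL = "http://localhost:1234/v1/chat/completions"  # local LM Studio server

def relay_to_backend(user_message: str, model: str = "local-model") -> str:
    """Forward a user's message to the locally hosted backend model and
    return its reply; the caller never sees which model answered."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }).encode("utf-8")
    request = urllib.request.Request(
        BACKEND_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # A chat front end (web form, messaging bot, etc.) would call this the
    # same way regardless of which backend model is swapped in behind it.
    print(relay_to_backend("Summarize why layered LLM guardrails matter."))

Because the relay is model-agnostic, an operator can swap one backend for another without the customer noticing, which is precisely why a single "WormGPT" brand can sit on top of several different underlying models.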
The jailbreaking techniques vary in complexity, from using paraphrased queries to constructing elaborate prompts that mask malicious intent as historical or academic exploration. "Some of the simplest and most observed means to do this is by using a construct of historical research to hide nefarious activity," he said.
The persistence of WormGPT variants reflects a broader trend: LLM guardrails are "not perfect" and act "more like speed bumps" that can slow down but not stop hackers, said Margaret Cunningham, director of security and AI strategy at Darktrace. The emergence of a "jailbreak-as-a-service" market makes these tools more accessible to non-technical actors, she said, "significantly lower[ing] the barrier to entry for threat actors."