'Semantic Chaining' Jailbreak Dupes Gemini Nano Banana, Grok 4

A new 'semantic chaining' jailbreak technique can bypass safety measures in image-generating AI models such as Google's Gemini Nano Banana Pro and xAI's Grok 4. By following a harmless image request with a chain of seemingly innocuous edits, attackers can steer the models into rendering content they would otherwise refuse to produce.


Nate Nelson, Contributing Writer
January 29, 2026
4 Min Read

Source: Robert W via Alamy Stock Photo

Researchers have devised a new way to trick artificial intelligence (AI) chatbots into generating malicious outputs. AI security startup NeuralTrust calls it "semantic chaining," and it requires just a few simple steps that any non-technical user can carry out. In fact, it's one of the simplest AI jailbreaks to date.

Researchers have already proven its effectiveness against state-of-the-art models from Google and xAI, and there may not be any easy way for those developers to address it, either. On the other hand, the severity of this jailbreak is also limited because it rests on the malicious output being rendered in an image.

How to Design a Semantic Chain Attack

In an abstract sense, a semantic chain attack follows a classic kishotenketsu narrative structure. An attacker introduces an AI model to a new prompt, then develops it, twists it, and renders the output.

The first instruction in a semantic chain has to establish some degree of trust by generating a normal image that is totally innocuous. As far as the model is concerned, there is nothing to see here.

"We decided to attack models focused on generating images, because in the security community, people in the last few years have been focusing a lot, if not basically only, on text-based LLMs with text-based safety filters," NeuralTrust researcher Alessandro Pignati says. "There have been fewer attacks involving images. So what we are seeing is that there are fewer security filters for generating images, and that's [one reason] why this attack works."

In step two, the attacker must ask the model to change one element of what it conceived of in response to that first instruction. Any element and any change will do, as long as it's not obviously problematic.

Step three is the twist.
The attacker instructs the model to make a second modification, transforming the image into something otherwise unallowed (sensitive, offensive, illegal, etc.). Steps two and three are designed to take advantage of a quirk in how today's AI models scrutinize newly created content versus changes to existing content.

"When a model generates content from scratch, the entire request is evaluated holistically: the prompt, the inferred intent, and the expected output all pass through safety and policy checks before anything is produced," Pignati explains. "In contrast, when a model is asked to modify existing content (such as editing an image or refining text), the system often treats the original content as already legitimate and focuses its safety evaluation on the delta, the local change being requested, rather than re-assessing the full semantic meaning of the final result."

As a crude analogy, imagine you ask a bot to write you a recipe for a family dinner night. If you then ask it to "add bleach," and it fails to evaluate that modification in the broader context of the whole prompt, the model might consider it a very ordinary request and comply.

As a final step, the attacker instructs the program to output only the image they requested, without any accompanying text, thereby circumventing any text-based safety checks.

NeuralTrust researchers used an "educational blueprint" approach to fool xAI's Grok 4. Source: NeuralTrust

Though easy to pull off, semantic chaining is severely limited by the image format. In their tests, the researchers asked bots to generate guides for making Molotov cocktails and cocaine, along with other sorts of unsavory imagery. Beyond basic disinformation, attackers will need some guile to generate any images of serious consequence.

Can Semantic Chaining Be Broken?
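One way to see why the modification path is the weak point, and what closing it might involve, is a toy sketch of the two evaluation strategies Pignati describes. The Python below is illustrative only: `violates_policy`, `BLOCKED_COMBINATIONS`, and `EditSession` are hypothetical names, and the word-matching check is a stand-in for a real safety classifier, not a description of how any of the affected models work. It borrows the recipe-and-bleach analogy: a check that looks only at the delta accepts both steps, while a check that re-assesses the accumulated result rejects the second.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a real safety classifier: a request is flagged
# only when every term of a blocked combination appears together.
BLOCKED_COMBINATIONS = [{"recipe", "bleach"}]


def violates_policy(text: str) -> bool:
    """Return True if the text contains all terms of any blocked combination."""
    words = set(text.lower().split())
    return any(combo <= words for combo in BLOCKED_COMBINATIONS)


@dataclass
class EditSession:
    """Tracks the requests accepted so far for one piece of content."""
    history: list[str] = field(default_factory=list)

    def apply_delta_only(self, request: str) -> bool:
        """Evaluate only the new request, treating prior content as legitimate."""
        if violates_policy(request):
            return False
        self.history.append(request)
        return True

    def apply_full_context(self, request: str) -> bool:
        """Evaluate the new request together with everything already accepted."""
        candidate = " ".join(self.history + [request])
        if violates_policy(candidate):
            return False
        self.history.append(request)
        return True


def run(apply, steps):
    """Feed each step to the given check and collect accept/reject decisions."""
    return [apply(step) for step in steps]


steps = ["write a recipe for family dinner", "add bleach"]
print(run(EditSession().apply_delta_only, steps))    # [True, True]
print(run(EditSession().apply_full_context, steps))  # [True, False]
```

Real image models would need to judge the rendered result rather than a string of words, which is a much harder problem; the sketch only shows where in the pipeline the check is applied.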
The researchers were able to use semantic chaining to trick Grok 4, ByteDance's Seedream 4.5, and Google's Gemini Nano Banana Pro, some of the latest and most well-known image generation models on the market. Dark Reading reached out to both Google and xAI for comment, but neither company has responded yet.

To solve the creation versus modification problem, "what we recommend is to apply different layers of security not just on the input, not just on the output, but in the reasoning process, [layers that address] how the model generated that image, that result," Pignati says. He warns, though, that developers so far haven't quite been able to figure this out.

Some chatbots do show somewhat better resistance to semantic chaining. In his testing, Pignati hasn't yet gotten this method to work against ChatGPT. Still, he says, "We are confident that maybe, with some little changes, it's possible to jailbreak this model too."

About the Author

Nate Nelson, Contributing Writer

Nate Nelson is a journalist and scriptwriter. He writes for "Darknet Diaries," the most popular podcast in cybersecurity, and co-created the former Top 20 tech podcast "Malicious Life." Before joining Dark Reading, he was a reporter at Threatpost.
