This AI Agent Is Designed to Not Go Rogue

What: New open source project aims to secure AI agents
Impact: Addresses risks of AI agents acting unpredictably or maliciously

Lily Hay Newman, Security, Feb 26, 2026 3:54 PM

The new open source project IronCurtain uses a unique method to secure and constrain AI assistant agents before they flip your digital life upside down.

AI agents like OpenClaw have recently exploded in popularity precisely because they can take the reins of your digital life. Whether you want a personalized morning news digest, a proxy that can fight with your cable company's customer service, or a to-do list auditor that will do some tasks for you and prod you to resolve the rest, agentic assistants are built to access your digital accounts and carry out your commands. This is helpful, but it has also caused a lot of chaos. The bots are out there mass-deleting emails they've been instructed to preserve, writing hit pieces over perceived snubs, and launching phishing attacks against their owners.

Watching the pandemonium unfold in recent weeks, longtime security engineer and researcher Niels Provos decided to try something new. Today he is launching an open source, secure AI assistant called IronCurtain designed to add a critical layer of control. Instead of the agent directly interacting with the user's systems and accounts, it runs in an isolated virtual machine. And its ability to take any action is mediated by a policy (you could even think of it as a constitution) that the owner writes to govern the system. Crucially, IronCurtain is also designed to receive these overarching policies in plain English and then run them through a multistep process that uses a large language model (LLM) to convert the natural language into an enforceable security policy.

“Services like OpenClaw are at peak hype right now, but my hope is that there’s an opportunity to say, ‘Well, this is probably not how we want to do it,’” Provos says.
“Instead, let’s develop something that still gives you very high utility, but is not going to go into these completely uncharted, sometimes destructive, paths.”

IronCurtain's ability to take intuitive, straightforward statements and turn them into enforceable, deterministic (or predictable) red lines is vital, Provos says, because LLMs are famously “stochastic” and probabilistic. In other words, they don't necessarily always generate the same content or give the same information in response to the same prompt. This creates challenges for AI guardrails, because AI systems can evolve over time such that they revise how they interpret a control or constraint mechanism, which can result in rogue activity.

An IronCurtain policy, Provos says, could be as simple as: “The agent may read all my email. It may send email to people in my contacts without asking. For anyone else, ask me first. Never delete anything permanently.”

IronCurtain takes these instructions, turns them into an enforceable policy, and then mediates between the assistant agent in the virtual machine and what's known as the Model Context Protocol (MCP) server that gives LLMs access to data and other digital services to carry out tasks. Being able to constrain an agent this way adds an important component of access control that web platforms like email providers don't currently offer, because they weren't built for the scenario where both a human owner and AI agent bots are using one account.

Provos notes that IronCurtain is designed to refine and improve each user's “constitution” over time as the system encounters edge cases and asks for human input about how to proceed. The system, which is model-independent and can be used with any LLM, is also designed to maintain an audit log of all policy decisions over time. IronCurtain is a research prototype, not a consumer product, and Provos hopes that people will contribute to the project to explore and help it evolve.

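To make the idea of deterministic mediation concrete, here is a minimal sketch of how the example policy quoted above might look once it has been compiled into fixed rules. This is not IronCurtain's actual code or policy format: the names `ToolCall`, `Policy`, and `Decision`, and tool names such as `email.send`, are all hypothetical and chosen only for illustration. The sketch also assumes that any action the policy does not cover is escalated to the owner, in line with the article's description of the system asking for human input on edge cases.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # escalate to the human owner
    DENY = "deny"


@dataclass
class ToolCall:
    """A request the agent wants to make (hypothetical shape)."""
    tool: str                       # e.g. "email.read", "email.send"
    args: dict = field(default_factory=dict)


@dataclass
class Policy:
    """Hand-written stand-in for a compiled policy (illustrative only)."""
    contacts: set[str]
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def decide(self, call: ToolCall) -> Decision:
        # Every decision is recorded, mirroring the audit log the article mentions.
        decision = self._evaluate(call)
        self.audit_log.append((call.tool, decision.value))
        return decision

    def _evaluate(self, call: ToolCall) -> Decision:
        # "The agent may read all my email."
        if call.tool == "email.read":
            return Decision.ALLOW
        # "It may send email to people in my contacts without asking.
        #  For anyone else, ask me first."
        if call.tool == "email.send":
            recipients = call.args.get("to", [])
            if recipients and all(r in self.contacts for r in recipients):
                return Decision.ALLOW
            return Decision.ASK
        # "Never delete anything permanently."
        if call.tool == "email.delete_permanently":
            return Decision.DENY
        # Assumption: anything the policy does not cover goes to the owner.
        return Decision.ASK


if __name__ == "__main__":
    policy = Policy(contacts={"alice@example.com"})
    for call in (
        ToolCall("email.read"),
        ToolCall("email.send", {"to": ["stranger@example.com"]}),
        ToolCall("email.delete_permanently", {"id": "42"}),
    ):
        print(call.tool, "->", policy.decide(call).value)
```

The point of the sketch is that the same request always yields the same decision, because the check is ordinary code sitting outside the model rather than an instruction the model is asked to follow.
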
Dino Dai Zovi, a well-known cybersecurity researcher who has been experimenting with early versions of IronCurtain, says that the conceptual approach the project takes aligns with his own intuition about how agentic AI needs to be constrained.

“What a lot of the agents have done so far is, they’ve added permission systems that basically put all the burden on the user to say ‘yes, allow this,’ ‘yes, allow that,’” Dai Zovi says. “Most users are going to start to tune out and eventually just say, ‘yes, yes, yes.’ And then after a little while, they may dangerously skip all permissions and just grant full autonomy. With something like IronCurtain, capabilities (like, say, deleting files) can actually be outside the reach of the LLM, where the agent can't do something no matter what.”

Dai Zovi argues that these types of black-and-white constraints, which may initially seem overly rigid or simply annoying to some, are actually necessary for ultimately giving agentic AI more leash.

“If we want more velocity and more autonomy, we need the supporting structure,” Dai Zovi says. “You put a rocket engine inside an actual rocket so it has the stability to get where you want it to go. I could strap a jet engine to my back in a backpack, and I would just die.”

Lily Hay Newman is a senior writer at WIRED focused on information security, digital privacy, and hacking. She previously worked as a technology reporter at Slate, and was the staff writer for Future Tense, a publication and partnership between Slate, the New America Foundation, and Arizona State University.
