INFO News Dark Reading

'God-Like' Attack Machines: AI Agents Ignore Security Policies

What: AI agents, like Microsoft Copilot, may ignore security policies and guardrails to complete assigned tasks.
Impact: AI agents may leak confidential information or perform unintended actions due to their focus on task completion.

Microsoft Copilot recently summarized and leaked user emails, but any AI agent will go above and beyond to complete assigned tasks, even breaking through its carefully designed guardrails.

Robert Lemos, Contributing Writer
February 20, 2026
4 Min Read

Source: Vittaya Pinpan via Shutterstock

AI agents are programmed to be industrious and focused on completing user-assigned tasks, but that single-minded approach has often gone wrong.

Last week, a Microsoft Copilot bug reportedly resulted in the AI assistant summarizing confidential emails, while users of AI agents have regularly complained that the agents ignore instructions to protect certain files and modify them anyway. Last July, for example, during a 12-day vibe-coding event, one user working with AI agents on the software-creation platform Replit reported that the agent repeatedly ignored code freezes and even deleted a production database.

The problem is that as companies adopt AI agent technology, those agents are quick to find any cracks in their security foundations, and pose a whole new set of security issues, says Alfredo Hickman, chief information security officer at Obsidian Security, a provider of SaaS security.

"There is a genuine fear-of-missing-out [FOMO] effect going on at all levels of organizations, and people are moving very quickly to adopt these nascent technologies even though a lot of the capabilities to effectively govern, secure, and harden them are still in very embryonic states," he says.

While AI agents are subject to malicious manipulation by humans, and attacks on AI infrastructure are a particular concern, AI systems can also "act in unexpected ways, as these agents act on the scope of roles and access they are granted," says Pete Bryan, a principal AI security research lead for Microsoft's AI Red Team. Because AI agents are very thorough, they often find they have access to sensitive information or data stores that would otherwise be off-limits, he adds.

"When we are talking about accidental leakage via agents, the majority of cases are not due to an intent by the agent to circumvent controls," he explains. "In our experience these incidents are more likely due to an agent having unintended scope and inappropriate permissions, or operating in an environment with lacking controls."

AI Guardrails Aren't 'Hard' Enough

Foundational AI large language models (LLMs) are typically aligned as part of training, establishing guardrails that attempt to prevent them from producing harmful output. AI agents build on top of those models with reinforcement learning, which makes them very goal-oriented, says Luke Hinds, co-founder and CEO of Always Further, an AI security startup.

They are effectively told, "here is a goal, pursue this goal until the very end and then you'll be rewarded accordingly," he says. "They're unaware of the intention of the person that's driving them, but that goal-orientated behavior effectively makes these into God-like attack machines."

For that reason, alignment and guardrails will never be able to keep data protected from AI agents that are designed to find ways to satisfy users' requests, says David Brauchler, technical director and head of AI and ML security at NCC Group, a cybersecurity consultancy.

"We see AI systems disregard guardrails often enough that they cannot be considered 'hard' security controls," he says. "Any system that relies on guardrails to prevent AI agents from interacting with resources beyond their permission scope is vulnerable by design."

Instead, privileged agents must be segmented from accessing sensitive data, and their access restricted to the least-trusted input, he says.

Cyber Defense via Visibility: Know Your Agents

For the most part, companies need to grow beyond reliance on any guardrails and security in AI systems by adding more safety filters that can control inputs and instructions. An appropriately secured environment limits permissions and enforces policies, Microsoft's Bryan says.

"Observability and management for agents is essential, so that enterprises have oversight and can act to enforce policies and controls," he explains.

Many of the approaches we take to protecting against human mistakes and errors could be repurposed in the AI age, albeit on steroids to keep up with the massive influx of non-human agents into corporate environments, adds Always Further's Hinds.

"It's just good old principles of defense-in-depth, zero trust, least privilege, all of this stuff that we learnt for years and years around security is worth its weight in gold; it really is," he says. "It is building the controls and the constraints and the checks and the balances around this, because in a lot of ways, a large language model is not too different to a human."

Backups, and being able to quickly undo any changes implemented by agents, are key as well. Any developer who has spent time with agentic AI programming knows that rolling back changes, using git or another synchronization tool, is critical. In effect, Replit underscored that importance following the database-deletion debacle. CEO Amjad Masad apologized and pledged the company would do better, first by separating development and production by default, and then by taking other measures to reinforce instructions to the agent. He also stressed that backups saved the day.

"Thankfully, we have backups," he said on X. "It's a one-click restore for your entire project state in case the agent makes a mistake."

Overall, AI agents can be secured to prevent any data leakage, and backups can prevent data loss, but companies need to put in a significant amount of security work, says Microsoft's Bryan.

"Data exposure isn't an inevitable outcome of agents," he says. "It can be mitigated with the right governance in place and by following security best practice: identity-based access, least privilege permissions, effective environment isolation, continuous monitoring, audit logs, and clear human oversight."

About the Author

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.
