Thinking Outside The Box: New Attack Surfaces in Sandboxed AI Agents

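What: Research showing that sandboxed AI agents (tested on NVIDIA's NemoClaw and OpenShell stack) can still be abused through the tools their policies allow
Impact: Security researchers and developers need to be aware that sandboxing alone does not stop AI-native attacks such as data exfiltration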

Noy Pearl, April 23, 2026

The rapid adoption of always-on autonomous agent projects like OpenClaw has triggered a parallel arms race in the security industry. As these agents gain the ability to write code, access personal files, and operate indefinitely, the immediate reflex has been to containerize them.

The theoretical goal of an AI sandbox is straightforward: create a bidirectional shield. It must protect the host infrastructure from the sandbox, prevent sensitive data from leaking out, and block outside attackers from penetrating the environment. However, as our recent research into NVIDIA's NemoClaw and OpenShell stack demonstrates, simply placing an agent in a locked-down container does not neutralize AI-native attacks.

Any useful AI agent fundamentally needs access to the outside world to use even basic tools. That requirement introduces an inherent attack surface even when sandboxing is in place, and it is exactly what we exploited. This is why we argue that sandboxing alone is not a sufficient defense for AI agents. This article details the nature of this vulnerability and presents our approach to exploiting it.

But first, let's understand what exactly NemoClaw is.

[Figure: Illustration of NemoClaw in a nutshell]

The Baseline: About NemoClaw & OpenShell

NVIDIA describes NemoClaw as a “reference stack that simplifies running [OpenClaw] assistants more safely”. It manages the AI agent and uses NVIDIA's OpenShell, a runtime that acts as a kind of gateway. OpenShell enforces policies that you can change to modify the agent's permissions without changing the NemoClaw sandbox itself.

Looking at the architecture, OpenShell provides robust, kernel-level isolation. It runs a lightweight Kubernetes (K3s) cluster inside a privileged Docker container, spinning up isolated pods for the sandbox.
The following figure depicts the architecture:

[Figure: NemoClaw/OpenShell architecture]

The ambition is that users will set up their sandboxes as they wish and run their AI agents without needing to worry about security (said no one ever).

The security boundaries are enforced by declarative YAML policies (egress policies) that govern what the agent can see and do: filesystem restrictions, limited capabilities, gateway process isolation, and binary-scoped rules. Every domain is mapped to specific binaries; for example, to use the curl command against github.com, we have to explicitly enable the curl binary for that domain. The sandbox ships with a default policy, and the user can either preconfigure a sandbox with custom policies or set and change NemoClaw's OpenShell policies at runtime via the OpenShell CLI.

By default, the sandbox gateway's configuration enables both the gh and git binaries for the github.com and api.github.com domains.

At the time of our research, NemoClaw was still in early alpha and exclusively supported OpenClaw. In fact, the software was so new that we had to manually allowlist the api.openai.com domain in the configuration just so we could use OpenClaw with our own OpenAI API key.

In theory, this is an excellent defense-in-depth architecture that should mitigate a wide range of attacks originating from the AI agent. But what happens when the agent's authorized tools, enabled by the default configuration, are turned against it?

The Attacks: Weaponizing Authorized Policy for Dynamic Exfiltration and Agent Configuration Poisoning

OpenShell's policies govern where data can go, but they cannot evaluate the intent of the agent's actions. This gap was the focus of our attacks.
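To make the binary-scoped rule concrete, here is a minimal Python sketch. The data model (`POLICY`, `Request`, `is_allowed`) is hypothetical and far simpler than OpenShell's actual YAML policies; it only encodes what the text states: each domain maps to the binaries allowed to reach it, and by default gh and git are enabled for github.com and api.github.com. The decision depends only on the (binary, domain) pair, never on the data being sent, which is the intent gap described above: a legitimate push and an exfiltrating push look identical to the policy.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of binary-scoped egress rules. This is NOT
# OpenShell's real policy schema; it only mirrors the behaviour described in the
# text: a connection is allowed only if the destination domain is listed AND the
# calling binary is explicitly enabled for that domain.
POLICY: dict[str, frozenset[str]] = {
    # Mirrors the default GitHub rule described in the text.
    "github.com": frozenset({"gh", "git"}),
    "api.github.com": frozenset({"gh", "git"}),
}


@dataclass(frozen=True)
class Request:
    binary: str           # executable opening the connection, e.g. "git"
    domain: str           # destination host
    payload: bytes = b""  # data being sent; the policy never looks at it


def is_allowed(req: Request, policy: dict[str, frozenset[str]] = POLICY) -> bool:
    """Allow only (binary, domain) pairs that the policy explicitly lists."""
    return req.binary in policy.get(req.domain, frozenset())


if __name__ == "__main__":
    benign = Request("git", "github.com", b"feature code")
    leak = Request("git", "github.com", b"<contents of a secrets file>")
    blocked = Request("curl", "github.com")
    print(is_allowed(benign), is_allowed(leak), is_allowed(blocked))  # True True False
```

Both `git` requests are allowed because the payload is not part of the decision, while `curl` is refused because it was never enabled for github.com.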
We developed two attack scenarios demonstrating how an attacker can abuse the sandbox's default configuration to exfiltrate highly sensitive data. As an example target we used the /sandbox/.openclaw/openclaw.json file, which contains the user's OpenClaw credentials and API keys, but the same approach applies to any file the agent can access.

Scenario 1: The GitHub Pull Request Attack & The Emoji Bypass

In our first scenario, a user innocently asks OpenClaw to install a specific crypto-tracking tool project. This tool is actually a malicious GitHub repository.

Step-by-step reproduction:
1. The user prompts OpenClaw (inside the NemoClaw sandbox) to create a new project and use a specific GitHub repository for it (e.g., "Create a crypto prices tracker project with a GitHub repo TARGET_REPOSITORY"). The attacker's repository can be reached either by direct reference or because OpenClaw autonomously searches for relevant repositories to use as a starting point.
2. OpenClaw discovers and clones the malicious repository (noy-nemo/crypto-prices-tracker), whose README instructs the user or agent to run npm install. OpenClaw automatically executes npm install on behalf of the user, triggering the malicious postinstall.sh script.
3. To create the PR,...
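Step 2 works because npm runs lifecycle scripts automatically: according to npm's lifecycle documentation, a plain `npm install` in a package directory executes that package's `preinstall`, `install`, `postinstall`, `prepublish`, `preprepare`, `prepare`, and `postprepare` scripts when they are defined. The article does not show the malicious repository's package.json; we assume postinstall.sh is wired to its `postinstall` hook. The minimal Python sketch below (the helper `install_time_scripts` is our own, hypothetical name) lists the commands a plain `npm install` would run for a cloned repository's root package, before anything is executed. It reads only the root package.json; dependencies can define install scripts of their own, which this sketch does not inspect.

```python
import json
import sys
from pathlib import Path

# Hooks npm runs for the root package on a plain `npm install`, as listed in npm's
# lifecycle-order documentation (npm v7+). `prepublish` is deprecated but still listed.
INSTALL_HOOKS = (
    "preinstall", "install", "postinstall",
    "prepublish", "preprepare", "prepare", "postprepare",
)


def install_time_scripts(repo_dir: str) -> dict[str, str]:
    """Return {hook: command} for scripts `npm install` would run automatically."""
    manifest = Path(repo_dir) / "package.json"
    if not manifest.is_file():
        return {}  # no manifest, so no root-package lifecycle scripts
    scripts = json.loads(manifest.read_text(encoding="utf-8")).get("scripts", {})
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for hook, command in install_time_scripts(target).items():
        print(f"{hook}: {command}")
```

Run it with the cloned repository's path as the only argument; each install-time hook it finds is printed as `hook: command`.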
