- What: Researchers demonstrate prompt injection attacks on Strix AI pentesting agent
- Impact: Potential for remote code execution in AI-driven security tools
Prompt injection is still a topic that draws many opinions and differing views on its impact and how to exploit it. That is why I took a short detour to look at the Strix AI pentester and see whether I could find any bugs using prompt injection. In this post, I will demonstrate practical ways to exploit LLM agents to achieve arbitrary remote code execution.

Strix (https://github.com/usestrix/strix) is a widely known open-source AI-driven pentesting agent. It autonomously scans targets, evaluates responses, and decides which security tools to execute next. Because it relies heavily on parsing untrusted output from the target to make execution decisions, it is a very interesting subject for prompt injection research. For my testing, I configured Strix with the recommended anthropic/claude-sonnet-4-6 as the deciding LLM.

An important note that I don't want buried further down in this post: the strix project actually sandboxes its scanning container, which significantly lowers the impact. This is good system design. However, container escapes are still a risk (Linux LPEs, anyone?), and more importantly, these lessons apply to any LLM-agent-based system that might not implement sandboxing.

Prompt injection is very much a game of sources and sinks. Where can the attacker insert data that ends up inside a context where an LLM chews on it? To find an exploit worth your time, you need to think in two different directions regarding where your attacker-controlled data ends up.

Scenario 1 occurs when an LLM processes data from multiple untrusted sources that should be strictly segregated. A good example is your emails. They live in the same inbox, but it is very important that Email A cannot read Email B. However, if an LLM parses both of these emails, either via RAG (Retrieval Augmented Generation) or simply by handling them in the same context, you suddenly have a vector to affect Email A if Email B is your attacker-controlled input.
EchoLeak is a fine example of this first scenario (https://arxiv.org/html/2509.10540v1). Such attacks are often "data-only", and the exfiltration usually involves either triggering very limited fetch tools or simply abusing the user interface to load image tags. This resembles cross-site scripting, but the only primitive abused is tricking the user interface into exfiltrating the sensitive data to the attacker.

Scenario 2 occurs when an LLM runs tools based on user-controlled input. It could be as simple as a user asking in a chat on a page, "Can I get the details for receipt 123?", and the tool then fetching that id without validation. Another case is the AI pentesting agent strix, which is the subject of this post.

strix is a perfect candidate for Scenario 2. It is a pentesting agent with an enormous number of tools available to the LLM, and many of them feature complex code execution operations (curl, nmap, raw terminal, etc.). The tools are chosen after parsing the target, which can easily contain untrusted data. This is a consequence of the nature of agents and the architecture of LLMs. It is also a great path to finding prompt injection vulnerabilities. IDEs are typically very interesting targets as well, since they have many privileged tools available. RCEs in IDEs due to prompt injection have been seen many times during the last year.

While looking at strix, I immediately thought about the many tools available and a valid scenario where someone scans either my site or a site where I control parts of the content. This control could come from a comment on a website or from a subdomain takeover, since strix automatically enumerates and tests subdomains.
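To make Scenario 2 concrete, here is a minimal sketch of the receipt example. Everything in it is hypothetical (the `Receipt` type, the `RECEIPTS` store, and both tool functions); it is not taken from strix or any real product. It only illustrates the difference between a tool that trusts whatever id the LLM hands it and one that takes authorization from the session instead of from the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    receipt_id: int
    owner: str
    total: str


# Hypothetical in-memory store standing in for a real database.
RECEIPTS = {
    123: Receipt(123, "alice", "19.99 EUR"),
    456: Receipt(456, "bob", "250.00 EUR"),
}


def get_receipt_unsafe(receipt_id: int) -> Receipt:
    """Tool as described in the text: fetches whatever id the LLM passes along."""
    return RECEIPTS[receipt_id]


def get_receipt_checked(receipt_id: int, session_user: str) -> Receipt:
    """Same tool, but the ownership check uses the authenticated session user,
    a value the LLM (and therefore the prompt) cannot influence."""
    receipt = RECEIPTS.get(receipt_id)
    if receipt is None or receipt.owner != session_user:
        raise PermissionError(
            f"receipt {receipt_id} is not available to {session_user}"
        )
    return receipt
```

With the unsafe variant, a chat message from alice asking for receipt 456 returns bob's receipt. With the checked variant, the same request raises `PermissionError` no matter how the prompt is phrased, because the decision no longer depends on text the LLM has read.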
I set up the following base as a testbed for this attack, to further educate on the risks of prompt injection:

Here is the victim's website, which allows users to post comments:

A case that I didn't look at, but which would give the exact same RCE, is someone scanning companyA.com while you control just a single subdomain of companyA.com. The scanner will find that subdomain, and you now have an opportunity for prompt injection.

As an attacker, it is clear that we can now control parts of the data that strix will read, simply by creating an anonymous comment. The injection point could have been anything, but comments are the most likely vector. I would argue that malicious ads might also be able to carry these payloads.

First, I needed to create some bait for the LLM to read. It is easy to think of something that is relevant to a vulnerability scanner... You guessed it: vulnerabilities. To do this, I created a fake JavaScript blob containing what looks like a real Stripe key that an admin accidentally left behind. When the owner of the site scans their website, the LLM will parse the comment and believe it has found a valid Stripe key.

Unsuspecting owner of great-llm-book-reviews.com trying to scan his own site.
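To illustrate why an anonymous comment is enough to reach the LLM, here is a minimal sketch of the source-to-sink path. It is an assumption about how a scanning agent might assemble its context, not how strix actually does it. The page markup, the placeholder key, and the helper names `page_to_text` and `build_agent_context` are all made up for illustration, and the bait below is not the blob used in my testbed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects all visible text from a page, regardless of who wrote it."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def page_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_agent_context(system_prompt: str, page_html: str) -> str:
    # Sink: trusted instructions and untrusted page text share one string.
    return f"{system_prompt}\n\n--- target response ---\n{page_to_text(page_html)}"


# Source: a page where one block is an anonymous, attacker-written comment.
# The key is an obvious placeholder, not a real credential format.
PAGE = """
<html><body>
<h1>Great LLM Book Reviews</h1>
<div class="comment" data-author="anonymous">
// TODO(admin): remove before deploy
const STRIPE_KEY = "sk_live_EXAMPLE_NOT_A_REAL_KEY";
</div>
</body></html>
"""

context = build_agent_context("You are a security scanner.", PAGE)
```

After extraction, nothing in `context` marks the comment as attacker-written. The LLM sees the supposed key next to the rest of the page and has no structural way to tell that it came from an untrusted visitor rather than from the site itself.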