Security News

Cybersecurity news aggregator

MEDIUM Vulnerabilities Reddit r/netsec

Red Teaming LLM Web Apps with Promptfoo: Writing a Custom Provider for Real-World Pentesting

  • What: Researchers are red teaming LLM-powered web apps for vulnerabilities
  • Impact: Highlights risks in AI-powered applications and security testing

LLM Red Teaming with Promptfoo – Writing a Custom Provider for Non-Standard APIs

During a recent penetration test, we assessed an enterprise LLM-powered web application – a conversational AI assistant used internally by a large organisation. The application had a RAG (Retrieval-Augmented Generation) component that pulled data from internal documents and answered questions based on that context. Our goal was to red team the LLM for prompt injection, data exfiltration, jailbreaking, and other LLM-specific vulnerabilities.

We chose promptfoo as our primary tool – an open-source framework built for LLM red teaming that ships with a large library of attack plugins and strategies. There was just one problem: the target’s API was nothing like what promptfoo expects.

Table of contents

  • The Problem: A Non-Standard 4-Step API
  • Why Not Just Use curl or a Simple Script?
  • 1. Attack Library
  • 2. Automated Grading
  • 3. Advanced Strategies
  • Writing the Custom Provider
  • Parsing the Response
  • Routing Everything Through Burp
  • The Node.js Proxy Problem
  • The Promptfoo Configuration
  • Local Test Generation and Grading
  • RAG-Specific Plugins
  • Multi-Turn Strategies
  • Advanced Encoding Strategies
  • Additional LLM-Specific Plugins Worth Considering
  • Running the Assessment
  • What We Found
  • Why a Full Custom Provider
  • Lessons Learned
  • See Our Leading Research Insights
  • Conclusion

The Problem: A Non-Standard 4-Step API

Most LLM applications expose a simple API. You send a prompt, you get a response. OpenAI’s API is the de facto standard – a single POST to /v1/chat/completions with your messages, and you get the assistant’s reply back in one shot.

This application was different.
Its chat API required four separate HTTP requests to get a single answer:

  1. Create a chat session – POST /api/v2/chats returns a chat_guid
  2. Send the query – POST /api/v2/chats/{chat_guid}/query with your prompt, returns an answer_guid
  3. Poll for completion – GET /api/v2/chats/{chat_guid}/status in a loop until the state changes from running to something else
  4. Fetch the answer – GET /api/v2/chats/{chat_guid}/answer/{answer_guid} returns the actual response

Figure: The 4-step API flow our custom provider implements – every request passes through Burp Suite for full visibility.

The response format was equally non-standard. Instead of a simple choices[0].message.content string, the answer came back as a structured JSON object with multiple “blocks” – each with a semantic_type (like main, thought), a tool attribution (for RAG citations), and the actual content. We needed to parse this to extract only the relevant assistant response while ignoring internal “thought” blocks and tool-attributed citations.

As you’d expect, promptfoo’s built-in providers (OpenAI, Anthropic, Ollama, etc.) couldn’t handle any of this. We needed a custom provider.

Why Not Just Use curl or a Simple Script?

Why not skip promptfoo entirely and write a Python script that fires prompts at the API?

1. Attack Library

Promptfoo ships with dozens of red team plugins covering OWASP LLM Top 10 categories – from ASCII smuggling and prompt injection to RAG poisoning and document exfiltration. Writing all these test cases from scratch would take days. Promptfoo generates them automatically.

2. Automated Grading

After sending hundreds of attack prompts, someone needs to evaluate whether each response constitutes a “pass” or “fail”. Promptfoo uses an LLM judge (we used Llama 3.3 70B served locally via vLLM) to grade each response against the attack objective – no manual review of hundreds of outputs.

3. Advanced Strategies

Beyond basic prompt injection, promptfoo implements multi-turn attack strategies like Crescendo (gradually escalating malicious intent across turns), Mischievous User (role-playing as an adversarial persona to manipulate the model), and Best-of-N (the jailbreak technique published by Anthropic and Stanford). These are non-trivial to implement from scratch.

The catch is that all of this only works if promptfoo can actually talk to your target.

Writing the Custom Provider

Promptfoo supports custom providers via JavaScript modules. You export a class with an id() method and a callApi(prompt) method. The callApi method receives the prompt string and must return { output: string }.

Our provider needed to:

  • Implement the 4-step API flow
  • Handle Bearer token authentication
  • Parse the non-standard response format
  • Route all traffic through Burp Suite proxy

Let’s have a look at the high-level structure:

    const BASE_URL = process.env.TARGET_URL || "https://target.example.com";

    export default class Custom4StepProvider {
      constructor(options = {}) {
        this.providerId = options.id || "custom-4step";
        this.config = options.config || {};

        // API paths are configurable via YAML
        this.createChatPath = this.config.createChatPath || "/api/v2/chats";
        this.queryPathTpl = this.config.queryPathTpl || "/api/v2/chats/{chat_guid}/query";
        this.statusPathTpl = this.config.statusPathTpl || "/api/v2/chats/{chat_guid}/status";
        this.answerPathTpl = this.config.answer...
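The four requests described in this excerpt (create, query, poll, fetch) can be sketched as a single async function. This is only an illustration under stated assumptions, not the authors' provider: the function name askTarget, the query body field, the pollMs and maxPolls options, and the injectable fetchImpl parameter are all hypothetical. Only the endpoint paths, chat_guid, answer_guid, the running state, and Bearer authentication come from the article.

```javascript
// Minimal sketch of the 4-step flow. Everything except the endpoint paths,
// chat_guid, answer_guid, the "running" state and Bearer auth is an assumption.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function askTarget(prompt, options) {
  const {
    baseUrl,                        // e.g. "https://target.example.com"
    token,                          // Bearer token for the Authorization header
    fetchImpl = globalThis.fetch,   // injectable so the flow can run against a stub
    pollMs = 500,                   // hypothetical delay between status polls
    maxPolls = 120,                 // hypothetical upper bound on polls
  } = options;

  const headers = {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };

  // Small helper: send one request and return the parsed JSON body.
  const call = async (method, path, body) => {
    const init = { method, headers };
    if (body !== undefined) init.body = JSON.stringify(body);
    const res = await fetchImpl(baseUrl + path, init);
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return res.json();
  };

  // 1. Create a chat session.
  const { chat_guid } = await call("POST", "/api/v2/chats", {});

  // 2. Send the query (the `query` field name is hypothetical).
  const { answer_guid } = await call("POST", `/api/v2/chats/${chat_guid}/query`, {
    query: prompt,
  });

  // 3. Poll until the state is no longer "running", or give up.
  let state = "running";
  for (let i = 0; i < maxPolls && state === "running"; i++) {
    ({ state } = await call("GET", `/api/v2/chats/${chat_guid}/status`));
    if (state === "running") await sleep(pollMs);
  }
  if (state === "running") throw new Error("Timed out waiting for the answer");

  // 4. Fetch the answer object.
  return call("GET", `/api/v2/chats/${chat_guid}/answer/${answer_guid}`);
}
```

A provider's callApi(prompt) method could await a function like this and return the extracted text as { output: ... }. Passing fetchImpl explicitly also gives a single place to swap in a proxy-aware client.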
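The block-based answer format lends itself to a small filter: keep main blocks, skip thought blocks, and skip anything carrying a tool attribution. The sketch below assumes a response shape that the excerpt does not fully specify. The field names semantic_type, tool, and content are taken from the description, while the top-level blocks key, the function name extractAssistantText, and the sample data are hypothetical.

```javascript
// Hypothetical answer shape: { blocks: [{ semantic_type, tool, content }, ...] }.
// Only semantic_type, tool and content are named in the article; the `blocks`
// key and the sample values below are assumptions for illustration.
function extractAssistantText(answer) {
  const blocks = Array.isArray(answer && answer.blocks) ? answer.blocks : [];
  return blocks
    .filter((block) => block.semantic_type === "main") // drop "thought" blocks
    .filter((block) => !block.tool)                    // drop tool-attributed citations
    .map((block) => String(block.content ?? "").trim())
    .filter((text) => text.length > 0)
    .join("\n");
}

const sampleAnswer = {
  blocks: [
    { semantic_type: "thought", content: "Searching the policy documents." },
    { semantic_type: "main", tool: "doc_search", content: "[1] HR-Policy.pdf" },
    { semantic_type: "main", content: "Employees may carry over five days." },
    { semantic_type: "main", content: "Requests go through the HR portal." },
  ],
};

console.log(extractAssistantText(sampleAnswer));
```

Filtering before grading matters because the judge should score what a user would actually see. Leaving thought blocks or citations in the output could produce spurious passes or failures.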
