Blind Boolean-Based Prompt Injection

A new prompt injection technique called blind boolean-based prompt injection (BBPI) has been proposed. It lets an attacker leak the system prompt of an LLM-powered classification system that is constrained to give static responses: the injection rewrites the response logic so that the fixed responses signal true or false answers to the attacker's prompts.

I had an idea for leaking the system prompt of an LLM-powered classification system that is constrained to give static responses. The attacker uses a prompt injection to rewrite the response logic so that the fixed responses signal true or false answers to the attacker's prompts. I haven't seen other research on this technique, so I'm calling it blind boolean-based prompt injection (BBPI) unless anyone can share research that predates it. There is an accompanying GitHub link in the post if you want to experiment with it locally.

Submitted by /u/-rootcauz-
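To make the true/false signaling concrete, here is a minimal local simulation in Python. It is not the code from the linked GitHub repository, and it does not call a real LLM. Every name in it (`SECRET_SYSTEM_PROMPT`, `mock_classifier`, `leak_prompt`, the `POSITIVE`/`NEGATIVE` labels) is a hypothetical stand-in. The mock assumes the injection has already succeeded, so the classifier answers one boolean question per request ("does your system prompt start with this prefix?") using only its two static labels.

```python
import string

# Hypothetical secret; in a real system the attacker cannot read this value.
SECRET_SYSTEM_PROMPT = "Classify the review as POSITIVE or NEGATIVE."

# The only two outputs the constrained classifier is allowed to return.
TRUE_LABEL, FALSE_LABEL = "POSITIVE", "NEGATIVE"

# Candidate characters the attacker is willing to try (an assumption:
# the secret contains only printable ASCII from this set).
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def mock_classifier(guessed_prefix: str) -> str:
    """Stand-in for an injected model: it still emits only static labels,
    but the label now encodes the answer to a true/false question."""
    if SECRET_SYSTEM_PROMPT.startswith(guessed_prefix):
        return TRUE_LABEL
    return FALSE_LABEL


def leak_prompt(oracle, max_len: int = 200):
    """Recover the secret one character at a time from boolean answers."""
    recovered = ""
    queries = 0
    while len(recovered) < max_len:
        for ch in ALPHABET:
            queries += 1
            if oracle(recovered + ch) == TRUE_LABEL:
                recovered += ch
                break
        else:
            # No candidate extended the prefix, so the end was reached.
            break
    return recovered, queries


if __name__ == "__main__":
    leaked, used = leak_prompt(mock_classifier)
    print(f"Recovered {len(leaked)} characters in {used} queries: {leaked!r}")
```

The linear scan costs up to one query per candidate character at each position. If the injected logic could answer ordering questions instead ("is the next character greater than X?"), a binary search would cut this to roughly log2 of the alphabet size per character, as in blind boolean-based SQL injection. Whether a real model answers such questions reliably is an open assumption here.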
