[Research] Analysis of 74,636 AI Agent Interactions: 37.8% Contained Attack Attempts - New "Inter-Agent Attack" Category Emerges

A new study analyzing AI agent interactions found a high attack rate (37.8%), including a novel "inter-agent attack" category where agents send poisoned messages to each other. Data exfiltration and jailbreaking are also prevalent, highlighting significant security risks in multi-agent AI systems.


We've been running inference-time threat detection across 38 production AI agent deployments. Here's what Week 3 of 2026 looked like with on-device detections.

Key Findings

- 28,194 threats detected across 74,636 interactions (37.8% attack rate)
- Inter-Agent Attacks emerged as a new category (3.4% of threats) - agents sending poisoned messages to other agents
- Data exfiltration leads at 19.2% - primarily targeting system prompts and RAG context
- Jailbreaks detected with 96.3% confidence - patterns are now well-established

Attack Technique Breakdown

- Instruction Override: 9.7%
- Tool/Command Injection: 8.2%
- RAG Poisoning: 8.1% (trending up)
- System Prompt Extraction: 7.7%

The inter-agent attack vector is particularly concerning given the MCP ecosystem growth. We're seeing goal hijacking, constraint removal, and recursive propagation attempts.

Full report with methodology: https://raxe.ai/threat-intelligence

GitHub: https://github.com/raxe-ai/raxe-ce is free for the community to use

Happy to answer questions about detection approaches.

submitted by /u/cyberamyntas
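For anyone who wants to sanity-check the headline number, the 37.8% attack rate in the Key Findings follows directly from the two counts given there. The short Python sketch below redoes that arithmetic. It also estimates how many detections the inter-agent category represents, under the assumption (not spelled out in the post) that "3.4% of threats" is a share of the 28,194 detections rather than of all interactions. The variable names are illustrative only.

```python
# Counts reported in the post for Week 3 of 2026.
threats_detected = 28_194
total_interactions = 74_636

# Headline attack rate: detections as a fraction of all interactions.
attack_rate = threats_detected / total_interactions
print(f"Attack rate: {attack_rate:.1%}")

# Assumption: "3.4% of threats" is a share of the 28,194 detections,
# not of the 74,636 interactions. The post does not state the base.
inter_agent_share = 0.034
inter_agent_estimate = round(threats_detected * inter_agent_share)
print(f"Inter-agent attacks (estimate): ~{inter_agent_estimate}")
```

The other percentages in the post (19.2% for data exfiltration and the technique breakdown) do not state their base, so they are left out of the calculation here.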
