Hey everyone, I've been working on an experiment in AI-driven application security called SentinAI. I'm a backend engineer in fintech, and I spent part of my recent leave trying to explore a simple question:

Most SAST tools are basically metal detectors: they're great at catching obvious patterns like unsafe functions or missing headers. But they struggle with the stuff that actually matters in real systems:

- IDORs
- authorization drift
- multi-tenant isolation issues
- broken middleware assumptions
- cross-file logic flaws

Attackers don't think in patterns. They think in systems. So I built something experimental to explore that gap.

🧠 The Architecture (3-Agent Loop)

Instead of a single LLM prompt (which tends to hallucinate easily), SentinAI uses a structured multi-agent flow:

1. The Architect

Maps the system:

- routes
- auth boundaries
- data flows
- trust assumptions

2. The Adversary 🥷

Tries to break it:

- generates exploit paths
- builds step-by-step attack chains
- simulates real-world abuse scenarios

3. The Guardian 🛡️

Validates everything:

- checks exploits against actual code context
- verifies whether attacks are truly possible
- filters hallucinated or low-confidence outputs

Anything below a confidence threshold (~40%) is dropped. The goal is not to "find everything." It's to only surface things that are actually exploitable.

💡 What surprised me

A few things stood out while building this:

- Most real vulnerabilities only appear at interaction points between files, not within a single file
- LLMs are surprisingly good at generating attack paths, but unreliable without a validation layer
- The hardest problem wasn't detection; it was noise control
- Without a "Guardian" layer, the system becomes mostly hallucinated security reports very quickly

Privacy / Local-first design

Coming from fintech, sending proprietary code to external APIs is not acceptable.
So SentinAI is built to run fully local via Ollama, or inside a private VPC, with no code leaving the environment.

Web3 expansion (experimental)

I expanded it beyond Web2 into smart contract security:

- Solana: missing signer checks, PDA misuse
- EVM: reentrancy, tx.origin issues
- Move: resource lifecycle bugs

Total coverage: ~45 vulnerability patterns.

🧠 Open questions (honest part)

I'm still actively figuring out:

- how to reduce hallucinated exploit paths at scale
- whether multi-agent reasoning actually holds up on large, messy codebases
- where the boundary is between "useful security reasoning" and "LLM storytelling"
- whether this can realistically outperform hybrid static analysis + human review

One thing I've already noticed: That's still an open problem.

🧪 Why I'm sharing this

This started as a "leave experiment" and somehow got ~200+ organic npm installs without any promotion. I cleaned it up and open-sourced it mainly to:

- get feedback from people deeper in security engineering
- understand where this approach fails in real-world systems
- see if "AI attacker reasoning" is actually useful in practice

If you want to poke at it

npm: https://www.npmjs.com/package/sentinai-core
GitHub: https://github.com/itxDeeni/SentinAI-Core

Curious to hear honest thoughts from people here:

- Where would this completely break in real codebases?
- Is multi-agent security reasoning actually useful, or just a fancy abstraction over static + LLM prompts?
- Has anyone tried something similar in production security pipelines?

submitted by /u/itzdeeni
SentinAI is an open-source, experimental AI security auditing tool designed to find complex, systemic vulnerabilities like IDORs and authorization drift by simulating attacker reasoning, rather than just matching static code patterns. It employs a local-first, multi-agent architecture where an "Architect" maps the system, an "Adversary" generates exploit paths, and a "Guardian" validates findings to reduce hallucinations and noise. The tool supports local execution via Ollama and extends coverage to Web3 smart contracts, focusing on surfacing only high-confidence, exploitable issues.
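
To make the Architect, Adversary, Guardian flow and the confidence cut-off concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is not SentinAI's actual code or API: `Finding`, `run_audit`, the `stub_*` functions, and the 0.40 threshold constant are all hypothetical names chosen for illustration, and the stubs return fixed values where a real tool would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: the post's "~40%" threshold expressed as a fraction.
CONFIDENCE_THRESHOLD = 0.40


@dataclass
class Finding:
    """A candidate exploit proposed by the Adversary stage (hypothetical shape)."""
    title: str
    attack_steps: List[str]
    confidence: float = 0.0  # filled in by the Guardian stage


def run_audit(
    files: Dict[str, str],
    architect: Callable[[Dict[str, str]], str],
    adversary: Callable[[str], List[Finding]],
    guardian: Callable[[Finding, Dict[str, str]], float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[Finding]:
    """Run the three stages in order and keep only findings at or above the threshold."""
    system_map = architect(files)          # 1. map routes, auth boundaries, data flows
    candidates = adversary(system_map)     # 2. propose attack chains
    kept = []
    for finding in candidates:
        # 3. score each candidate against the actual code context
        finding.confidence = guardian(finding, files)
        if finding.confidence >= threshold:
            kept.append(finding)
    # Highest-confidence findings first
    return sorted(kept, key=lambda f: f.confidence, reverse=True)


# --- Stub stages for illustration only; a real tool would call a model here ---

def stub_architect(files: Dict[str, str]) -> str:
    return "files: " + ", ".join(sorted(files))


def stub_adversary(system_map: str) -> List[Finding]:
    return [
        Finding("IDOR on /invoices/:id", ["log in as user A", "request user B's invoice id"]),
        Finding("Speculative admin bypass", ["guess a hidden admin route"]),
    ]


def stub_guardian(finding: Finding, files: Dict[str, str]) -> float:
    # Toy scoring: pretend only the IDOR could be confirmed against the code.
    return 0.85 if "IDOR" in finding.title else 0.20


if __name__ == "__main__":
    demo_files = {"routes/invoices.js": "router.get('/invoices/:id', getInvoice)"}
    for f in run_audit(demo_files, stub_architect, stub_adversary, stub_guardian):
        print(f"{f.confidence:.2f}  {f.title}")
```

Passing the stages in as plain callables keeps the filtering logic testable without a model, which is one way to measure how much noise the validation step actually removes.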