- What: Discussion of an autonomous vulnerability hunting system using AI and MCP
- Impact: Security researchers and developers working on AI-based security tools
Recently I've had a week off my $dayjob, so I decided to finally write up some of my side projects and sort out various admin tasks. Writing about the 0day machine, however, is something I've wanted to do for a number of weeks. I'm going to deep dive into how I built an autonomous vulnerability hunting system using Claude Code and MCP, and some of the bugs it's found along the way.

One funny quote came out of it too, from a group I'm in: "Andy basically is slave labouring the AI". The title comes from GenAI = Jenny and this song:

I've been building this system since early 2026, initially in my free time, mostly in the middle of the night when I usually can't sleep; eventually it became a full-blown research workflow. I'll talk about this more later in the post, but it's also why I built TokenBurn, to work out my return on investment on Claude Max plus the hardware.

As we know, using LLMs for security research is nothing new, nor is building custom tooling to automate the boring parts of vulnerability hunting in general. Automated fuzzing has been a thing for years; I was talking to Stephen Sims back in 2014 about how he was doing fuzzing at scale, and it was not dissimilar to the setup I have today, just without the LLM integration. However, the combination of Claude Code's MCP (Model Context Protocol) support with a purpose-built lab has turned out to be surprisingly effective. I get that there are probably hundreds, if not thousands, of people out there doing the same thing, but alas, here's what I've put together, how it all works, and some of the fruits of the labour.

The initial motivation came from spending more time wrangling tooling than actually hunting bugs. The process would usually follow something along the lines of: connect to a VM, stage a binary, decompile it, map the attack surface, set up fuzzing, triage crashes, write a PoC, draft a disclosure, and submit to a vendor or bounty programme.
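Stripped down, that manual workflow is a chain of stages where each one consumes what the previous one produced. Here's a minimal Python sketch of that shape, assuming a hypothetical dictionary context passed between stub stages; the function names, paths and data are illustrative only and are not taken from the real tooling:

```python
from typing import Any, Callable

# Hypothetical shared context carried between stages; in the manual
# workflow this lives in notes and terminal scrollback.
Context = dict[str, Any]
Stage = Callable[[Context], Context]


def stage_binary(ctx: Context) -> Context:
    # Stub: pretend the target binary was copied to the research VM.
    return {**ctx, "staged_path": f"/stage/{ctx['binary']}"}


def decompile(ctx: Context) -> Context:
    # Stub: a real stage would drive Ghidra or radare2 here.
    return {**ctx, "functions": ["DriverEntry", "DispatchIoctl"]}


def map_attack_surface(ctx: Context) -> Context:
    # Stub: keep only functions that look reachable from user mode.
    entry_points = [f for f in ctx["functions"] if "Ioctl" in f]
    return {**ctx, "entry_points": entry_points}


def run_pipeline(ctx: Context, stages: list[Stage]) -> Context:
    # Each stage sees everything produced so far and adds its own output;
    # a fresh history list is built each time so no state is shared.
    for stage in stages:
        ctx = stage(ctx)
        ctx["history"] = [*ctx.get("history", []), stage.__name__]
    return ctx


result = run_pipeline({"binary": "target.sys"}, [stage_binary, decompile, map_attack_surface])
print(result["entry_points"])  # ['DispatchIoctl']
print(result["history"])  # ['stage_binary', 'decompile', 'map_attack_surface']
```

In practice every real stage has its own tool and its own output format, so the glue between stages is the part that eats the time.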
Each step has its own tools, its own output formats, and its own context you need to carry forward to the next step. I wanted Claude to handle the wrangling while I did the thinking and writing, because we all know AI doesn't write like humans, and I wanted to maintain some degree of ownership over my technical skills.

A quick rundown on MCPs

Before diving into the specifics, a quick primer on MCP for those unfamiliar. The Model Context Protocol lets you expose tools to Claude Code as callable functions. Think of it like giving Claude native access to your terminal commands, but structured with typed inputs and outputs. In my setup, each MCP server is just a Python process that registers tools, and Claude calls them directly during a conversation: no copy-pasting terminal output, no switching windows.

The approach is fairly straightforward: wrap every tool in my research workflow as an MCP server. I ended up with 8 MCP servers across 5 VMs and over 300 tools. I'm not going to list the specifics, but here's a rough overview:

| Server | Purpose |
|---|---|
| Lab Controller | SSH/WinRM sessions, Proxmox VM management, basic RE |
| Hunter | Patch diffing, attack surface enumeration, 10 fuzzing domains, crash triage, campaigns |
| RE Tools | Ghidra, radare2, Frida and some other tools |
| Exploit Dev | Shellcode generation, heap spray, CFG bypass, PoC assembly, emulation |
| Debugger | Persistent WinDbg/GDB sessions that survive across tool calls |
| RAG | Semantic search across all campaign data, findings, and prior research |
| Infra | Provision and scale fuzzing VMs on Proxmox |
| Reporting | Disclosure reports, bounty submissions, CVE requests |

The coloured dots next to each VM indicate the category it falls into and the MCPs assigned to it, so I can quickly recognise what each one does. All eight run as separate Python processes under Claude Code, registered in a single .mcp.json. When Claude needs to check what drivers are loaded on a Windows target, it calls tool_surface_kernel_drivers.
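To make "typed inputs and outputs" a bit more concrete, here's a stdlib-only Python sketch of the pattern rather than the real thing: a decorator registers each function by name, the registry reads its type hints as a crude schema, and a dispatcher checks arguments before running the tool. `ToolRegistry`, `schema()`, `call()` and `mcp_json()` are hypothetical names, the driver list is a stub, and none of this is FastMCP's actual API, although FastMCP similarly derives each tool's input schema from the function signature. `mcp_json()` emits the general `mcpServers` / `command` / `args` shape Claude Code reads from a project `.mcp.json`; the script path is made up.

```python
import inspect
import json
from typing import Any, Callable, get_type_hints

Tool = Callable[..., Any]


class ToolRegistry:
    """Hypothetical stand-in for one MCP server's table of callable tools."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._tools: dict[str, Tool] = {}

    def tool(self) -> Callable[[Tool], Tool]:
        # Decorator factory, used as @registry.tool() to mirror @mcp.tool().
        def register(fn: Tool) -> Tool:
            self._tools[fn.__name__] = fn
            return fn
        return register

    def schema(self, tool_name: str) -> dict[str, str]:
        # Parameter name -> type name, read from the function's annotations.
        hints = get_type_hints(self._tools[tool_name])
        hints.pop("return", None)
        return {param: hint.__name__ for param, hint in hints.items()}

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        fn = self._tools[tool_name]
        # Raises TypeError on missing or unexpected arguments.
        inspect.signature(fn).bind(**kwargs)
        hints = get_type_hints(fn)
        for param, value in kwargs.items():
            expected = hints.get(param)
            # Only plain classes (str, int, ...) are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{param} must be {expected.__name__}")
        return fn(**kwargs)


hunter = ToolRegistry("hunter")


@hunter.tool()
def tool_surface_kernel_drivers(host: str, name_filter: str = "") -> list[str]:
    """Stub: a real tool would enumerate loaded drivers on `host` over WinRM."""
    drivers = ["tcpip.sys", "vuln_demo.sys"]
    return [d for d in drivers if name_filter in d]


def mcp_json(servers: dict[str, str]) -> str:
    # One stdio entry per server script under the "mcpServers" key.
    config = {
        "mcpServers": {
            name: {"command": "python", "args": [script]}
            for name, script in servers.items()
        }
    }
    return json.dumps(config, indent=2)


print(hunter.schema("tool_surface_kernel_drivers"))
# {'host': 'str', 'name_filter': 'str'}
print(hunter.call("tool_surface_kernel_drivers", host="hunt-win11", name_filter="vuln"))
# ['vuln_demo.sys']
print(mcp_json({"hunter": "servers/hunter.py"}))
```

Tying the schema to the function signature is what lets the model call a tool by name with structured arguments instead of parsing free-form terminal output.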
When it needs to decompile a function, it calls tool_re_ghidra_decompile or one of the many other RE tools available to it. When it needs to start a fuzzing campaign, it calls the appropriate tool_*_fuzz_start for whichever domain I want to explore, which calls the underlying tools directly and gets to work.

Each server is built with FastMCP. The server files themselves are thin @mcp.tool() wrappers; the actual business logic lives in subdirectories (hunter/, re_tools/, exploit_tools/, debug_tools/). Sessions (SSH, WinRM) persist across tool calls within a conversation and across sub-agents, so you connect once and everything just works from there.

The Lab

The research runs on a Proxmox-based hunt range with 5 VMs on an isolated network segment. I'm in the process of rewriting my home lab series (https://blog.zsec.uk/homelab-clustering-pt1/), but these are the specs of my new host, which will replace the existing NUCs from that series.

Nothing particularly exotic here (except that they all have pretty high specs, as they live on my new home lab host), just purpose-built for the workflow:

VM Platform Role hunt-win11 Windows 11 (lates...