AI Vulnerability Research and the Fuzzer Era Déjà Vu

What: Analysis of AI's role in vulnerability discovery
Impact: Security researchers and developers need to understand AI's limitations in security testing

AI Vulnerability Research and the Fuzzer Era Déjà Vu: Why the Numbers Are Only Half the Story

Posted by: voidsec
Post Date: May 12, 2026 (2026-05-12T15:44:39+02:00)
Reading Time: 11 minutes

TL;DR

A post by Alex Albert (Anthropic) claims that with the help of Claude Mythos, Mozilla fixed more security bugs in April 2026 than in the previous 15 months combined. Mozilla published the full breakdown: 271 of those bugs were found by Mythos, of which 180 were rated sec-high and 80 sec-moderate. The findings included sandbox escapes, race conditions, and UAFs. The severity distribution looks impressive, but tells a fraction of the story: not every bug is a security bug, not every security bug is exploitable, and what we are witnessing is (IMHO) the AI-assisted equivalent of the fuzzer era. Same initial spike, same low-hanging fruit, same incoming plateau. The hard bugs still require human expertise; Mozilla says so themselves. The hard part isn’t finding bugs; it’s chaining them into a reliable full-chain against a hardened target.

The Trigger

A post recently circulating on X by Alex Albert from Anthropic showed the following chart:

“With the help of Claude Mythos Preview, the Firefox team fixed more security bugs in April than in the past 15 months combined.”

The accompanying chart shows Firefox security bug fixes by month. From January 2025 through March 2026, the monthly count sits between 17 and 31. Then in April 2026, it spikes to 423, roughly 14x the previous monthly average.

Don’t get me wrong, this is genuinely impressive from an engineering throughput perspective, and Mozilla deserves credit. But the moment I saw that chart, a very specific reaction kicked in, the same one I get every time a tool, technique, or vendor promises to “finally solve” software security.
I’ve been around long enough to have seen this pattern play out before, and I’d like to lay out exactly why this kind of announcement needs to be read with a fair bit of critical thinking.

Bug ≠ Security Bug

The chart is labelled “Firefox Security Bug Fixes by Month – All Sources · All Severities.” When I first saw the chart, my immediate reaction was: where’s the severity breakdown? A spike driven overwhelmingly by low-severity hardening issues is a very different story from one driven by memory corruption bugs.

Mozilla published the full breakdown in their technical blog post, and I’ll give credit where it’s due: of the 271 bugs attributed to Claude Mythos in Firefox 150, 180 were rated sec-high, and 80 were sec-moderate.

The 423 figure also deserves unpacking: 271 were Mythos findings, 41 were externally reported, and the remaining 111 were found internally through a mix of other AI models, other Mythos Preview findings shipped in other releases, and traditional fuzzing. So, the chart is not purely “AI did this”; worth knowing.

This isn’t a criticism specific to Anthropic or Mozilla; it’s a systemic problem in how “security metrics” get communicated publicly. Bug counts are just a vanity metric unless you know the exploitability, and this is where my argument actually begins.

Security Bug ≠ Exploitable Bug

180 high-severity security bugs is a serious number, but there’s a catch, and Mozilla themselves say so in their FAQ:

“Is a sec-high or sec-critical bug the same as a practical exploit? Not necessarily.”

They classify sec-high based on crash symptoms reported by AddressSanitizer (use-after-free, out-of-bounds memory access), and their threat model conservatively assumes any of them could be exploitable with sufficient effort. That is the right default for a defender, but it also means the 180 count represents a ceiling on potential impact, not a measure of it.
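As a quick sanity check on the figures quoted above, the breakdown can be tallied in a few lines of Python. This is only a sketch over the numbers as they appear in this post: the dictionary keys and the `share` helper are illustrative names of mine, not Mozilla's categories, and the text does not say how the Mythos bugs outside sec-high and sec-moderate were rated, so they are left as "unspecified".

```python
# Figures as quoted in this post; all names below are illustrative assumptions.
APRIL_TOTAL = 423

# Where the April fixes came from, per the breakdown quoted above.
by_source = {
    "mythos_firefox_150": 271,
    "external_reports": 41,
    "other_internal": 111,  # other AI models, other Mythos findings, fuzzing
}

# Severity ratings given for the 271 Mythos findings.
mythos_by_severity = {
    "sec-high": 180,
    "sec-moderate": 80,
}


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


sources_sum = sum(by_source.values())
mythos_total = by_source["mythos_firefox_150"]
# Mythos findings whose severity is not stated in the text quoted here.
unspecified = mythos_total - sum(mythos_by_severity.values())

print("sources add up to chart total:", sources_sum == APRIL_TOTAL)
print("Mythos share of April fixes (%):", share(mythos_total, APRIL_TOTAL))
print("sec-high share of Mythos findings (%):",
      share(mythos_by_severity["sec-high"], mythos_total))
print("Mythos findings with unspecified severity:", unspecified)
```

The three sources do sum to 423, the Mythos findings account for roughly 64% of the April total, and 11 of the 271 carry no severity in the figures quoted here. None of this says anything about exploitability, which is the point of the section.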
And even if, for the sake of argument, we assume that a meaningful portion of those bugs are genuine security issues, the kind of things that end up in a CVE advisory, we’re still only at step one of a multi-step process. Finding a vulnerability and proving that a vulnerability is exploitable are completely different disciplines.

The triage phase, where you determine whether a bug has actual offensive value, requires answering a set of non-trivial questions:

Reachability: Can the vulnerable code path actually be triggered from an attacker-controlled input, or does it require a specific internal program state that’s hard to reach?

Primitives: What does the bug give you? An arbitrary write? A type confusion? An out-of-bounds read? Not all primitives are equal; some are trivially weaponizable, others require significant work to convert into something useful.

Mitigation landscape: Firefox runs inside a multi-process architecture with a sandbox and a layer of exploit mitigations (ASLR, CFG, stack canaries, and JIT hardening, to name a few). Does the bug survive contact with those layers? Can it be triggered in a process with useful capabilities, or is it sandboxed to irrelevance?

Heap state dependency: Many memory corruption vulnerabilitie...
