- What: AI is increasingly being used to augment and even replace human penetration testers.
- Impact: AI tools are improving but still require human oversight to avoid false positives and discover complex vulnerabilities.
Robert Lemos, Contributing Writer
February 3, 2026

Source: Robert Lemos, using data from Stanford University/CMU research

While current AI agents and large language models (LLMs) still have significant problems finding vulnerabilities and conducting penetration tests, they are already augmenting many human pen testers and even supplanting them.

Problems such as false positives remain significant, and human ingenuity and creativity will continue to be essential for discovering novel or complex vulnerabilities, such as timing attacks, experts say. However, AI pentesting tools and services are quickly improving, and the majority of penetration testers already augment their workflow with AI technologies, a use case that will only increase.

And the technology continues to get better, says David Brumley, chief AI and science officer at Bugcrowd, a crowdsourced cybersecurity platform.

"AI is inevitable, and it was just a question of technically when will we see it, and we've hit that point," he says. "AI is in everything, and pentesting is one of them. We're big believers in this technology to help really just decrease the time that people are vulnerable and increase the pace and scale at which we can do these sorts of things."

The move toward agentic penetration testing seems inexorable. XBow, an AI-powered pen-testing and vulnerability-finding service, topped HackerOne's bug bounty leaderboard last summer and remains at the top of the current "Collectives" leaderboard. In addition, 67% of hackers on the HackerOne offensive-security platform use AI to augment their work, according to the company.

An academic research paper published in December by researchers at Stanford University and Carnegie Mellon University found that AI agents can perform as well as the strongest pen testers on a variety of tasks and for a fraction of the price.
The research team's AI penetration-testing system, ARTEMIS, performed better than all but one researcher and cost about $18 per hour to run, while typical penetration testers cost at least $60 per hour, the researchers stated.

Three Years to the AI Future?

Still, that future has not yet arrived. Off-the-shelf LLMs are not very good at finding vulnerabilities, especially open-source systems and those designed by gray-market developers, according to a study by cybersecurity firm Forescout. Commercial systems and cybersecurity-specific systems have improved considerably, however.

In the next 9 to 18 months, AI-powered penetration-testing systems capable of completing standard scans for compliance, such as searching for the vulnerabilities on the OWASP Top 10 list or satisfying the Payment Card Industry Data Security Standard (PCI DSS), are completely feasible, says Gunter Ollmann, chief technology officer of penetration-testing-as-a-service (PTaaS) firm Cobalt.

For the near term, Ollmann sees work ramping up for the company's approximately 500 penetration testers, but eventually, those cybersecurity workers who are not at the top of their game will likely fall behind.

"Honestly, in the next 18 months to three years, I expect that half the pen testers around the world will not be able to keep up with their AI counterparts in this space," he says.

Instead, much as in other industries, the future will likely cast human penetration testers as AI-augmented orchestrators: Humans set intent and scope, validate outcomes, and handle what AI cannot reliably reason through, says Jay Bavisi, founder and president of EC-Council, a cybersecurity-training organization.

"If someone’s work is limited to repeatable scanning and basic checks, AI will replace much of that," he says. "If their value is in providing systems thinking, business logic, attacker intent, validation, and prioritization, they remain essential."
AI Weaknesses Showing

The benefits of AI include speed, better coverage of enterprise infrastructure, and consistency in results. Yet current agentic AI systems tend to do best on pattern-matching issues, such as finding cross-site scripting (XSS) vulnerabilities, whereas humans can apply business logic and context around results, according to an analysis penned by XBow.

Two-thirds of penetration testers (67%) are using AI in some way in their workflow. Source: Robert Lemos, based on HackerOne data

In its own analysis, crowdsourced offensive-security service HackerOne found that AI penetration-testing tools have difficulty with tasks involving graphical user interfaces and show higher rates of false positives, otherwise known as "hallucinations," concluding that the current crop of tools works for a wide, but shallow, set of engagements. The AI systems offer significant benefits, including systematic coverage of the entire enterprise, more frequent testing, and better results in tasks that require methodical enumeration, such as finding infrastructure-level vulnerabilities, the company reported.

The result is that AI can find a plethora of potential issues, but human penetration testers need to ride herd to reinforce accuracy and provide trust in the results, says Michiel Prins, co-founder and senior director of product management at HackerOne.

"As attackers use AI to scale, defenders need human-AI collaboration to keep up," he says. "We’re seeing that play out across enterprise, cloud, and AI system testing."

Moreover, AI does not provide any sort of legal guarantee, so human experts will have to validate and stand behind the results of their tools. Companies conducting penetration tests still have to define and enforce authorization, scope, and intent, says EC-Council's Bavisi.

"If an AI system crosses a line, accountability sits with the organization and operators, not the model," he says.
"That is why clear rules, strong logging, and explicit human ownership are critical, especially as tools become more autonomous."

AI Work, Human Responsibility

Currently, penetration-testing firms are establishing ways to allow the use of AI while guaranteeing the protection of customers' data and ensuring that the AI systems do not take malicious actions.

Bugcrowd's Brumley points out that if an AI system found a new drug, human doctors and scientists would have to validate the medicine and ensure that it was safe.

"I think there's always going to be a place for a human," he says. "Even if AI could do this, you can't really hold it accountable or responsible, and so it still feels like you need someone that you can say, 'I'm going to take an action based on this.'"

HackerOne's Prins agrees that humans will have to shoulder the responsibility for their AI tools, so penetration testing, like other AI-augmented services, may always be a combined effort.

"The future of pentesting isn’t full self-driving; it’s hybrid," he says. "That shift is driving broader adoption of agentic pentesting as a service, ensuring every finding reflects real, exploitable risk that security teams can trust."

Another place where humans will be necessary? As training guides to teach new approaches to AI. While AI is making significant strides in penetration testing and vulnerability finding at present, the engineering teams training the models will eventually run out of training data, says Cobalt's Ollmann.

"In the future, ... elite pen testers with deep skills [will be needed] to augment those AI pen testing technologies and to provide additional insight and feedback loop into the pen testing process, so that the AIs can continue to learn and continue to adapt," he says, adding that "it becomes less about the 500 or 1,000 pen testers and more about access to the world's top 1% or 2% of pen testers, which then provide that layer above and beyond what the AI tools can deliver."
About the Author

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.