A large number of false negatives has significantly eroded confidence in automated AI testing for vulnerabilities, a new study from Cobalt has found.
The Cobalt State of Pentesting Report 2026 is based on two comparative surveys, conducted in 2025 and 2026, of around 450 cybersecurity professionals.
It found that the percentage of organizations relying entirely on AI automation for testing sank from 29% to 9% over the period, with nearly half (47%) of respondents now preferring a hybrid testing model.
Over three-quarters (78%) said fully automated scanning tools missed critical vulnerabilities.
The share of organizations now preferring a hybrid model, where humans support AI testing, surged 22 percentage points in a year. The percentage of organizations using automation for low-risk environments also rose 22 points to 47%.
“While the industry is rightfully excited about the potential of Mythos-class tools, unguided algorithms are inherently prone to returning even more false positives and costly false negatives than the automated scanners we have today,” said Andrew Obadiaru, CISO of Cobalt.
The AI Attack Surface Expands
A big reason for the decline in trust in AI automation is the complexity of the AI attack surface that these scanners are testing, the report noted.
Nearly one in three findings from AI pentests is rated high risk, 2.7 times the average for conventional software, the report claimed.
At the time of analysis, less than two-fifths (38%) of LLM vulnerabilities had been fixed, while 62% remained open – the lowest resolution rate of any asset class.
Mean time to resolve (MTTR) for AI/LLM security issues rose from 19 days to 36 days over the period, which Cobalt claimed shows that teams are tracking “significantly harder vulnerabilities” than before.
“LLM vulnerabilities are deeply context-dependent and invisible to tools that lack an architectural understanding of the application,” continued Obadiaru. “To close the validation gap, automation should be deployed exactly where it excels, but elite human expertise remains foundational to uncovering and remediating the most complex business logic risks.”
Among organizations that experienced AI-related incidents, shadow AI (44%) was the most common vector, followed by data or model poisoning (41%) and improper output handling (41%). Supply chain vulnerabilities (35%) and prompt injection (34%) rounded out the top five.
Although 60% of security professionals said they need stronger LLM testing capabilities, only 42% plan to increase human-led red team operations.
