Most enterprise security programs have an uncomfortable truth sitting on their risk registers. They test a small fraction of their attack surface, and they know it.
Recent research from Omdia puts the penetration testing coverage gap at 32%, meaning roughly a third of an organization's known assets are not being tested. In my conversations with CISOs, that number feels optimistic.
I spent years on the offensive side of this problem, first at the National Security Agency (NSA) and then building pen testing capabilities for Synack customers. The pattern is consistent. When you factor in the assets nobody's actively inventorying, the untested surface is closer to 80%.
That gap is the largest source of preventable breach risk in most enterprises. And it isn't a tooling problem. It's a math problem.
Why Traditional Pen Testing Can't Scale
What we see across large pen test programs is consistent. The average enterprise has thousands of internet-facing assets, hundreds of internal applications, cloud environments that change weekly, and a security team that is perpetually short-staffed. Against that reality, the standard playbook is a handful of annual pen tests scoped to the systems the risk committee prioritized last quarter.
The issue is that testing the same crown-jewel application every twelve months does nothing to reduce the risk that an attacker finds an exploitable path through an asset nobody thought to scope. The systems you test are rarely the systems that get breached, because attackers don't think in CVSS scores or prioritization frameworks. They think in reachable paths.
Scanners are supposed to fill this gap, but they don't, because automation scales discovery, not understanding. Scanners flag theoretical vulnerabilities by the thousand. Most of those findings are not exploitable, take too long to triage, and accumulate into a backlog that security teams learn to ignore. The coverage problem gets rebranded as an alert fatigue problem, and nothing actually changes.
AI Is Making the Gap Wider, Fast
Offensive AI tooling is changing the scale, speed, and cost of adversary operations. Frontier models like Anthropic’s Mythos are capable of autonomous exploit development across major operating systems and browsers, including vulnerability chaining, browser sandbox escapes, and host compromise, with no human in the loop. Developing a working exploit used to be long, specialized work that only a narrow set of engineers could do. Now, that timeline is collapsing.
The economics of offensive operations are changing, and defenders haven't priced that in yet. Most security teams are still running the point-in-time testing model they used in 2019, against adversaries who now operate continuously. The asymmetry is brutal. Attackers get cheaper and faster every quarter. Defenders get a slightly better scanner.
Left unaddressed, the coverage gap will widen until it's the defining risk of the decade.
What Closing the Gap Actually Requires
From what I’ve seen, closing the coverage gap starts with measurement. Most security programs can tell you how many pen tests they ran last year, how many critical vulnerabilities they remediated, and how long it took. But that won’t answer the question every board wants answered: what percentage of our attack surface is currently validated against exploitation, and how fresh is that answer?
Until programs can answer that with a number that updates in something close to real time, coverage is guesswork. The metrics most teams optimize for are lagging indicators of last quarter's work, not leading indicators of current risk posture.
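To make that metric concrete, here is a minimal sketch of how a coverage number with a freshness window might be computed. The asset names, the 30-day window, and the `coverage_report` helper are all illustrative assumptions, not a description of any particular product or of how a given program tracks its inventory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Asset:
    name: str
    # When exploitability was last validated; None means never tested.
    last_validated: Optional[datetime]


def coverage_report(assets, now, max_age):
    """Return (percent of assets validated within max_age,
    median age in days of those validations, or None if there are none)."""
    fresh_ages = [
        (now - a.last_validated).days
        for a in assets
        if a.last_validated is not None and now - a.last_validated <= max_age
    ]
    percent = 100.0 * len(fresh_ages) / len(assets) if assets else 0.0
    median_age = median(fresh_ages) if fresh_ages else None
    return percent, median_age


# Hypothetical inventory: two fresh validations, one stale, two never tested.
now = datetime(2025, 1, 31)
assets = [
    Asset("vpn-gateway", datetime(2025, 1, 30)),
    Asset("billing-api", datetime(2025, 1, 21)),
    Asset("legacy-ftp", datetime(2024, 1, 15)),
    Asset("forgotten-staging", None),
    Asset("hr-portal", None),
]

percent, median_age = coverage_report(assets, now, max_age=timedelta(days=30))
print(f"{percent:.0f}% validated in the last 30 days; median age {median_age} days")
# prints "40% validated in the last 30 days; median age 5.5 days"
```

The denominator is the point of the exercise: the number is only as honest as the inventory behind it, so assets that have never been tested must be counted rather than left out.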
To reach that point, testing must run continuously, not periodically. Quarterly or annual engagements are incompatible with a threat environment that moves in hours. This doesn't mean scanning constantly. It means validating exploitability constantly, across the full environment, not just the assets in this year's scope document.
And breadth and depth must stop being a tradeoff. Security teams have been forced to choose between wide automated coverage that produces noise and narrow human testing that produces signal. The right model uses agentic AI to expand coverage across the full attack surface at machine speed, and directs human expertise toward the complex, business-logic, and chained-exploit findings that automation cannot replicate. That combination is what closes the gap.
The Work Ahead
Coverage is the real problem, not detection. The teams that will thrive over the next five years are the ones rebuilding their programs around continuous validation, hybrid AI plus human testing, and coverage-based metrics.
The alternative is to keep running the model we built for a slower era and explain the untested 32% to the board after the fact. That isn't a strategy. That's a timing bet.
