AI Is Now Finding Vulnerabilities Faster Than Humans Can Patch Them

By Mike Housch

In January 2026, Anthropic and Mozilla quietly ran a two-week experiment that I believe will be looked back on as a turning point for application security. Using Claude, Anthropic's AI model, the team scanned nearly 6,000 C++ files in the Firefox codebase and surfaced 22 previously unknown vulnerabilities, 14 of them rated high severity. One use-after-free bug was detected in just 20 minutes. For context, those 14 high-severity findings represent almost a fifth of all high-severity Firefox vulnerabilities patched across all of 2025. A two-week AI engagement matched what a year of traditional discovery produced. That is not a marginal improvement. That is a structural shift in the economics of vulnerability research, and every security leader needs to understand what it means for their program.

The Asymmetry That Defines This Moment

Before drawing conclusions, it is worth being precise about what this research actually demonstrated, and what it did not. Anthropic attempted to go beyond discovery and tasked Claude with developing working exploits for the vulnerabilities it found. After hundreds of attempts and approximately $4,000 in API costs, the model succeeded in only two cases, and only within a deliberately weakened test environment with sandboxing protections stripped out.

This asymmetry is the most strategically important data point in the entire study. Finding vulnerabilities is now computationally cheap and highly scalable. Turning those findings into reliable, weaponized exploits remains significantly harder, at least for now. That gap represents a window of opportunity for defenders that we should be exploiting aggressively.

The operative phrase is "for now." The same model capabilities that currently cap out at crude exploit generation will not stand still. The right posture is not complacency; it is to build the defensive infrastructure while the asymmetry still favors us.

What This Changes for Enterprise Security Programs

The Anthropic-Mozilla engagement was not a proof of concept. It was a structured, repeatable operational workflow: a task verifier provided real-time feedback to the model as it explored the codebase, allowing it to iterate on its results autonomously until findings were validated. That is a production-grade red teaming pipeline, not a research demo. The implications for enterprise programs are direct and significant.
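The verifier-driven feedback loop can be sketched roughly as follows. This is a hypothetical illustration, not Anthropic's published implementation: the `Finding` schema, the stubbed `verify` function, and the `model_propose` callable are all stand-ins I am assuming for the sake of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """A candidate vulnerability surfaced by the model (hypothetical schema)."""
    file: str
    description: str
    severity: str  # e.g. "high", "medium", "low"


def verify(finding: Finding) -> tuple[bool, str]:
    """Task verifier: attempt to validate a finding and return feedback.

    In a real pipeline this might build a reproducer under a sanitizer
    (e.g. ASan) or replay the code path in a sandbox. Stubbed here.
    """
    if "use-after-free" in finding.description:
        return True, "reproduced under sanitizer build"
    return False, "could not reproduce; refine the suspected code path"


def scan_with_feedback(model_propose, files, max_rounds=3):
    """Iterate: the model proposes findings, the verifier feeds results
    back as context, and only validated findings survive."""
    validated = []
    feedback = ""
    for _ in range(max_rounds):
        candidates = model_propose(files, feedback)
        round_notes = []
        for f in candidates:
            ok, note = verify(f)
            if ok:
                validated.append(f)
            else:
                round_notes.append(f"{f.file}: {note}")
        if not round_notes:
            break  # everything this round validated; stop iterating
        feedback = "\n".join(round_notes)
    return validated
```

The design point is the `feedback` string flowing back into the next proposal round: that closed loop, rather than any single scan, is what let the model refine findings autonomously.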

  1. Vulnerability management programs are now under-resourced by design.

    If an AI can scan thousands of files and surface high-severity findings in days, the traditional model of quarterly penetration tests and periodic code reviews is no longer adequate. The cadence of discovery has accelerated; the cadence of your remediation program needs to match it. Organizations that have not yet integrated AI-assisted scanning into their application security pipelines are accumulating an exposure gap in real time.

  2. The human role in security is shifting, not shrinking.

    Anthropic's researchers validated AI findings in virtualized environments to rule out false positives before disclosure. The human function moved from initial discovery to verification and triage. This is an important distinction for security teams to internalize. The question is not whether AI will replace security engineers; it is whether your team is structured to work at the output speed that AI-assisted discovery now enables. A team still processing findings manually will become the bottleneck.

  3. Third-party and supply chain risk programs are no longer fit for purpose.

    Your vendors' codebases contain the same classes of vulnerabilities that Claude found in Firefox. Annual questionnaires and point-in-time assessments are not equipped for a threat environment where an adversary can run AI-assisted reconnaissance against a codebase in days. The bar for what "adequate vendor security assurance" looks like has shifted, and third-party risk programs need to catch up.

  4. Red teaming needs to go continuous and adversarial.

    The same techniques Anthropic used defensively (task verifiers, iterative feedback loops, codebase-scale scanning) are available to adversaries. If you are not using AI to test your own systems at the cadence and scale that attackers will use to probe them, you are operating with an asymmetric disadvantage. Continuous AI-assisted red teaming is no longer a premium capability. It is a baseline expectation.

The Supply Chain Signal Most Organizations Are Missing

Firefox is one of the most scrutinized, security-focused open-source codebases in the world: maintained by a dedicated security team, subjected to continuous fuzzing, and reviewed by some of the best researchers in the industry. And an AI found 14 high-severity bugs in two weeks.

Now think about your software supply chain. Think about the third-party libraries embedded in your applications, the SaaS platforms your business depends on, the vendor SDKs integrated into your customer-facing systems. Most of those codebases have never been subjected to anything approaching the scrutiny Firefox receives. If AI-assisted discovery can surface this volume of findings in a hardened, well-maintained project, the vulnerability density in less scrutinized code is a risk your current third-party risk program almost certainly cannot see.

This is not an argument for panic; it is an argument for precision. AI-assisted software composition analysis and supply chain scanning should be part of how you evaluate vendors and monitor critical dependencies. The tooling exists. The question is whether you are using it.

A Framework for AI-Augmented Vulnerability Management

Based on what the Anthropic-Mozilla research demonstrates and what I am seeing in enterprise security programs, here is how I would structure an AI-augmented vulnerability management capability today.

  1. Integrate AI-assisted SAST into your CI/CD pipeline. Static analysis augmented with AI reasoning, not just pattern matching, should run on every commit. The goal is to surface semantic vulnerability classes (logic errors, memory mismanagement, authentication flaws) that traditional SAST tools miss.
  2. Build a task-verifier architecture for your red team. The key innovation in the Anthropic approach was the real-time feedback loop that let the model iterate. Replicating this in your red team pipeline, whether with commercial AI tools or internal capability, is the highest-leverage investment you can make in offensive security right now.
  3. Restructure human review around AI output velocity. If AI is generating findings at scale, your triage and validation workflow needs to be built for that throughput. This means clear severity thresholds for automated escalation, dedicated human review capacity for AI-flagged findings, and SLAs that reflect the accelerated discovery cadence.
  4. Extend AI-assisted scanning to your vendor portfolio. Work with your third-party risk program to incorporate AI-assisted software composition analysis into vendor onboarding and periodic reviews. Prioritize vendors with access to critical data or operational systems.
  5. Log AI security tooling activity the same way you log production systems. AI-assisted scanning generates its own audit trail that is relevant to your compliance posture, incident response capability, and regulatory obligations. Treat it accordingly.
  6. Update your board-level risk narrative. The story for your board is simple: the cost of finding vulnerabilities has dropped dramatically, which means the expected volume of disclosed vulnerabilities will increase, which means patch velocity and remediation SLAs are now a board-level risk metric, not just an operational one.
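The triage step in point 3 amounts to routing findings by severity against explicit SLAs. A minimal sketch of that router follows; the severity tiers, SLA durations, and auto-escalation set are illustrative assumptions to be tuned to your own remediation policy, not a standard.

```python
from datetime import timedelta

# Illustrative remediation SLAs keyed by severity (assumed values).
REMEDIATION_SLA = {
    "critical": timedelta(days=2),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}

# Severities routed straight to dedicated human review rather than batch.
AUTO_ESCALATE = {"critical", "high"}


def triage(finding: dict) -> dict:
    """Route an AI-flagged finding: attach its remediation SLA and decide
    whether it needs immediate human validation or can queue for batch
    review. Unknown severities default to the lowest tier."""
    severity = finding.get("severity", "low")
    return {
        **finding,
        "sla": REMEDIATION_SLA.get(severity, REMEDIATION_SLA["low"]),
        "queue": "human-review" if severity in AUTO_ESCALATE else "batch",
    }
```

The point of making thresholds and SLAs explicit data, rather than tribal knowledge, is that they become auditable and reportable, which is exactly what the board-level metric in point 6 requires.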

The Responsible AI Security Model

What Anthropic and Mozilla did deserves to be acknowledged as the right model for how AI companies and software maintainers should work together. They ran the engagement under a structured research partnership, validated findings before disclosure, coordinated remediation before publication, and published their methodology transparently, including the uncomfortable data point that Claude did succeed in generating exploits in two edge cases.

That transparency matters. It gives the security community accurate signal about current AI capabilities rather than either overstating or understating the risk. It demonstrates that responsible AI security research is possible and productive. And it sets a benchmark for how this class of engagement should be structured as AI-assisted vulnerability research becomes more common.

As an industry, we should be advocating for more partnerships of this kind: between AI developers and critical software maintainers, between security vendors and enterprise customers, and between red teams and platform owners. The alternative is that the same capabilities get deployed unilaterally by adversaries who will not be publishing their methodology.

The Strategic Imperative

The security community has been discussing AI-augmented offensive and defensive capabilities in the abstract for years. The Anthropic-Mozilla research moves that discussion out of the abstract. We now have a documented, costed, operationally validated example of AI discovering vulnerabilities at a scale and speed that exceeds what traditional programs produce, in one of the world's most scrutinized codebases.

The organizations that respond to this by integrating AI into their vulnerability management, red teaming, and supply chain risk programs now will build a structural advantage. Those that wait will find themselves managing a growing gap between the pace of discovery and the pace of remediation, a gap that adversaries will eventually exploit. The window to build AI-augmented defensive capability while the exploit asymmetry still favors defenders is open. It will not stay that way indefinitely.