Project Glasswing Proves AI Can Find Bugs. Who Will Fix Them?

Last week, Anthropic announced Project Glasswing, an AI model so successful at detecting software vulnerabilities that the company took the dramatic step of postponing its public release. Instead, it gave early access to Apple, Microsoft, Google, Amazon, and a coalition of others so they could find and fix bugs before adversaries can.
Mythos Preview, the model behind Project Glasswing, found vulnerabilities in every major operating system and browser. Some of these bugs had survived decades of human review, fuzzing, and open-source testing. One had been sitting in OpenBSD, widely regarded as one of the most secure operating systems in the world, for 27 years.
It’s tempting to file this under “AI lab says its own AI is very dangerous,” the same playbook OpenAI ran with GPT-2.
Not so fast; there is a material difference this time.
Mythos didn’t just find individual CVEs. It:
- Chained four independent bugs into an exploit sequence that bypassed both the browser renderer sandbox and the OS sandbox
- Escalated privileges on Linux by winning race conditions
- Built a ROP chain of 20 gadgets targeting FreeBSD’s NFS server, with gadgets scattered across multiple packages
Claude Opus 4.6, Anthropic’s previous frontier model, failed almost completely at developing exploits on its own. Mythos achieved a 72.4% success rate in the Firefox JS shell.
This is not theory, and it is not another three-to-five-year forecast. It is about to become a reality in real-world engineering.
Why Project Glasswing Reveals a Real Cybersecurity Gap
Here’s a number that should keep security leaders up at night: fewer than 1% of the vulnerabilities Mythos detected have been fixed.
Let that sink in.
The most powerful vulnerability-detection engine ever built went up against the most important software in the world, and the ecosystem couldn’t absorb the output.
Glasswing solved the detection problem.
No one has solved the repair problem.
Why Defenders Can’t Keep Up: Calendar Speed vs. Machine Speed
This is a structural issue the cybersecurity industry has been talking around for years. AI just made it impossible to ignore.
Defenders operate at calendar speed. The typical cycle:
- Gather threat intelligence
- Build a campaign
- Simulate threats
- Mitigate
- Repeat
That cycle takes about four days, on a good day. Attackers, especially those now using LLMs in every phase of their operations, move at machine speed.
David B. Cross, CISO at Atlassian, will be speaking at the Automated Verification Conference on May 12 about what this looks like internally, why periodic checks can’t keep up with autonomous adversaries, and what defenders should be doing instead.
AI-Powered Attacks Are Already Automated
Earlier this year, a threat actor planted a custom MCP server hosting an LLM as part of its attack chain against FortiGate devices.
AI handled everything:
- Automated backdoor creation
- Internal infrastructure maps fed directly to the model
- Automated vulnerability testing
- AI-driven prioritization of attack paths toward domain-administrator access
The result? 2,516 organizations in 106 countries were compromised in parallel. The entire chain, from initial access to evidence disposal to data analysis, ran autonomously. The only human involvement was reviewing the results afterward.
AI-based Vulnerability Discovery Is Outpacing Remediation
The gap between the speed of the attacker and the speed of the defender is not new.
What’s new is that a small but worrying gap has just become a chasm.
- Autonomous systems such as AISLE found 13 of the 14 OpenSSL CVEs in recent scheduled releases, bugs that had survived years of human review.
- XBOW became the top-ranked hacker on HackerOne in 2025, surpassing every human participant.
- The average time from disclosure to weaponized exploitation has dropped from 771 days in 2018 to single-digit hours in 2024.
- By 2025, most exploits will be weaponized before they are publicly disclosed.
Now add Mythos-class discovery to this picture.
You don’t automatically get a safer world. You get a tsunami of legitimate findings that still has to pass through human verification, organizational processes, business-continuity considerations, and patch cycles that have not fundamentally changed in ten years.
How to Build a Mythos-Ready Security System
The instinctive response to Glasswing is to ask: “How do we find more bugs?”
That’s the wrong question.
The right one is: “When thousands of exploitable vulnerabilities are sitting on your desk tomorrow morning, can your system actually process them?”
For most organizations, the honest answer is no. And the reason is not a lack of tools or talent; it is a structural dependence on periodic, human-initiated processes that were designed for a world where threats trickled in, not one where they arrive as a tsunami.
We cannot fix every vulnerability. We cannot apply every patch.
That is not defeatism; it is the pragmatic starting point for any truly effective security system. The important question is not “Is this CVE severe?” but “Is this vulnerability exploitable in my current environment, given the controls I have in place?”
A Mythos-ready security system requires three key components.
First: Signal-Driven Validation Over Scheduled Assessments
When a new threat emerges, when assets change, or when configurations drift, defenses need to be checked against that particular change, at that moment. Not during next quarter’s pentest. Not when someone finds an open calendar slot.
The whole idea of “scheduled validation” assumes a stable threat landscape, and today that assumption is dead on arrival.
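The pattern above can be sketched as an event-driven loop: validation fires on each change signal rather than on a calendar. This is a minimal, hypothetical illustration; the class and event names are my own, not any vendor’s API.

```python
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    kind: str      # e.g. "new_threat", "asset_change", "config_change"
    target: str    # the asset or control affected

class SignalDrivenValidator:
    """Revalidates defenses when a relevant change signal arrives,
    instead of waiting for a scheduled quarterly assessment."""

    def __init__(self):
        self.validated = []

    def on_signal(self, event: ChangeEvent) -> str:
        # Every signal triggers an immediate, scoped validation:
        # only the delta is retested, not the whole estate. A real
        # system would replay the relevant attack techniques against
        # the changed asset and record whether controls held.
        result = f"validated:{event.kind}:{event.target}"
        self.validated.append(result)
        return result

validator = SignalDrivenValidator()
print(validator.on_signal(ChangeEvent("new_threat", "edge-firewall")))
print(validator.on_signal(ChangeEvent("config_change", "web-tier")))
```

The key design point is that nothing here polls a schedule: the validator is idle until a signal arrives, then tests exactly the thing that changed.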
Second: Environmental Context Over Raw CVSS Scores
Glasswing will generate an avalanche of CVEs.
Yet many risk-management programs still prioritize by CVSS score. That context-free metric tells you how bad a bug could be in theory, not whether it matters in your specific infrastructure, given your controls and business risk.
When discovery volume suddenly jumps from hundreds to thousands, prioritizing without context won’t just slow you down; it will break your process entirely.
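One way to picture context-aware prioritization is a score that starts from CVSS and is then adjusted by what validation reveals locally. The weights and parameter names below are illustrative assumptions for this sketch, not a published standard.

```python
def contextual_priority(cvss: float, exploit_validated: bool,
                        control_blocks: bool, asset_critical: bool) -> float:
    """Rank a finding by what matters locally, not by CVSS alone.

    exploit_validated: an attack simulation actually landed here
    control_blocks:    an existing control (EDR/WAF/segmentation) stops it
    asset_critical:    the affected asset is business-critical
    """
    score = cvss                 # start from theoretical severity (0-10)
    if exploit_validated:
        score *= 2.0             # proven exploitable in *this* environment
    if control_blocks:
        score *= 0.2             # compensating control already mitigates it
    if asset_critical:
        score += 3.0             # business impact raises urgency
    return round(score, 1)

# A "critical" CVSS 9.8 bug that a control already blocks...
blocked = contextual_priority(9.8, exploit_validated=False,
                              control_blocks=True, asset_critical=False)
# ...ranks far below a "medium" CVSS 6.5 bug proven exploitable on a crown jewel.
proven = contextual_priority(6.5, exploit_validated=True,
                             control_blocks=False, asset_critical=True)
print(blocked, proven)  # 2.0 16.0
```

The exact weights are arbitrary here; the structural point is that a validated medium-severity finding on a critical asset outranks a blocked critical-severity one.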
Third: Closed-Loop Remediation Without Handoffs
The current model cannot survive in a world where adversaries exploit CVEs within hours of disclosure. You know the drill:
- The scanner detects a finding
- An analyst triages it
- A ticket goes to a different team
- Someone patches it weeks later
- No one revalidates
That handoff chain is where the system falls apart. If the cycle from detection to remediation to revalidation can’t run without humans pushing tickets along, it will never operate anywhere near machine speed.
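The closed loop described above, detection to remediation to revalidation with no manual handoff, can be sketched in a few lines. All names here are hypothetical; the point is the revalidation step that human ticket chains usually skip.

```python
def closed_loop(finding, validate, remediate):
    """Run one finding through the full loop and report the outcome."""
    if not validate(finding):
        return "not_exploitable"      # context filtered it out: no ticket needed
    remediate(finding)                # e.g. apply a vendor fix or push an EDR rule
    # The step human handoff chains usually skip: confirm the fix actually holds.
    if validate(finding):
        return "remediation_failed"   # still exploitable: loop again, escalate
    return "closed"

# Toy environment: the set of currently exploitable findings.
exploitable = {"CVE-2024-0001"}
result = closed_loop(
    "CVE-2024-0001",
    validate=lambda f: f in exploitable,
    remediate=lambda f: exploitable.discard(f),
)
print(result)  # "closed" only because revalidation confirmed the fix
```

Note that "closed" is only reachable through the second `validate` call; a loop without that call would mark findings done whether or not the fix worked.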
This is not about buying more tools. It’s about defenders leaning on their one asymmetric advantage: you know your organization’s topology; attackers don’t.
That is a real advantage, but only if you can act on it at machine speed.
How Autonomous Exposure Validation Closes the Gap: Where Picus Comes In
This is the part where I’m going to be completely transparent about who’s writing this.
At Picus Security, we build an Autonomous Exposure Validation platform. So, full disclosure: I have an opinion here that comes with an inherent bias. Take it accordingly.
What Glasswing highlighted to us, and to many CISOs we’ve spoken with, is that the validation step in any exposure-management program has turned out to be the critical bottleneck.
- Detecting vulnerabilities has become far easier and more efficient
- Patching them remains painfully slow
The only lever you can pull is knowing which ones matter in your environment. That’s validation.
From Four Days to Three Minutes: How Agentic Workflows Are Changing the Cycle
We built Picus Swarm, an AI team that enables autonomous, real-time validation, compressing the typical four-day cycle into minutes.
It’s a set of AI agents that work together to do what used to be handed off manually between four different teams:
- A researcher agent ingests and vets threat intelligence.
- A red-team agent maps that intelligence against your environment to generate a safely executable attacker playbook.
- A validation agent runs the playbook across your endpoints and clouds, collecting telemetry and evidence.
- A liaison agent bridges discovery to remediation: it opens tickets, kicks off SOAR playbooks, pushes attack indicators into your EDR, and revalidates after remediation.
All actions are traceable and auditable, and every agent operates within the boundaries you define.
The entire chain, from a new CISA alert to a validated, remediated finding, runs in about three minutes.
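As a rough illustration of how such an agent chain composes, here is a toy pipeline where each stage’s output feeds the next with no manual handoff. Picus Swarm’s internals are not public; the stage names and data below simply mirror the article’s description (the MITRE ATT&CK IDs are real technique identifiers used as sample intel).

```python
# Illustrative only: each function stands in for one agent in the chain.

def researcher(alert):
    # Ingest and vet intel; T1190/T1059 are real MITRE ATT&CK technique IDs.
    return {"alert": alert, "techniques": ["T1190", "T1059"]}

def red_team(intel):
    # Turn vetted techniques into a safely executable playbook.
    return {**intel, "playbook": [f"simulate {t}" for t in intel["techniques"]]}

def validation(work):
    # Run the playbook and collect evidence from endpoints and clouds.
    return {**work, "evidence": [f"ran: {step}" for step in work["playbook"]]}

def liaison(result):
    # Bridge discovery to remediation, then revalidate.
    return {**result, "actions": ["open ticket", "push EDR indicators",
                                  "revalidate after fix"]}

def run_pipeline(alert):
    # Each stage's output is the next stage's input: no human in the middle.
    return liaison(validation(red_team(researcher(alert))))

out = run_pipeline("new CISA advisory")
print(len(out["evidence"]), out["actions"][0])  # 2 open ticket
```

The design point is simple function composition: because every stage emits structured data the next stage can consume, the whole chain runs end to end without a queue of tickets between teams.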
When a Mythos-class model drops thousands of findings on your organization, you need something that can quickly tell you which of them are exploitable in your environment, which controls hold, which fail, and what the vendor-specific fix is.
An Uncomfortable Truth
Project Glasswing will be measured by one metric: how many vulnerabilities get fixed before they are exploited. Not how many are discovered, not how impressive the exploit chains are, but whether the ecosystem can digest what the AI produces.
Visibility alone is never enough; 83% of cybersecurity programs still struggle to show measurable results. What changes the equation is bridging the gap between detection and validation: knowing whether something that could be dangerous can actually damage your environment.
That’s validation.
And in a post-Glasswing world, it’s the only thing standing between a flood of discoveries and a flood of breaches.

We’re hosting the Automated Verification Conference on May 12 & 14 with Frost & Sullivan, featuring practitioners from Kraft Heinz and Glow Financial Services, as well as our CTO, Volkan Erturk. Together, we’ll dig deeper into this exact problem.
>> Register here.
Note: This article was written by Sıla Özeren Hacıoğlu, Security Research Engineer at Picus Security.



