On April 7, 2026, Anthropic announced Claude Mythos Preview, a new frontier AI model with capabilities that set it apart from anything previously released.
Unlike typical model announcements focused on benchmark scores and marketing claims, this release included something unusual: a direct acknowledgment from Anthropic that the model is too capable in offensive cyber operations to release publicly.
This article provides a comprehensive assessment of Mythos Preview’s cyber capabilities based on independent testing by the UK AI Security Institute (AISI), analysis from major cybersecurity organizations including the Cloud Security Alliance (CSA) and the SANS Institute, and Anthropic’s own system card disclosures.
The assessment covers the model’s performance in capture-the-flag challenges, autonomous attack simulation, specific vulnerability discoveries, cost analysis, and the implications for defenders through Project Glasswing.
Testing Results from the UK AI Security Institute
The UK AI Security Institute (AISI), established under the Department for Science, Innovation and Technology, conducted the most thorough third-party evaluation of Mythos Preview to date.
Their findings, released on April 13, 2026, reveal a model that represents “a step up over previous frontier models in a domain where cyber performance was already rapidly improving.”
Capture the Flag Performance
In capture-the-flag (CTF) challenges, which simulate real-world vulnerability discovery and exploitation scenarios, Mythos Preview demonstrated unprecedented capabilities.
On expert-level CTF tasks (problems that no AI model could solve before April 2025), Mythos Preview achieved a 73% success rate.
This is a step-function improvement, not an incremental gain. The AISI’s tracking of AI cyber capabilities since 2022 shows steady but slow progress through late 2024; Mythos Preview breaks sharply from that trajectory.
Beyond raw success rates, the AISI noted that Mythos Preview effectively compressed the gap between non-expert and expert-level hacking capabilities. The model not only raised the ceiling of what AI can achieve but also lowered the skill floor required to execute sophisticated attacks.
The Last Ones: 32-Step Corporate Network Simulation
The more significant test was AISI’s internally developed “The Last Ones” (TLO) evaluation, a 32-step corporate network attack simulation spanning initial reconnaissance through complete network takeover.
Human penetration testers estimate this sequence requires approximately 20 hours to complete manually.
Mythos Preview became the first AI model to solve the TLO simulation from start to finish, succeeding in three out of ten attempts. Across all runs, the model averaged 22 completed steps out of 32. The next-best-performing model, Claude Opus 4.6, averaged only 16 steps.
The AISI noted that Mythos Preview’s performance continued to scale with increased computational resources, using up to 100 million tokens per evaluation run.
This suggests that with additional inference compute, the model’s success rate on complex attack chains would likely improve further.
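The AISI figures can be restated as simple ratios. The short Python sketch below only reuses the reported numbers (32 steps, 3 of 10 full solves, averages of 22 and 16 steps); the variable and function names are ours, for illustration.

```python
# Figures reported by the AISI for "The Last Ones" (TLO) range.
TOTAL_STEPS = 32
FULL_SOLVES, ATTEMPTS = 3, 10
AVG_STEPS = {"Mythos Preview": 22, "Claude Opus 4.6": 16}


def fraction_completed(avg_steps: float, total: int = TOTAL_STEPS) -> float:
    """Average share of the attack chain a model completed per run."""
    return avg_steps / total


for model, steps in AVG_STEPS.items():
    print(f"{model}: {fraction_completed(steps):.1%} of {TOTAL_STEPS} steps")

print(f"End-to-end solve rate: {FULL_SOLVES / ATTEMPTS:.0%}")
lead = AVG_STEPS["Mythos Preview"] - AVG_STEPS["Claude Opus 4.6"]
print(f"Average lead over Opus 4.6: {lead} steps")
```

On these figures, Mythos Preview completed roughly two-thirds of the chain per run on average, against half for Opus 4.6.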
Important Caveats and Limitations
The AISI was careful to note that their testing environment differs from real-world enterprise networks in several important ways.
The test ranges lacked defenses typically present in production environments, such as active defenders, endpoint detection and response (EDR) systems, and other defensive tooling.
Additionally, there were no penalties for the model undertaking actions that would trigger security alerts in a real environment.
As the AISI stated directly: “We cannot say for sure whether Mythos Preview would be able to successfully attack well-defended systems.”
The model’s success indicates it is “at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained,” but its performance against hardened, actively defended networks remains untested.
In operational technology (OT) testing, Mythos Preview failed to complete a “cooling tower” industrial control system simulation, encountering difficulties specifically in the IT portion of the range.
This does not necessarily indicate poor performance in OT environments, but it does suggest limitations in cross-domain attack capability.
Specific Vulnerability Discoveries by Mythos
Anthropic’s internal testing and red team assessments revealed that Mythos Preview autonomously discovered thousands of zero-day vulnerabilities across every major operating system and every major web browser.
The OpenBSD 27-Year-Old Vulnerability
The most frequently cited discovery involves OpenBSD, widely regarded as one of the most secure operating systems in existence. Its codebase has undergone continuous manual auditing for decades and serves as the foundation for critical infrastructure, including firewalls and routers.
Mythos Preview identified a kernel crash vulnerability in the TCP SACK (Selective Acknowledgment) implementation. The vulnerable code had been present since 1998, a span of 27 years.
Human experts and conventional fuzzing tools had missed it through countless security audits and version updates. Two crafted packets could crash any server running the vulnerable code.
The finding was a denial-of-service vulnerability rather than remote code execution, but its longevity in the codebase is still remarkable.
The discovery campaign cost approximately $20,000 total across roughly 1,000 runs. The specific run that identified the vulnerability cost less than $50.
As Anthropic noted, this cost structure means that what previously required nation-state resources can now be accomplished at a fraction of the cost.
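The two cost figures measure different things, and the distinction matters for anyone estimating what a similar effort would take. Because there is no way to know in advance which run will succeed, the realistic price of the finding is the whole campaign; the sub-$50 run is only the marginal cost of the successful attempt. A minimal Python sketch of the arithmetic, using the approximate disclosed figures (names are ours):

```python
# Approximate figures Anthropic disclosed for the OpenBSD campaign.
CAMPAIGN_COST_USD = 20_000
RUNS = 1_000
SUCCESSFUL_RUN_MAX_USD = 50  # reported as "less than $50"

average_run_cost = CAMPAIGN_COST_USD / RUNS
share_of_campaign = SUCCESSFUL_RUN_MAX_USD / CAMPAIGN_COST_USD

print(f"Average cost per run: ${average_run_cost:.2f}")
print(f"Successful run as a share of the campaign: < {share_of_campaign:.2%}")
```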
Firefox: 181 Working Exploits vs. 2
In vulnerability exploitation testing against the Firefox JavaScript engine, the previous-generation flagship model, Claude Opus 4.6, succeeded only twice across hundreds of attempts. Mythos Preview succeeded 181 times.
However, not all of these 181 successes achieved full control; many crashed the renderer rather than achieving code execution. Of the total, 29 exploits achieved full register control, a level of control that typically lets an attacker redirect execution within the browser process.
Counting only those 29 register-control exploits against Opus 4.6’s two successes, this is a 14.5x increase in reliable, high-impact exploits over the previous model.
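The headline multiple depends on which numerator is used. The Python sketch below restates the reported counts (variable names are ours). Note that the source does not say whether Opus 4.6’s two successes reached register control, so both multiples are approximate comparisons.

```python
# Reported Firefox JavaScript engine exploitation results.
OPUS_SUCCESSES = 2
MYTHOS_SUCCESSES = 181
MYTHOS_REGISTER_CONTROL = 29

high_impact_multiple = MYTHOS_REGISTER_CONTROL / OPUS_SUCCESSES  # the 14.5x figure
headline_multiple = MYTHOS_SUCCESSES / OPUS_SUCCESSES            # counting all 181
register_control_share = MYTHOS_REGISTER_CONTROL / MYTHOS_SUCCESSES

print(high_impact_multiple)             # 14.5
print(headline_multiple)                # 90.5
print(f"{register_control_share:.1%}")  # 16.0%
```

In other words, about one in six of Mythos Preview’s successful attempts reached register control.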
FFmpeg 16-Year-Old Vulnerability
Mythos Preview identified a 16-year-old heap out-of-bounds write vulnerability in the H.264 decoding module of FFmpeg, the world’s most commonly used multimedia decoding library.
FFmpeg is present in nearly all mobile phones, computers, and browsers, and has been a key target of OSS-Fuzz (the world’s largest open-source fuzzing platform) for years, with millions of automated test cases executed against it.
The vulnerability entered the codebase in 2003 and became exploitable after a 2010 refactoring. For 16 years, manual audits and automated testing missed it.
Mythos identified it through semantic reasoning about code logic, not brute-force fuzzing. A specially crafted video file could trigger the vulnerability and control the playback device.
FreeBSD 17-Year-Old Remote Code Execution
In FreeBSD’s NFS service, Mythos Preview discovered a 17-year-old remote code execution vulnerability (CVE-2026-4747). An unauthenticated attacker can trigger a stack overflow and gain root privileges over the network, with no username or password required.
Mythos not only located the vulnerability but autonomously wrote a complete exploit script, splitting 20 instruction fragments (gadgets) across six network requests to build a return-oriented programming (ROP) chain.
The entire process required zero human intervention.
Chain Exploitation: Four Vulnerabilities to Full Escape
In one test, Mythos chained four separate vulnerabilities in Firefox into a single exploit. The exploit executed a JIT heap spray to escape the renderer sandbox, then pivoted to escape the operating system sandbox.
This type of multi-stage, chain-based exploitation has historically been the domain of nation-state development teams with significant resources.
Virtual Machine Monitor Escape (Partial Finding)
In one instance involving a production virtual machine monitor, the model identified a memory corruption primitive that could, with human refinement, lead to guest-to-host escape.
While the model did not autonomously produce a working escape exploit, the finding itself is concerning because cloud security architectures assume workload isolation holds.
This discovery suggests that the model can identify the building blocks of VMM escapes even if full automation remains out of reach.
Vulnerability Classes and Why Existing Methods Fail
The CSA, SANS Institute, and OWASP joint report identified specific vulnerability classes where Mythos Preview demonstrates capabilities that surpass existing automated and manual detection methods.
OS kernel logic vulnerabilities: Static application security testing (SAST) tools lack the semantic reasoning required to identify vulnerabilities like the OpenBSD TCP SACK flaw. Fuzzers miss logic flaws entirely.
Penetration testers are time-boxed, and bug bounty programs often explicitly scope out kernel vulnerabilities. Mythos can chain two to four low-severity findings into local privilege escalation.
Media codec vulnerabilities: In the FFmpeg H.264 case, fuzzers exercised the vulnerable code path approximately 5 million times without triggering the flaw. SAST flagged nothing. Mythos caught it by reasoning about code semantics beyond brute-force coverage metrics.
Network stack remote code execution: Dynamic application security testing (DAST) tools struggle at protocol depth. Penetration tests routinely skip NFS and similar services. Mythos built a complete 20-gadget ROP chain to achieve unauthenticated root access.
Browser vulnerabilities: Despite continuous fuzzing and substantial bug bounty programs, Mythos found thousands of browser zero-days.
The scale of discovery, 181 working Firefox exploits versus two for the previous model, demonstrates a fundamentally new capability. Many of these exploits crashed the renderer rather than achieving full control, but the 29 that achieved register control still represent a major leap.
Cryptography library vulnerabilities: Mythos identified implementation flaws in TLS, AES-GCM, and SSH that could enable certificate forgery or decryption of encrypted communications. These are bugs in the code that implements cryptography, not attacks on the mathematics themselves.
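Why can a fuzzer execute a code path millions of times and still miss the bug in it, as in the FFmpeg case above? The toy Python example below is hypothetical and is not the FFmpeg flaw: it models a bounds check whose size computation wraps around in 32-bit arithmetic, as it would in C. Random inputs with 16-bit dimensions reach both outcomes of the check yet never produce the overflow, because the product of two values below 65,536 never exceeds 32 bits. Reasoning about the arithmetic finds the failing input directly.

```python
import random

MASK32 = 0xFFFFFFFF  # emulate C's 32-bit unsigned arithmetic


def checked_copy(dst_len: int, rows: int, row_bytes: int) -> str:
    """Hypothetical bounds check whose size computation can wrap around.

    Returns "rejected" if the check refuses the copy, "ok" if the copy fits,
    and "overflow" if the check passes but the real write exceeds dst_len.
    """
    size32 = (rows * row_bytes) & MASK32  # the size a 32-bit C variable would hold
    if size32 > dst_len:
        return "rejected"
    real_size = rows * row_bytes  # bytes the copy loop would actually write
    return "ok" if real_size <= dst_len else "overflow"


def random_fuzz(trials: int, seed: int = 1) -> dict[str, int]:
    """Uniform random inputs with 16-bit dimensions. Both outcomes of the
    check are reached, but rows * row_bytes stays below 2**32, so the
    wraparound (and therefore "overflow") never occurs."""
    rng = random.Random(seed)
    counts = {"rejected": 0, "ok": 0, "overflow": 0}
    for _ in range(trials):
        dst_len = rng.randrange(1, 1 << 20)
        rows = rng.randrange(0, 1 << 16)
        row_bytes = rng.randrange(0, 1 << 16)
        counts[checked_copy(dst_len, rows, row_bytes)] += 1
    return counts


print(random_fuzz(100_000))
# An input found by reasoning: 65,536 * 65,536 = 2**32, which wraps to 0.
print(checked_copy(dst_len=4096, rows=1 << 16, row_bytes=1 << 16))  # overflow
```

Coverage-guided fuzzers are smarter than this uniform sampler, so the sketch exaggerates the gap; the point is only that covering a line and exercising the value that breaks it are different things.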
Cost Analysis: The Economic Shift
The economics of vulnerability discovery have shifted dramatically. Anthropic’s disclosed cost figures illustrate the magnitude of this change:
- OpenBSD 27-year-old vulnerability: Total campaign cost ~$20,000; the successful run cost under $50
- Linux kernel privilege escalation: Total campaign cost ~$1,000
- High-difficulty one-byte read vulnerabilities: Total campaign cost ~$2,000
Previously, a top-tier white-hat team would require hundreds of thousands or even millions of dollars in combined manpower, equipment, and time to discover comparable zero-day vulnerabilities. Mythos compresses this cost to as little as one-thousandth of previous levels.
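The “one-thousandth” figure holds only at the extremes of the ranges quoted. A quick Python sketch, treating $100,000 and $1,000,000 as illustrative endpoints of the “hundreds of thousands or even millions” baseline (these endpoints are our assumption, not measured costs):

```python
# Illustrative endpoints of "hundreds of thousands or even millions" (assumed).
BASELINE_USD = {"low": 100_000, "high": 1_000_000}

# Approximate campaign costs Anthropic disclosed.
CAMPAIGN_USD = {
    "OpenBSD TCP SACK": 20_000,
    "Linux kernel privilege escalation": 1_000,
    "One-byte read vulnerabilities": 2_000,
}


def reduction_factor(baseline_usd: float, campaign_usd: float) -> float:
    """How many times cheaper the campaign is than the baseline."""
    return baseline_usd / campaign_usd


for name, cost in CAMPAIGN_USD.items():
    low = reduction_factor(BASELINE_USD["low"], cost)
    high = reduction_factor(BASELINE_USD["high"], cost)
    print(f"{name}: {low:,.0f}x to {high:,.0f}x cheaper")
```

On these assumptions, only the cheapest campaign (Linux kernel, ~$1,000) reaches a 1,000x reduction, and only against a $1 million baseline; the OpenBSD campaign comes out 5x to 50x cheaper.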
Importantly, Mythos does not require salary, benefits, or rest. It runs 24 hours per day. An Anthropic engineer with no formal security training reportedly asked Mythos to find remote code execution vulnerabilities overnight and woke up to a complete, working exploit.
The AISI Assessment of Cost and Speed
The AISI’s evaluation confirmed that Mythos Preview’s performance scales with computational resources. The model used up to 100 million tokens per evaluation run, suggesting that attackers with greater compute budgets could achieve even higher success rates.
The CrowdStrike 2026 Global Threat Report documents an average eCrime breakout time of 29 minutes, 65% faster than in 2024, and an 89% year-over-year surge in AI-augmented attacks.
As CrowdStrike CTO Elia Zaitsev stated, “Adversaries leveraging agentic AI can perform those attacks at such a great speed that a traditional human process of look at alert, triage, investigate for 15 to 20 minutes, take an action an hour, a day, a week later, it’s insufficient.”
Project Glasswing: The Defensive Response
Recognizing the risks posed by unrestricted access to Mythos Preview, Anthropic launched Project Glasswing alongside the model announcement.
This cross-industry cybersecurity initiative brings together 12 launch partners: Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.
Over 40 additional organizations that build or maintain critical software infrastructure have also received access.
Anthropic committed $100 million in model usage credits and $4 million in direct donations to the Linux Foundation and Apache Software Foundation to support open-source maintainers.
The operational strategy is defensive: use Mythos to audit the world’s most critical software, including the Linux kernel, core internet infrastructure, and banking systems, and identify vulnerabilities before adversarial actors can find them.
Anthropic has committed to a public findings report within 90 days of launch, which places it in early July 2026.
Critically, defenders also gain access to Mythos for blue-team automation.
Early partners report using the model to auto-generate patch pull requests, create IDS signatures from discovered exploits, and prioritize remediation based on chainable vulnerability clusters. The defensive advantage is not merely passive.
Limitations of Project Glasswing
The CSA, SANS, and OWASP report notes that Project Glasswing has inherent limitations. The global exploitable attack surface far exceeds any curated partner ecosystem. Most organizations that build or maintain critical software cannot access Mythos-level capabilities.
Furthermore, the defensive advantage from early access is necessarily temporary. If other frontier models develop similar capabilities within months, and open-weight models within 6 to 12 months, the window during which defenders hold an exclusive advantage will close rapidly.
Implications for Security Teams
The AISI, CSA, and multiple security vendors have offered guidance for organizations preparing for this new capability environment.
Cybersecurity Basics Remain Foundational
The AISI emphasized that Mythos Preview’s success often came against systems with weak security postures.
Organizations should double down on fundamentals: regular application of security updates, robust access controls, security configuration management, and comprehensive logging.
As the AISI stated directly: “Our testing shows that Mythos Preview can exploit systems with weak security posture, and it is likely that more models with these capabilities will be developed. This highlights the importance of cybersecurity basics.”
Patching Speed Requirements Are Changing
Mike Riemer, Field CISO at Ivanti and a 25-year US Air Force veteran, told VentureBeat what he is hearing from government agencies: “Threat actors are reverse engineering patches, and the speed at which they’re doing it has been enhanced greatly by AI.
They’re able to reverse engineer a patch within 72 hours. So if I release a patch and a customer doesn’t patch within 72 hours of that release, they’re open to exploit.”
Cisco SVP and Chief Security and Trust Officer Anthony Grieco confirmed the challenge at RSAC 2026: “If you talk to an operational team and many of our customers, they’re only patching once a year. And frankly, even in the best of circumstances, that is not fast enough.”
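To make Riemer’s 72-hour figure operational, a patch-tracking system could flag hosts that stayed unpatched past that window. The Python sketch below is a minimal illustration under stated assumptions: treating 72 hours as a hard cutoff simplifies his point, and the function name and example dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window cited by Mike Riemer; treating it as a hard cutoff is a simplification.
EXPLOIT_WINDOW = timedelta(hours=72)


def exposed_past_window(released: datetime, applied: Optional[datetime], now: datetime) -> bool:
    """Return True if a host stayed unpatched beyond the 72-hour window."""
    deadline = released + EXPLOIT_WINDOW
    if applied is None:  # still unpatched: compare against the current time
        return now > deadline
    return applied > deadline


# Hypothetical dates for illustration.
release = datetime(2026, 4, 14, 9, 0, tzinfo=timezone.utc)
now = datetime(2026, 4, 30, 9, 0, tzinfo=timezone.utc)
print(exposed_past_window(release, release + timedelta(hours=48), now))  # False
print(exposed_past_window(release, release + timedelta(days=7), now))    # True
print(exposed_past_window(release, None, now))                           # True
```

By this measure, an organization that patches once a year, as Grieco describes, is exposed for almost the entire interval between patch cycles.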
The Detection Ceiling
VentureBeat’s analysis identified seven vulnerability classes where existing detection methods hit their ceiling.
For each, Mythos demonstrated capabilities that current tools miss entirely. The common thread is that Mythos performs semantic reasoning about code logic, not just pattern matching or brute-force fuzzing.
Security directors should consider adding AI-assisted kernel review to penetration test RFPs, expanding bug bounty scopes to include kernel and hypervisor targets, and requiring chainability scoring for clustered findings rather than treating CVSS scores in isolation.
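One way to read “chainability scoring” is as a reachability question: can findings that each look moderate link an attacker’s starting position to a high-value outcome? The Python sketch below is a hypothetical illustration, not a method published by the AISI, CSA, or VentureBeat; the capability labels and CVSS values are invented for the example. It models each finding by the capability it requires and the one it grants, then uses breadth-first search to find the shortest chain.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    name: str
    cvss: float     # base score assigned to the finding in isolation
    requires: str   # attacker capability needed to use it (hypothetical labels)
    grants: str     # capability the finding yields once exploited


def shortest_chain(findings: list[Finding], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search over capabilities; returns the shortest sequence
    of finding names linking `start` to `goal`, or None if none exists."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        capability, path = queue.popleft()
        if capability == goal:
            return path
        for f in findings:
            if f.requires == capability and f.grants not in seen:
                seen.add(f.grants)
                queue.append((f.grants, path + [f.name]))
    return None


# Hypothetical cluster: no finding scores above "medium" on its own.
cluster = [
    Finding("F1", 5.3, "network", "file-read"),
    Finding("F2", 4.4, "file-read", "local-user"),
    Finding("F3", 5.5, "local-user", "root"),
]

max_cvss = max(f.cvss for f in cluster)
chain = shortest_chain(cluster, "network", "root")
print(max_cvss, chain)  # 5.5 ['F1', 'F2', 'F3']
```

Scored in isolation, the worst finding in this cluster is a 5.5; scored as a cluster, it is a path from network access to root. A real scoring scheme would also need to weigh exploit reliability and preconditions, which this sketch ignores.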
What Mythos Cannot Do
Despite its capabilities, Mythos Preview has important limitations. The AISI confirmed that the model failed to complete the OT-focused “cooling tower” cyber range, encountering difficulties specifically in the IT portion of the environment.
This does not necessarily mean the model would perform poorly against actual OT systems, but it does indicate limitations in cross-domain attack capability.
Anthropic’s system card concludes that while Mythos’s capabilities are significantly enhanced, the overall catastrophic risk level remains low. The judgment is not that the model is safe to release widely, but that it does not represent an immediate existential threat.
The AISI also noted that Mythos Preview has not been tested against hardened, defended systems with active monitoring and incident response. Its demonstrated success against weakly defended vulnerable systems does not guarantee success against enterprise-grade defenses.
The Proliferation Timeline for Mythos-Level Capabilities
Several security specialists have offered estimates for when these capabilities will become broadly available. Open-weight models with similar capabilities may emerge within 6 to 12 months.
However, a common misconception requires correction: smaller models do not discover novel vulnerabilities from scratch.
In testing by the AISLE cybersecurity startup, eight out of eight small, open-weight models could verify the FreeBSD exploit’s correctness after Mythos discovered it, but none found it de novo.
One model has only 3.6 billion parameters and costs 11 cents per million tokens.
As AISLE concluded, “The moat in AI cybersecurity is the system, not the model. Cheap models can confirm bugs, but they do not find them independently.” Because cheap models can confirm a bug once it is known, the time defenders have to patch after the July Glasswing disclosures gets shorter, not longer.
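At that price, confirmation is close to free. As a rough sketch, assuming a hypothetical verification workload of 100 million tokens (the upper bound the AISI reported for a single Mythos Preview evaluation run, used here only as a generous ceiling) and a flat per-token price:

```python
def token_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost at a flat per-million-token price (ignores separate input/output pricing)."""
    return tokens / 1_000_000 * usd_per_million_tokens


SMALL_MODEL_PRICE = 0.11             # AISLE's 3.6B-parameter model, $ per million tokens
HYPOTHETICAL_WORKLOAD = 100_000_000  # assumed ceiling, not a measured figure

print(f"${token_cost_usd(HYPOTHETICAL_WORKLOAD, SMALL_MODEL_PRICE):.2f}")  # $11.00
```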
Conclusion
Claude Mythos Preview represents a genuine step function in AI cyber capability. It solves 73% of expert-level CTF challenges, is the first model to complete a 32-step corporate network takeover simulation end to end, and autonomously discovered vulnerabilities that had survived up to 27 years of human review.
However, the step is from “novice CTF player” to “competent junior penetration tester,” not yet to “nation-state adversary.”
The economic implications are profound. Discovery campaigns that once cost millions now cost thousands. Specific successful runs cost less than $50. The time between vulnerability discovery and weaponization has compressed from months to hours.
Project Glasswing provides a temporary defensive advantage, but that window will not stay open indefinitely.
The CSA report warns that within 6 to 24 months, these capabilities will be widely available through either open-source models or adversarial nation-states.
Blue-team automation with Mythos, such as auto-remediation and signature generation, helps defenders in the meantime, but it is bound by the same time limit.
For security teams, the path forward is clear but difficult: strengthen fundamental security practices, accelerate patching cycles to well under 72 hours, and begin testing AI-assisted defensive capabilities now.
The AISI’s recommendation is unambiguous: “Immediate investment in cyber defense is critical. AI cyber capabilities are dual-use, they present security challenges but can also drive game-changing improvements in defense.”
