The coalition, known as Project Glasswing, will include some of Anthropic’s AI competitors, such as Google, as well as hardware providers like Cisco and Broadcom and organisations that maintain critical open-source software, such as the Linux Foundation. Anthropic is committing up to US$100 million ($171m) in Claude usage credits to the effort.
Logan Graham, the head of an Anthropic team that tests new models for dangerous capabilities, called the new model “the starting point for what we think will be an industry change point, or reckoning, with what needs to happen now”.
Anthropic occupies an unusual position in today’s AI landscape. It is racing to build increasingly powerful AI systems, and making billions of dollars selling access to those systems, while also drawing attention to the risks its technology poses. The company was deemed a supply-chain risk this year by the Pentagon for demanding certain limitations on the use of its technology. A federal judge later stopped the designation from going into effect.
Anthropic has not released much new information about the model, which was code-named “Capybara” during development. But after some details were inadvertently leaked last month, the company acknowledged that it considered the model a “step change” in AI capabilities, with improved performance in areas like coding and cyber security research.
The company’s decision to hold back Claude Mythos Preview, while giving access only to partners out of concern for how it might be misused, has some precedent. In 2019, OpenAI announced it had built a new model, GPT-2, but was not releasing the full version right away. The company claimed that its text-generation capabilities could be used to automate the mass-production of propaganda or misinformation. (It later released the model, after conducting additional safety testing on it.) Many of the leaders of the GPT-2 project later left OpenAI to start Anthropic.

This time, Anthropic is making a different, more urgent claim. The company’s executives say Claude Mythos Preview is already capable of carrying out autonomous security research, including scanning for and exploiting so-called zero-day vulnerabilities in critical software programs – flaws that are unknown even to the software’s developer. The company says these efforts can often be triggered by amateurs using simple prompts, and claims that the new model has already identified “thousands” of bugs and vulnerabilities in popular software programs, including every major operating system and browser.
One of the vulnerabilities Claude found, the company said, was a 27-year-old bug in OpenBSD, an open-source operating system that was designed to be difficult to hack. Many internet routers and secure firewalls incorporate OpenBSD’s technology. Another was a long-standing issue in a piece of popular video software that automated testing tools had scanned five million times – without finding any problems.
“This model is good at finding vulnerabilities that would be well understood and findable by security researchers,” Graham said. “At the same time, it has found vulnerabilities, and in some cases crafted exploits, sophisticated enough that they were both missed by literally decades of security researchers, as well as all the automated tools designed to find them.”
Anthropic announced on Monday (local time) that its projected annual revenue had more than tripled in 2026, to more than US$30 billion from US$9 billion. The growth has come largely because of the popularity of Anthropic’s Claude as a tool for programming.
Anthropic has focused on making Claude good at completing lengthy coding tasks, in hopes of making it more useful to professional programmers and amateur “vibecoders”. But an AI system designed to be good at coding is also good at spotting the flaws in code – running automated scans for bugs and vulnerabilities that can allow hackers to take control of users’ machines, expose sensitive user information or wreak other havoc.

The cyber security industry has been bracing for years for what more capable AI models could do to critical tech infrastructure. Until recently, only expert human researchers with access to specialised tools were capable of finding the most severe security vulnerabilities. Now, the fear is that a powerful AI model could discover them on its own.
“Imagine a horde of agents methodically cataloguing every weakness in your technology infrastructure, constantly,” Nikesh Arora, the chief executive of Palo Alto Networks, wrote in a blog post last week.
Graham said one of the unanswered questions about Claude Mythos Preview, and other future models that will be capable of doing similar things, was whether most or all of the world’s critical software would need to be patched or rewritten as a result of these new models.
“There are a lot of really critical systems around the world, whether it’s physical infrastructure or things that protect your personal data, that are running on old versions of code,” Graham said. “If these previously were mostly secure because it took a lot of human effort to attack them, does that paradigm of security even work anymore?”
It is wise to take claims about unreleased model capabilities from AI companies with a grain of salt. In this case, though, cyber security researchers who have been given access to Claude Mythos Preview have characterised the model as a significant cyber security risk.
Elia Zaitsev, the chief technology officer of CrowdStrike, a cyber security firm with access to the new model through Project Glasswing, said in a statement accompanying Anthropic’s announcement that the model “demonstrates what is now possible for defenders at scale, and adversaries will inevitably look to exploit the same capabilities”.
“What once took months now happens in minutes with AI,” Zaitsev said.

Project Glasswing takes its name from the glasswing butterfly, Kaplan said, which uses transparent wings to hide in plain sight. Similarly, he said, many of today’s most critical software programs contain bugs and vulnerabilities that have existed in the open for years, but were buried in such complex technical systems that no human ever found them.
According to Kaplan, the cyber security capabilities of Claude Mythos Preview are not a result of special training. Rather, they are just one of many areas in which the model is better than previous ones. He predicted that similar cyber security capabilities would exist in other models soon. As that happens, he said, the arms race between hackers and the companies racing to defend their systems will only escalate.
“As the slogan goes, this is the least capable model we’ll have access to in the future,” he said.
Written by: Kevin Roose
Photographs from: Tom Brenner, The Washington Post & Getty Images
© 2026 THE NEW YORK TIMES