Tiered AI Access for Security Research Is the Right Idea, Awkwardly Implemented Industry-Wide
Source: simonwillison
Anthropic recently announced Project Glasswing, a program that gates access to Claude Mythos, a variant of Claude apparently tuned or configured for security research use cases, behind a vetting process for qualified security researchers. Simon Willison’s read is that this sounds necessary. That take is correct, and also incomplete.
The core tension here is not new. Security research has always required access to dangerous knowledge. You cannot build a reliable intrusion detection system without understanding how intrusions work. You cannot write a fuzzer, audit a protocol, or red-team a production system without a working model of attacker behavior. The problem is that “working model of attacker behavior” and “tool for attacking systems” point at the same underlying capability set.
The AI industry arrived at this tension later than other fields, and is resolving it less gracefully.
The Dual-Use Problem Has a Track Record
Cryptography went through this in the 1990s. Export controls under ITAR treated strong cryptographic implementations as munitions. Researchers routinely violated or evaded these rules because legitimate cryptographic work requires working with the same primitives attackers use. The restrictions did not stop adversaries who had state-level resources; they mostly slowed down defenders and academics. Bernstein v. United States, in which the courts held that source code is protected speech, was partly a symptom of how poorly calibrated those controls were.
Vulnerability research took a similar path. The full disclosure versus responsible disclosure debate, which consumed security conferences through the 2000s, was really a debate about who gets to know things first and under what conditions. Tools like Metasploit were initially controversial for the same reason Mythos probably is: they put offensive capability in more hands. The resolution, such as it is, came from community norms plus partial legal protections, such as the DMCA Section 1201 security research exemption and the Department of Justice’s 2022 policy against charging good-faith security research under the CFAA, not from technical gatekeeping.
Bioinformatics and synthetic biology are going through their own version of this right now, with dual-use research of concern policies and institutional biosafety committees creating review processes for research that could produce dangerous pathogens. That model, institutional review plus tiered access, is roughly what Anthropic is implementing for Mythos.
The parallel is useful because it shows what works and what does not. Review boards slow things down; they do not stop determined bad actors with resources. What they do is create a credible paper trail, push costs onto people misusing the system, and give the provider legal and reputational cover. That is not nothing.
What Tiered Access Actually Buys You
A model like Claude Mythos presumably differs from standard Claude in a few ways. Its safety filters around offensive security topics may be loosened or replaced with researcher-specific guardrails. It might have more detailed knowledge about vulnerability classes, exploit primitives, or attack toolchains included in its training data or system prompt context. It might also produce output that the standard model refuses: working shellcode, functional SQL injection payloads, detailed explanations of specific CVE exploitation paths.
Restricting access to that model via a vetting process does several concrete things:
First, it raises the cost of misuse. Someone who wants to use an AI model to generate attack tooling for malicious purposes has to either fool Anthropic’s vetting process or find another route. The vetting process is not unbeatable, but it adds friction and creates liability exposure.
Second, it creates a more realistic deployment context. Security researchers working on authorized engagements need a model that can reason about attacker behavior without constantly refusing to engage. A general-purpose model with aggressive safety filters is nearly useless for red-team work. Mythos, if scoped correctly, would be a tool that matches the actual workflow.
Third, it gives Anthropic a feedback loop from a knowledgeable user base. Security researchers will find failure modes, jailbreaks, and capability gaps faster than general users. Restricting the model to people who understand what they are looking at makes that feedback more actionable.
The limitation is that all of this depends on the vetting process being meaningful. If Project Glasswing approves anyone with a LinkedIn profile that mentions penetration testing, the access control is theatrical. If it requires institutional affiliation, signed agreements with teeth, and ongoing auditing, it is substantively different.
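The gap between theatrical and substantive vetting can be sketched in a few lines. This is a minimal illustration, not a description of how Project Glasswing works: the actual criteria are not public, and every field and function name below is hypothetical. It only shows that a check keyed to self-description passes anyone, while a check keyed to the verifiable conditions named above does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    # All fields are hypothetical; the real Glasswing criteria are not public.
    self_described_pentester: bool  # e.g. a profile that mentions pentesting
    verified_institution: bool      # affiliation confirmed out of band
    signed_agreement: bool          # enforceable terms of use on file
    consents_to_audit: bool         # agrees to ongoing usage auditing


def theatrical_check(a: Applicant) -> bool:
    """Approve on self-description alone."""
    return a.self_described_pentester


def substantive_check(a: Applicant) -> bool:
    """Approve only if every verifiable condition holds."""
    return a.verified_institution and a.signed_agreement and a.consents_to_audit


profile_only = Applicant(True, False, False, False)
vetted = Applicant(True, True, True, True)

for label, applicant in [("profile only", profile_only), ("vetted", vetted)]:
    print(label, theatrical_check(applicant), substantive_check(applicant))
```

The two checks agree on the vetted applicant and disagree on the profile-only one, which is the whole difference between friction and theater.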
Where the Analogies Break Down
The cryptography and vulnerability research comparisons are instructive but imperfect. Code and cryptographic primitives are relatively static artifacts. An AI model is a system with emergent behaviors that change under different prompting strategies and contexts. You cannot audit a model the way you audit a source repository. The attack surface for a language model, including prompt injection, jailbreaking, context manipulation, and multi-turn extraction, does not have a clean analogue in traditional security research tools.
This creates a genuine problem for the tiered access model: the capability you are gatekeeping is not fully contained within the gated system. A security researcher with legitimate access to Mythos can probe it to discover what it will and will not do, and that knowledge itself is transferable. You cannot unlearn a jailbreak technique.
Anthropic’s Responsible Scaling Policy attempts to address this at a higher level by tying capability deployment to evaluated risk thresholds. Mythos presumably sits at a capability tier that triggered some restriction requirement under that framework. The Glasswing vetting process is the operational implementation of that policy for this particular model.
The honest version of this is: Anthropic does not know exactly what Mythos can and cannot do under adversarial prompting, and neither does anyone else. The vetting process is a reasonable bet that credentialed security researchers are more likely to find problems responsibly than to exploit them, and more likely to understand the boundaries of what they are working with.
The Asymmetric Risk Argument
One point that gets too little attention in these discussions is that harm is asymmetric in two different ways. A security researcher who uses Mythos to write a better fuzzer or understand a novel attack class might protect thousands of systems. A malicious actor who uses the same capability to craft a targeted attack might compromise dozens. On raw counts the comparison favors defenders, but severity does not scale with count: a single compromise of the wrong system can outweigh a large number of routine protections.
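The count-versus-severity point can be made concrete with a toy calculation. The numbers below are placeholders chosen only to match the "thousands" and "dozens" in the text; they are not estimates of anything, and the function names and the linear harm model are assumptions made for illustration.

```python
def net_benefit(protected: int, compromised: int, severity_ratio: float) -> float:
    """Net outcome in units of 'one routine protection'.

    severity_ratio is how many routine protections one compromise
    is assumed to cancel out (a hypothetical, illustrative parameter).
    """
    return protected - compromised * severity_ratio


def break_even_severity(protected: int, compromised: int) -> float:
    """Severity ratio at which benefit and harm exactly cancel."""
    return protected / compromised


# Placeholder inputs: "thousands" protected, "dozens" compromised.
protected, compromised = 1000, 24

print(net_benefit(protected, compromised, severity_ratio=1.0))
print(net_benefit(protected, compromised, severity_ratio=100.0))
print(break_even_severity(protected, compromised))
```

With equal severity the defenders' advantage dominates. Once a single compromise is assumed to be worth more than roughly forty protections, the sign flips, which is why the argument turns on severity rather than on counts.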
The scenarios that make people nervous about AI-assisted offensive security are not the average case. They are the cases where the AI provides genuine uplift on attacks that were previously infeasible, against critical infrastructure, at scale. Whether current models, even specialized ones, actually provide that kind of uplift is genuinely contested. The security research community’s general read seems to be that today’s models are useful for automating tedious steps and for helping people who already know what they are doing move faster, but are not yet producing novel attack capabilities that did not previously exist.
If that read is correct, the risk profile of Mythos is meaningful but not catastrophic, and the Glasswing vetting process is appropriately calibrated rather than alarmist. If that read is wrong, and sufficiently capable models do start providing qualitative uplift on serious attack classes, then the current vetting approach will probably not be sufficient.
Project Glasswing sounds like the right instinct for where the industry is right now. What it actually is depends entirely on implementation details that are not yet public, and on how Anthropic responds when those details are tested by people who want to get around them.