The Capability Diffusion Problem: When Small Models Can Find What Mythos Finds
Source: hackernews
The security AI conversation has been organized around a comforting assumption: that the most dangerous capabilities sit behind a frontier model threshold, accessible only to well-resourced actors with API budgets and institutional access. Anthropic’s Claude Mythos, gated behind Project Glasswing’s identity verification requirements, seemed to validate that assumption. You needed CVE history, conference presentations, employment at known security firms just to apply. The implicit message was that these capabilities are serious enough to warrant serious gatekeeping.
Then Aisle’s analysis surfaced what the community had been noticing quietly: small models are finding the same vulnerabilities. Not equivalent vulnerabilities, not similar classes of vulnerabilities. The same ones.
This is the capability diffusion problem applied to security, and it changes the threat calculus in ways that neither the access control approach nor the responsible scaling policy framework has fully accounted for.
Why the Assumption Was Plausible
Vulnerability discovery feels like a hard problem. It requires holding complex program semantics in mind simultaneously, tracking data flow across large call graphs, reasoning about memory layout and type system internals, and combining multiple primitive weaknesses into viable attack chains. These are things that humans with years of specialized experience do slowly, carefully, and often incorrectly on their first several attempts.
The UIUC research from 2024 seemed to confirm the frontier requirement: GPT-4 with tool access exploited one-day vulnerabilities at a roughly 87% success rate when given the CVE description, but that rate dropped to 7% without the description. The model was executing known exploits against known targets with detailed context, not discovering vulnerabilities. The gap between those two numbers looked like a meaningful capability boundary.
And the DARPA AI Cyber Challenge results reinforced this: strong detection rates, uneven patch correctness. Finding bugs was tractable; fully reasoning about correct remediation was not. The pattern suggested that the most dangerous capability, synthesizing novel attack chains against unseen targets, remained safely out of reach for anything smaller than a frontier model with extensive scaffolding.
Why the Assumption Was Wrong
The vocabulary of vulnerability classes is bounded. Buffer overflows, use-after-free, SQL injection, type confusion, integer overflows, path traversal, TOCTOU races, format string vulnerabilities: the taxonomy is large but finite, and the structural signatures that identify them in code are learnable patterns. This is closer to specialized classification than to open-ended reasoning. A model doesn’t need frontier-scale general intelligence to recognize that an unchecked memcpy call with a user-controlled length parameter matches a pattern that appears thousands of times in public CVE databases, bug bounty disclosures, and open-source vulnerability patches.
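To make the "structural signature" idea concrete, here is a minimal sketch of a pattern check for the unchecked memcpy case. It is a deliberately crude regex heuristic, not how any particular model or analyzer works; the C snippet, the function names, and the "a comparison on the length variable counts as a guard" rule are all assumptions made for illustration.

```python
import re

# Hypothetical C source used only as input to the heuristic below.
C_SOURCE = """
void handle(const char *payload, size_t user_len) {
    char buf[64];
    memcpy(buf, payload, user_len);
}

void handle_checked(const char *payload, size_t user_len) {
    char buf[64];
    if (user_len > sizeof(buf)) return;
    memcpy(buf, payload, user_len);
}
"""

# Captures the destination and the length argument of a simple memcpy call.
MEMCPY = re.compile(r"memcpy\s*\(\s*(\w+)\s*,\s*[^,]+,\s*(\w+)\s*\)")


def split_functions(source):
    """Naive split on blank lines; assumes one function per block."""
    return [block for block in source.strip().split("\n\n") if block.strip()]


def flag_unchecked_memcpy(source):
    """Return (function_name, length_variable) for memcpy calls with no prior guard."""
    findings = []
    for block in split_functions(source):
        for match in MEMCPY.finditer(block):
            length = match.group(2)
            before = block[:match.start()]
            # Assumption: any `if (<length> < ...)` or `if (<length> > ...)`
            # earlier in the function is treated as a bounds check.
            guard = re.search(rf"if\s*\([^)]*\b{re.escape(length)}\b\s*[<>]", before)
            if not guard:
                name = re.search(r"\b(\w+)\s*\(", block).group(1)
                findings.append((name, length))
    return findings


print(flag_unchecked_memcpy(C_SOURCE))
```

The heuristic flags only the unguarded function. A real signature is far richer than a regex, but the point stands: the feature being recognized is local, repetitive, and heavily represented in public data.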
The training signal is unusually rich for this specific domain. Every public CVE includes a description, affected code, and often a patch. Every disclosed bug bounty report documents the vulnerability class, the code path, and the exploitability assessment. CTF writeups provide worked examples with explicit reasoning traces. Fine-tuning a 7B or 13B model on this corpus doesn’t require inventing new capability; it requires directing existing capability toward a constrained task where the answer space is well-defined.
The tooling ecosystem has also matured in ways that reduce what the model itself needs to provide. Static analysis tools like Semgrep and CodeQL produce structured representations of code that surface potential vulnerability sites algorithmically. Symbolic execution engines narrow the search space. Coverage-guided fuzzers like syzkaller handle the exploration. A smaller model embedded in a well-designed pipeline can outperform a larger model operating without scaffolding, because the pipeline does the hard structural work and asks the model only to classify and synthesize against a pre-filtered candidate set.
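The division of labor described above can be sketched as a two-stage pipeline: a cheap structural pre-filter produces candidates, and the model only scores what survives. Everything here is hypothetical: `prefilter` is a stand-in for a real analyzer's output (it does not call Semgrep or CodeQL), `stub_model` is a placeholder for a small fine-tuned model, and the scores are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    path: str
    line: int
    snippet: str
    rule: str  # label assigned by the pre-filter


def prefilter(files: Dict[str, str]) -> List[Candidate]:
    """Stand-in for a static analyzer: flag lines that call risky sinks."""
    sinks = {
        "memcpy(": "unbounded-copy",
        "strcpy(": "unbounded-copy",
        "system(": "command-exec",
    }
    out = []
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            for sink, rule in sinks.items():
                if sink in line:
                    out.append(Candidate(path, number, line.strip(), rule))
    return out


def stub_model(candidate: Candidate) -> float:
    """Placeholder for a small model; returns an invented likelihood score."""
    return 0.9 if "user" in candidate.snippet else 0.2


def triage(
    files: Dict[str, str],
    model: Callable[[Candidate], float],
    threshold: float = 0.5,
) -> Tuple[int, List[Tuple[float, Candidate]]]:
    """Score only pre-filtered candidates and keep those at or above threshold."""
    candidates = prefilter(files)
    scored = [(model(c), c) for c in candidates]
    kept = [(score, c) for score, c in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return len(candidates), kept


# Hypothetical repository contents.
FILES = {
    "net.c": 'int n = read_len();\nmemcpy(dst, src, user_len);\nstrcpy(name, "static");\n',
    "util.c": "int add(int a, int b) { return a + b; }\n",
}

total, kept = triage(FILES, stub_model)
print(total, [(score, c.path, c.line) for score, c in kept])
```

In this sketch the model never sees `util.c` at all and is asked about only two lines in `net.c`. That narrowing is what lets a smaller model be sufficient for the scoring step.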
The Jagged Frontier Applied
Ethan Mollick’s jagged frontier concept describes how AI capability is uneven in ways that don’t map to intuitive difficulty. A model might write a competent legal brief and fail at basic arithmetic in the same session. The frontier isn’t a smooth line where hard things sit beyond it; it’s jagged, and the jaggedness requires empirical testing to map.
The security community made a prediction about where vulnerability discovery sat on that frontier, and the prediction was wrong in a specific way: the task feels hard to humans for reasons that don’t translate to model difficulty. Human security researchers struggle because they have to hold complex state across long sessions, because memory is fallible, because the search space is enormous. Language models have none of those constraints. Pattern recognition across code, matching known vulnerability classes to structural signatures, following data flow through bounded call graphs: these are exactly the kinds of tasks that scale well with training data quality, not necessarily with parameter count.
Big Sleep, a collaboration between Google Project Zero and Google DeepMind, found a previously unknown stack buffer underflow in SQLite, a genuinely novel finding against widely-audited software. That was a frontier model. But the Aisle analysis documents smaller models finding known vulnerability classes in less-audited code, which is where the vast majority of deployed software lives. The frontier model demonstrated the ceiling; small models are raising the floor.
The Cost Structure Shift
The threat model that justified Project Glasswing’s access controls assumed a cost structure that is no longer stable. API access to Claude Mythos is expensive, auditable, identity-gated, and rate-limited. A fine-tuned 7B model running locally on a single GPU with 24GB of VRAM has near-zero marginal cost per inference, runs offline without audit logs, and can be deployed against thousands of open-source repositories simultaneously with automated pipelines that triage findings by exploitability and generate proof-of-concept code without human review at each step.
The 6-to-18 month capability diffusion window from frontier models to commodity hardware models is well-documented in other domains: image generation, code completion, document summarization. There’s no structural reason to expect security-relevant capabilities to diffuse more slowly. If anything, the bounded nature of the task, with its rich public training data and constrained answer space, makes diffusion faster.
This doesn’t make access gating pointless. It makes it a delay, not a barrier. The policy response has to account for what happens after the delay expires, which is a harder problem than the access control framing suggests.
What the CTF Data Actually Shows
The InterCode-CTF benchmark from NeurIPS 2023 measured GPT-4 solving about 26% of 100 challenges drawn from picoCTF, a competition aimed at high school students and early undergraduates. The UIUC agentic scaffolding research reported 87 of 91 easy CTFtime challenges solved, with the success rate dropping to roughly 13% on harder competition-grade material.
The failure profile is informative: models fail at novel challenges that require reasoning about custom implementations, challenges that chain multiple vulnerabilities without pattern-matching to training data, and challenges with server-side timing constraints that make multi-round API calls impractical. These are the categories that competitive CTF organizers have increasingly migrated toward, recognizing that standard material is solved, and that the effective response requires structural format changes rather than just harder instances of familiar challenge types.
CryptoHack moved harder community challenges to private tiers after GPT-4-class models solved virtually all standard-difficulty cryptography problems. DEF CON CTF Finals runs attack-defense format with live services, real-time adversarial dynamics, and team coordination requirements that constrain AI assistance structurally. The community is adapting, but the adaptation is reactive and the easy material remains easy.
The Detection-Remediation Gap Is Real But Narrowing
The DARPA AIxCC results confirmed a finding that the automated program repair field has documented since GenProg in 2012: plausible patches and correct patches are not the same thing. A model generating code that passes the existing test suite can produce a patch that addresses the reported issue while introducing new vulnerabilities or silently changing program semantics. Language models are exceptionally good at generating plausible output, which makes this distinction especially important for any security workflow.
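A small illustration of the plausible-versus-correct distinction, using a path traversal fix. This is a sketch under stated assumptions: the base directory, the function names, and the single regression test are hypothetical, and the "correct" version ignores symlinks, which a production fix would also have to resolve.

```python
import posixpath

BASE = "/srv/files"  # hypothetical document root


def resolve_plausible(user_path: str) -> str:
    """Plausible patch: strips '../' once, which satisfies the reported case."""
    cleaned = user_path.replace("../", "")
    return posixpath.normpath(posixpath.join(BASE, cleaned))


def resolve_correct(user_path: str) -> str:
    """Normalize first, then verify the result is still inside BASE."""
    full = posixpath.normpath(posixpath.join(BASE, user_path))
    if posixpath.commonpath([BASE, full]) != BASE:
        raise ValueError("path escapes base directory")
    return full


def existing_test(resolve) -> bool:
    """The only regression test the hypothetical project has."""
    try:
        return resolve("../../etc/passwd").startswith(BASE + "/")
    except ValueError:
        return True  # rejecting the input outright is also acceptable


def escapes(resolve, payload: str) -> bool:
    """True if the resolved path lands outside BASE."""
    try:
        return not resolve(payload).startswith(BASE + "/")
    except ValueError:
        return False


# After one pass of stripping, this payload collapses back into '../../etc/passwd'.
PAYLOAD = "....//....//etc/passwd"

for fn in (resolve_plausible, resolve_correct):
    print(fn.__name__, "passes test:", existing_test(fn), "escapes:", escapes(fn, PAYLOAD))
```

Both versions pass the existing test, so a test-suite-only check cannot tell them apart. Only the second one holds up against an input the suite never exercised.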
But the gap is narrowing in the specific case of well-understood vulnerability classes. A buffer overflow fix, a SQL injection parameterization, an integer overflow bounds check: these have canonical correct forms that appear extensively in training data. The remediation quality degrades predictably as vulnerability novelty increases. For the long tail of structural bugs in widely-deployed software, the model’s remediation output is increasingly reliable enough to be useful as a first draft, even if it requires human review before deployment.
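As an example of a canonical correct form, the SQL injection fix is almost always the same transformation: replace string interpolation with a bound parameter. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table, column names, and payload are invented for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_vulnerable(conn, name):
    """Vulnerable: user input is interpolated directly into the SQL text."""
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_fixed(conn, name):
    """Canonical fix: the value is passed as a bound parameter."""
    return [
        row[0]
        for row in conn.execute("SELECT secret FROM users WHERE name = ?", (name,))
    ]


PAYLOAD = "x' OR '1'='1"

conn = make_db()
print(len(lookup_vulnerable(conn, PAYLOAD)))  # every row matches the injected OR clause
print(lookup_fixed(conn, PAYLOAD))            # payload is treated as a literal name
print(lookup_fixed(conn, "alice"))
```

Because the fix has essentially one shape, and that shape appears in a very large number of public patches, it is a plausible case where a small model's first draft is usually right. Less stereotyped bugs offer no such template.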
GitHub Copilot Autofix, integrated with CodeQL findings through GitHub Advanced Security, publishes developer acceptance rate data as a real-world signal of remediation quality in production workflows. The acceptance rates for common vulnerability classes are high enough to represent a meaningful productivity shift for security-aware development teams. This is a different frame than autonomous exploitation, but it comes from the same underlying capability.
What Changes
The Mythos gating was correct given the capabilities it was gating. The question it didn’t fully answer was how long those capabilities would remain frontier-exclusive. The Aisle analysis suggests the answer is: not long, and possibly already past.
Defensive norms in the security community developed over decades when the relevant tooling was specialized, expensive, and required significant expertise to operate. The shift to language models as generalist interfaces for security tasks, accessible to anyone with commodity hardware and a fine-tuned checkpoint, compresses the expertise barrier in ways that existing norms weren’t designed for.
The security disclosure community has spent years building practices around responsible disclosure timelines, coordinated vulnerability notifications, and patch deployment windows. Those practices assume that the population of people who can discover a given vulnerability is small and slow-moving. Small models replicating frontier security findings changes that assumption, and the community’s defensive infrastructure is not yet adapted to the new cost structure.
Access control was always a partial answer. It remains a partial answer. The more important question is what changes in defensive tooling, detection capability, and software development practice if the assumption of bounded attacker capability no longer holds. That question is the one the Aisle analysis is really asking, and it doesn’t have a comfortable answer yet.