Why the Model Is the Least Interesting Part of AI-Powered Security

The Hugging Face team recently published a piece on AI and cybersecurity openness that contains a claim worth slowing down on. They call AI cybersecurity capability “jagged”: the ability to find vulnerabilities doesn’t scale smoothly with model size, and the system architecture surrounding the model drives outcomes more than the model itself. That’s a stronger statement than it first appears, and it has real consequences for how defenders should build and deploy AI tooling.

Most public discourse about AI in security focuses on the model. Will GPT-5 find buffer overflows? Can Claude audit Rust code? The framing treats security capability as primarily a function of what the underlying model knows. But if the jagged capability claim is right, then the model is closer to a commodity component, and the interesting engineering is in what wraps it.

What “Jagged” Actually Means

In the context of the Hugging Face post, “jagged” refers to the observation that a weaker model embedded in a well-designed scaffolding system can outperform a stronger model used naively. The scaffolding includes: which tools the model can call (static analyzers, fuzzers, binary disassemblers, CVE databases), how the model iterates on partial findings, what context it receives about the target system, and where human review sits in the loop.
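To make the scaffolding idea concrete, here is a minimal sketch of the loop described above: a model chooses from an allowlist of tools, and each observation is fed back as context for the next step. Everything in it is an assumption for illustration. `run_scaffold`, `stub_model`, and `STUB_TOOLS` are hypothetical names, the "model" is a plain function standing in for a real inference call, and the tools return canned strings rather than invoking a real analyzer or fuzzer.

```python
from typing import Callable

# A tool takes a target (for example a file path) and returns a text observation.
Tool = Callable[[str], str]


def run_scaffold(
    model: Callable[[str], str],   # hypothetical: prompt in, next tool name (or "done") out
    tools: dict[str, Tool],        # allowlist of tools the model may call
    target: str,
    context: str,
    max_steps: int = 5,
) -> list[tuple[str, str]]:
    """Let the model pick tools one step at a time, feeding each observation back in."""
    prompt = f"context: {context}\ntarget: {target}"
    trace: list[tuple[str, str]] = []
    for _ in range(max_steps):
        choice = model(prompt).strip()
        if choice not in tools:      # "done" or an unknown tool name ends the loop
            break
        observation = tools[choice](target)
        trace.append((choice, observation))
        prompt += f"\n{choice}: {observation}"   # partial findings become new context
    return trace


# Stand-ins so the sketch runs without any real model or analyzer.
def stub_model(prompt: str) -> str:
    for name in ("static_analyzer", "fuzzer"):
        if f"\n{name}:" not in prompt:
            return name
    return "done"


STUB_TOOLS: dict[str, Tool] = {
    "static_analyzer": lambda target: f"possible unchecked copy in {target}",
    "fuzzer": lambda target: f"no crash reproduced for {target}",
}

trace = run_scaffold(stub_model, STUB_TOOLS, "parser.c", "C library, network-facing")
```

The point of the sketch is that the loop, the allowlist, and the accumulated context all live outside the model. Swapping `stub_model` for a different model changes one argument and leaves the rest of the system untouched.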

This maps onto something practitioners already know from classical security tooling. Semgrep at a medium confidence threshold, tuned with custom rules for your specific codebase patterns, will often catch more real bugs than a general-purpose SAST scanner run at default settings on the same code. CodeQL is effective not because its query engine is uniquely powerful, but because it builds a semantic model of the program and lets you express queries against that model. The engine is almost beside the point; the queries are the artifact.

A similar dynamic applies to fuzzing. OSS-Fuzz has found over 10,000 vulnerabilities in critical open source projects since its launch in 2016, not because it uses novel fuzzing algorithms, but because it wraps libFuzzer and AFL++ with infrastructure for continuous execution, coverage-guided corpus management, and automatic bug deduplication. The scaffolding is the product.

The AI equivalent of this is what the Hugging Face post refers to as “vulnerability scaffolding”: a system that pairs a code-processing model with memory of prior findings, targeted tool invocations, and structured output that feeds into patch generation and verification. They describe a system called Mythos that operates this way. The model is a component; the system is the thing.

Open Architecture, Composable Defense

This is where openness becomes structurally important, and not just as a philosophical preference. If architecture matters more than model, then defenders need the ability to compose architecture. Closed, API-only models constrain composability in ways that open weights do not.

Consider what it means to build a semi-autonomous vulnerability scanner that integrates with your internal code review system. With a closed model accessed via API, you can pass in code and receive text output. You can build scaffolding around that, but the model itself is a black box. You cannot inspect its intermediate representations, add custom tool-calling behaviors at the weight level, or fine-tune it on your organization’s historical vulnerability patterns and codebase idioms without handing that data to the provider. You are largely limited to prompt engineering and output parsing.

With an open model, you can fine-tune on your own codebase without it leaving your infrastructure. You can modify the inference stack to expose intermediate attention patterns that might help you understand which code paths the model considers suspicious. You can run the entire pipeline on-premises, which matters for any organization that isn’t willing to route production source code through an external API. The OpenSSF has been pushing exactly this kind of on-premises, organization-specific hardening as part of its supply chain security work.

The composability argument also applies to the tooling layer. Open agent scaffolding exposes its decision logs. You can audit exactly which tool calls the agent made, what observations it recorded, and why it escalated or de-escalated a finding. Closed agent frameworks typically expose summary outputs and little else. For security tooling, where a false negative could mean a shipped vulnerability and a false positive could mean a wasted sprint, auditability is not optional.
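As an illustration of what an auditable decision log might contain, the sketch below records each agent step as one JSON line and lets a reviewer pull out every step that led to a given action. The `Decision` fields and the `DecisionLog` class are hypothetical; they are not taken from any particular agent framework, and a real log would likely also carry timestamps, model versions, and input hashes.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Decision:
    step: int
    tool: str          # which tool the agent called
    observation: str   # what the tool returned
    action: str        # "escalate", "de-escalate", or "none"
    reason: str        # the agent's stated justification


class DecisionLog:
    """Append-only log; each entry is stored as one JSON line."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def record(self, decision: Decision) -> None:
        self.lines.append(json.dumps(asdict(decision), sort_keys=True))

    def explain(self, action: str) -> list[dict]:
        """Return every logged step that resulted in the given action."""
        entries = [json.loads(line) for line in self.lines]
        return [entry for entry in entries if entry["action"] == action]


log = DecisionLog()
log.record(Decision(1, "static_analyzer", "tainted input reaches query builder",
                    "escalate", "user input flows into SQL string"))
log.record(Decision(2, "unit_tests", "input is parameterized before use",
                    "de-escalate", "query uses bound parameters"))

escalations = log.explain("escalate")
```

Storing one self-contained record per step is a deliberate choice here: a reviewer can read the log top to bottom, or filter it, without needing access to the agent's internal state.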

The Asymmetry Problem

The capability asymmetry between attackers and defenders isn’t new, but AI sharpens it. Offensive security researchers share techniques through Exploit Database, CVE disclosures, DEF CON and Black Hat talks, and countless blog posts detailing how specific vulnerability classes work. The knowledge diffuses rapidly. A novel technique published on a Friday afternoon is in the toolkit of threat actors by Monday.

Defenders, meanwhile, often operate under confidentiality constraints that prevent them from sharing what they’re seeing. A SOC that develops an effective detection rule for a new attack pattern can’t necessarily publish it. This creates an information environment where the offensive community has better collective knowledge than the defensive community, a situation that open AI tooling partially addresses by at least equalizing access to the underlying capability.

The Open Vulnerability and Assessment Language project, the OSV vulnerability database, and the OpenSSF’s Scorecard project all operate on the premise that shared vulnerability knowledge is net positive for defenders. Open AI models in security sit in the same tradition. When a research team publishes a model fine-tuned on CVE descriptions and exploit patterns, every defender can use it. When a company trains a proprietary model on the same data and keeps it internal, only that company benefits.

The Autonomy Question

The Hugging Face post advocates for semi-autonomous systems over fully autonomous ones, and there’s a strong case for this that goes beyond caution. The argument against full autonomy in security contexts isn’t primarily about safety theater; it’s about accuracy.

Security decisions are adversarial in a way that most AI tasks are not. An attacker who understands that a fully autonomous system will execute certain actions can design exploits that trigger those actions at useful moments. An AI agent with unrestricted ability to patch code can be manipulated into patching the wrong thing. An agent that can block network traffic autonomously can be triggered into blocking legitimate traffic through crafted inputs that resemble attack patterns.

Semi-autonomous systems with prespecified action sets and human approval requirements on consequential actions are harder to weaponize. The human review step introduces latency, but it also introduces a layer of judgment that is much harder to manipulate through crafted inputs. The design principle here, keeping humans actually in the loop rather than nominally in the loop, means the human reviewer needs to understand what the AI is recommending and why. Open decision logs and auditable traces make this possible; opaque systems make it theoretical.
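One way to picture a prespecified action set with human approval on consequential actions is a small gate that every agent request must pass through. This is a sketch under stated assumptions: the action names, the `ALLOWED_ACTIONS` table, and the `gate` function are all hypothetical, and a real deployment would verify the approver's identity rather than accept a string.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    EXECUTED = "executed"
    PENDING_APPROVAL = "pending_approval"
    REJECTED = "rejected"


# Hypothetical action set: action name -> whether a human must approve it first.
ALLOWED_ACTIONS = {
    "open_ticket": False,   # low consequence, the agent may act alone
    "apply_patch": True,    # consequential, needs a named approver
    "block_ip": True,       # consequential, needs a named approver
}


def gate(action: str, approved_by: Optional[str] = None) -> Verdict:
    """Decide what happens to an action the agent has requested."""
    if action not in ALLOWED_ACTIONS:
        # Anything outside the prespecified set is refused outright.
        return Verdict.REJECTED
    if ALLOWED_ACTIONS[action] and approved_by is None:
        # Consequential actions wait for a human instead of executing.
        return Verdict.PENDING_APPROVAL
    return Verdict.EXECUTED
```

For example, `gate("block_ip")` returns `Verdict.PENDING_APPROVAL`, while `gate("delete_logs")` returns `Verdict.REJECTED` because the action was never in the set. A crafted input can still make the agent request something, but it cannot widen the set or skip the approval.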

Practical Architecture

For organizations trying to translate this into concrete decisions, the Hugging Face post recommends a pattern that maps onto what several security-focused teams have landed on independently: keep sensitive data and source code behind organizational firewalls, fine-tune on internal data, and use open scaffolding that produces auditable traces.

In practice, this looks something like running a locally hosted model (several open models in the 7B-70B range perform reasonably well on code tasks) with a scaffolding layer that manages tool invocations to Semgrep, Bandit, or CodeQL, feeds findings into a structured review queue, and logs every inference step. The model handles pattern recognition and initial triage; humans handle escalation decisions and patch approval.
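The review-queue half of that pipeline could be sketched as follows. The finding dictionaries use a made-up normalized shape (`rule`, `path`, `message`) rather than the actual output format of Semgrep, Bandit, or CodeQL, and `stub_triage` is a keyword match standing in for the locally hosted model's triage step. `ReviewQueue` and its methods are likewise hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Optional

SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(order=True)
class QueueItem:
    rank: int                           # lower rank is reviewed first
    seq: int                            # tie-breaker: submission order
    finding: dict = field(compare=False)


class ReviewQueue:
    """Priority queue of findings for human review, with a log of every triage call."""

    def __init__(self) -> None:
        self._heap: list[QueueItem] = []
        self._seq = 0
        self.log: list[dict] = []

    def submit(self, finding: dict, triage: Callable[[dict], str]) -> None:
        severity = triage(finding)
        if severity not in SEVERITY_RANK:
            raise ValueError(f"unknown severity: {severity!r}")
        self.log.append({"rule": finding["rule"], "severity": severity})
        heapq.heappush(self._heap, QueueItem(SEVERITY_RANK[severity], self._seq, finding))
        self._seq += 1

    def next_for_review(self) -> Optional[dict]:
        """Hand the most severe pending finding to a human, or None if empty."""
        return heapq.heappop(self._heap).finding if self._heap else None


def stub_triage(finding: dict) -> str:
    # Stand-in for the model: a real system would call local inference here.
    text = finding["message"].lower()
    if "injection" in text:
        return "high"
    if "deprecated" in text:
        return "low"
    return "medium"


queue = ReviewQueue()
queue.submit({"rule": "R1", "path": "util.py", "message": "Deprecated hash function"}, stub_triage)
queue.submit({"rule": "R2", "path": "db.py", "message": "Possible SQL injection"}, stub_triage)
queue.submit({"rule": "R3", "path": "api.py", "message": "Missing timeout"}, stub_triage)

first = queue.next_for_review()
```

Here the model only assigns a priority; it never removes a finding from the queue or approves a patch. That keeps escalation and approval with the human reviewer, and `queue.log` preserves what the triage step decided for each finding.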

The alternative, routing production source code through an external model API for security analysis, introduces a dependency on a third-party service for a security-critical function. That is an unusual risk posture for most organizations with meaningful security requirements.

The jagged capability insight from the Hugging Face piece should shift where organizations spend their engineering budget. Less on acquiring access to the largest available model, more on the scaffolding, the tooling integrations, the review workflows, and the fine-tuning pipelines that determine whether the system actually finds things worth finding. The model is replaceable; the architecture is the investment.