
Security Knowledge Ages, and SAST Rules Age With It

Source: OpenAI

Every rule in a SAST tool’s rule set was written by someone. That fact is easy to overlook when you install Semgrep, run it against a codebase, and watch it flag issues. The output looks authoritative. What it represents is a collection of security research encoded by specific engineers at specific points in time, organized into rule files, and shipped to you as a coverage claim.

OpenAI’s explanation of why Codex Security doesn’t produce a SAST report focuses on precision: AI-driven constraint reasoning validates exploitability before reporting a finding, which reduces false positives. That precision argument has been made well elsewhere. A different angle deserves attention: SAST tools have a knowledge freshness problem that operates independently of their false positive rate, and it affects coverage in ways that are largely invisible to the teams using the tools.

How SAST Rules Get Written

A SAST rule encodes a researcher’s understanding of a vulnerability class, translated into a query against a code representation. Writing a Semgrep rule for a Flask route that passes user input to a subprocess call requires knowing which Flask objects carry user-controlled data, which subprocess functions are dangerous, and what sanitization looks like in that context. That knowledge comes from reading framework documentation, studying CVEs in similar applications, and understanding the underlying runtime behavior.
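The kind of pattern such a rule has to capture can be sketched in a few lines. This is an illustrative pair of functions (the names are invented, not from any rule set): the rule author must know that the interpolated string reaching `shell=True` is the sink, and that the argv-list form is the safe variant.

```python
import subprocess

def search_logs_unsafe(pattern: str) -> str:
    # Vulnerable: user-controlled `pattern` is interpolated into a shell string.
    # A taint rule has to know that request data reaches here and that
    # shell=True makes the whole string shell syntax.
    cmd = f"grep -c {pattern} /dev/null"
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

def search_logs_safe(pattern: str) -> str:
    # Safe: argv list, no shell; `pattern` is a single argument and can never
    # become shell syntax.
    result = subprocess.run(["grep", "-c", pattern, "/dev/null"],
                            capture_output=True, text=True)
    return result.stdout
```

A payload like `x /dev/null; echo INJECTED` runs an extra command through the first function and is just an inert grep pattern in the second; distinguishing the two is exactly the framework-specific knowledge the rule encodes.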

Semgrep’s community registry as of early 2026 contains thousands of rules across dozens of languages and frameworks. Some were written by security researchers at Semgrep. Many were contributed by external researchers, security teams at companies, and tool vendors. Each rule represents a real investment: understanding the vulnerability class, understanding the framework, writing the query, testing against known-vulnerable and known-safe samples.

The quality varies significantly. Rules written by the core team for well-studied patterns in popular frameworks tend to be precise and well-tested. Rules contributed for niche frameworks, less common language idioms, or emerging vulnerability classes may be less rigorously tested, may have higher false positive rates, or may miss variant cases the author didn’t consider. The coverage claim that comes with a named rule set is an average over a distribution of rule quality that you generally cannot inspect without reading each rule individually.

The Freshness Problem

Rules are written at a point in time. Frameworks evolve.

Consider what happened when Django added raw SQL support in its ORM’s annotate() method. Before that addition, most Django SQL injection rules focused on raw() and extra(). After it, annotate expressions with user-controlled values became a potential injection vector. Any SAST rule set whose Django injection coverage predated that API change had a gap that would persist until someone wrote a new rule and it shipped in a tool update.
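The underlying mechanics don’t require Django to demonstrate. Whether the interpolation happens inside an `annotate()` expression or anywhere else, splicing user input into SQL text is the vector a rule has to know about. A minimal sketch with the standard-library `sqlite3` module (function names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])

def count_role_unsafe(role: str) -> int:
    # Mirrors interpolating user input into raw SQL: the value becomes SQL text.
    query = f"SELECT COUNT(*) FROM users WHERE role = '{role}'"
    return conn.execute(query).fetchone()[0]

def count_role_safe(role: str) -> int:
    # Parameter binding: the driver keeps the value out of the SQL grammar.
    return conn.execute("SELECT COUNT(*) FROM users WHERE role = ?",
                        (role,)).fetchone()[0]
```

A payload like `user' OR '1'='1` makes the unsafe version count every row; the parameterized version treats it as a literal role name and matches nothing. A rule set whose source patterns predate a new raw-SQL entry point simply never connects user input to this sink.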

SQLAlchemy’s text() function, FastAPI’s parameter handling, Starlette’s routing internals: each framework version adds or changes APIs that affect where dangerous patterns can appear. A rule that correctly covers Flask 2.x may miss Flask 3.x changes to request handling. A rule written for aiohttp 3.x may not apply cleanly to a later release with a modified streaming interface.

The lag is structural. A new framework API becomes exploitable in practice, someone discovers the vulnerability class in the wild or in a CVE write-up, a researcher writes a SAST rule, the rule is reviewed, merged, and shipped in the next tool release. By the time the rule reaches most developers, the window where it represented a novel blind spot may have been months long. In actively maintained frameworks with frequent releases, this lag is essentially permanent; there is always a version ahead of the current rule set.

The Custom Code Gap

The freshness problem is most acute for code that nobody wrote a rule for because nobody outside the organization has seen it.

Most production codebases contain internal libraries, custom middleware, domain-specific abstractions, and in-house frameworks. A Django shop might have an internal decorator that marks routes as requiring specific roles, and another that sanitizes inputs according to company-specific rules. From outside the organization, these abstractions are invisible. No SAST rule exists for them.

This means the code that most benefits from security analysis, the custom application logic that differs from one organization to the next, is exactly the code with the weakest SAST coverage. Rules written for Flask’s request.form won’t help when an internal abstraction wraps and re-exposes request data under a different name. Rules for SQLAlchemy parameterization won’t apply to an internal query builder unless someone wrote a rule specifically for it.
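A hypothetical sketch makes the mismatch concrete. Every name below is invented for illustration; the point is that a taint rule whose source pattern is `request.form[...]` has nothing to match once an in-house wrapper re-exposes the same data under its own API:

```python
# Hypothetical internal abstraction of the kind described above.
class RequestContext:
    """In-house wrapper that a team layers over the framework's request object."""

    def __init__(self, raw_form: dict):
        self._form = dict(raw_form)

    def field(self, name: str, default: str = "") -> str:
        # Still user-controlled data, but a rule keyed on `request.form`
        # never matches `ctx.field(...)`.
        return self._form.get(name, default)

# Downstream code consumes ctx.field(), so public Flask source rules
# see nothing to track from here onward.
ctx = RequestContext({"q": "'; DROP TABLE users; --"})
payload = ctx.field("q")
```

Nothing about the wrapper is exotic; it is the kind of three-line convenience layer that accumulates in any codebase, and each one silently breaks the source patterns that public rules depend on.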

Teams that invest seriously in SAST often write custom rules for their internal frameworks. Semgrep’s rule-writing tooling and CodeQL’s query language both support this. Writing useful custom rules requires deep understanding of the abstraction, its security properties, and the pattern variants that indicate a problem. Most teams don’t have that capacity on a consistent basis, so custom rules don’t get written, and the custom code gets no coverage. The security scan runs. The coverage report looks complete. The custom authorization middleware that handles 40% of the application’s request volume was never checked by any rule.

AI Analysis and the Knowledge Cutoff Problem

AI-driven constraint reasoning has a different knowledge model, with different freshness properties.

An LLM trained on code, security advisories, CVE descriptions, and vulnerability research encodes security knowledge across the breadth of its training corpus, without being organized into explicit rule files. It knows that passing user input to subprocess.call(shell=True) is dangerous, that JWT libraries have common implementation vulnerabilities, that eval() on user-controlled strings is exploitable, because those patterns appear extensively in training data. The knowledge is probabilistic and implicit rather than explicit and rule-based.

For a new vulnerability class discovered after the model’s training cutoff, the knowledge simply doesn’t exist in the model. There is no lag to close because there is no coverage at all. For SAST, you can write a new rule; for an AI-based system, the gap persists until the model is retrained. This is a meaningful limitation for vulnerability classes at the frontier of security research.

For code patterns within the model’s training distribution, though, coverage is often broader and more context-sensitive than a rule-based system. The model has seen the pattern in multiple forms, across multiple frameworks, with multiple sanitizer implementations, and can reason about whether a specific instance is dangerous in its specific context. This is how AI constraint reasoning can catch an incomplete sanitizer like filename.replace("../", "") that a SAST rule would accept as valid sanitization, because the model knows that URL-encoded variants and unicode normalization bypass that specific replacement.
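The bypasses are easy to demonstrate. Beyond the encoded variants mentioned above, a nested traversal sequence also survives the replacement, because removing the inner `../` re-forms an outer one:

```python
from urllib.parse import unquote

def sanitize(filename: str) -> str:
    # The incomplete sanitizer from the text: strip "../" wherever it appears.
    return filename.replace("../", "")

# Nested sequences survive: deleting the inner "../" re-forms an outer one.
nested = sanitize("....//etc/passwd")               # "../etc/passwd"

# Percent-encoded traversal contains no literal "../", so it passes through
# untouched; if the path is URL-decoded later, the traversal reappears.
encoded = unquote(sanitize("%2e%2e%2fetc/passwd"))  # "../etc/passwd"
```

A rule that treats any call to `replace("../", ...)` as a sanitizer accepts both inputs; reasoning about what the string operation actually guarantees is what catches them.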

Custom code is still a gap here, but a different kind of gap from SAST’s. If an internal authentication decorator follows a recognizable pattern, the model may reason about it correctly by analogy to similar patterns in its training data. If it’s genuinely novel in a security-relevant way, the model’s reasoning may be unreliable. The coverage is probabilistic and non-enumerable; there’s no rule list to inspect.

Interpreting What Coverage Claims Actually Mean

The freshness problem doesn’t displace the precision argument that Codex Security’s design is built on. It adds a dimension to consider alongside precision: for which code, in which contexts, does the tool’s security knowledge apply?

For SAST, the answer is bounded and enumerable: the rules that shipped with the tool, plus any custom rules your team wrote, at the versions currently deployed. Knowledge gaps are visible by inspecting the rule set; new gaps can be addressed by writing rules. The maintenance burden is real but manageable if you treat rule-writing as ongoing security engineering work, not a one-time configuration step.

For AI-based analysis, the answer is probabilistic and opaque: the vulnerability classes well-represented in training data, applied to code patterns that resemble training examples, with lower confidence on novel patterns and no coverage for vulnerability classes discovered after the training cutoff.

A clean SAST report means: these enumerated patterns were checked and not found. A clean AI constraint reasoning report means: the model found nothing it was confident enough to report, which may reflect clean code or may reflect the limits of its knowledge distribution. The two silences mean different things, and the distinction matters most for the code that sits furthest from well-studied, well-documented patterns, which is often exactly the code that handles your application’s most sensitive operations.
