The Taxonomy That Makes Constraint Reasoning Legible
OpenAI’s Codex Security skips the SAST report in favor of constraint reasoning and validation, a design decision that makes sense at the architectural level. The more practical question is which specific vulnerability classes benefit from this approach, and which ones hit the approach’s ceiling regardless of how capable the underlying model is. Walking through the major categories reveals a clear division, and that division is useful for deciding where Codex Security actually belongs in a security tooling stack.
Injection-Class Bugs: Strongest Fit
SQL injection, OS command injection, and template injection are where constraint reasoning has the most compelling advantage over taint-based SAST.
The reason is structural. Injection vulnerabilities reduce cleanly to a satisfiability question: given the constraints that application code places on user input before it reaches a query or command, can an attacker construct input that changes the semantic meaning of that query or command? Taint analysis traces whether user-supplied data can reach a dangerous sink, but it does not answer the satisfiability question. Constraint reasoning answers it directly.
def run_category_report(conn, user_input: str):
    VALID = frozenset({'summary', 'detail', 'audit'})
    if user_input not in VALID:
        raise ValueError("Invalid category")
    return conn.execute(
        f"SELECT * FROM reports WHERE category = '{user_input}'"
    )
A taint-based tool sees user input reaching a string-interpolated SQL call and fires. A constraint reasoner with access to the full function can observe that user_input is constrained to three known-safe string literals before reaching the query, and conclude that no injection is satisfiable through this path. This is not a hypothetical improvement; it is exactly the pattern class that produces the highest false positive volumes in production SAST deployments. Eliminating it has direct practical value.
The caveat is that this works when constraints are expressed in a form the model can reason about. Allowlist membership, integer range checks, and parameterized queries are legible constraints. Constraints derived from external state, database lookups, or runtime configuration are harder to evaluate statically, and the model’s confidence should degrade accordingly.
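A minimal sketch of the most legible constraint of all, using Python's built-in sqlite3 module (the function name is hypothetical, not from the original example): with placeholder binding, the driver passes user input strictly as a value, so no input can alter the statement's structure, and the satisfiability question answers itself.

```python
import sqlite3

def run_category_report_param(conn, user_input: str):
    # Placeholder binding: user_input is bound as a value, never spliced
    # into the SQL text, so it cannot change the query's meaning.
    return conn.execute(
        "SELECT * FROM reports WHERE category = ?",
        (user_input,),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (category TEXT, body TEXT)")
conn.execute("INSERT INTO reports VALUES ('summary', 'q1')")

# A classic injection string is just an unmatched category value here.
assert run_category_report_param(conn, "summary' OR '1'='1").fetchall() == []
assert run_category_report_param(conn, "summary").fetchall() == [("summary", "q1")]
```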
Path Traversal: The API Semantics Case
Path traversal sits between the easy and hard cases. The base pattern, user input reaching a filesystem operation without sanitization, is detectable by taint analysis. The harder question is whether the sanitization present is sufficient, and that depends on the specific API.
In Python’s standard library, os.path.normpath() normalizes path components (../ sequences and redundant separators) but does not resolve symlinks. os.path.realpath() resolves symlinks and normalizes. Whether a given call to os.path.normpath() prevents a traversal attack depends on whether symlink resolution is necessary, which in turn depends on the underlying filesystem and whether an attacker can place symlinks in the target directory.
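The distinction can be made concrete with a small sketch (the helper is hypothetical, and the paths assume a POSIX filesystem): realpath, not normpath, is the right primitive when symlinks are in scope, and the containment check must run on the resolved path.

```python
import os

def resolve_under_base(base_dir: str, user_path: str) -> str:
    # realpath resolves symlinks as well as ../ sequences; normpath
    # would only rewrite the path lexically and miss a planted link.
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # commonpath rejects sibling-prefix tricks like /srv/data-evil
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes base directory")
    return target

# normpath collapses ../ purely lexically, with no filesystem awareness:
assert os.path.normpath("/srv/data/../../etc/passwd") == "/etc/passwd"
```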
SAST rules either flag all user-influenced filesystem operations (producing high false positive rates) or recognize a list of canonical sanitizers (producing high false negatives for custom logic). Constraint reasoning has a potential advantage here because reasoning about API-level semantic distinctions is exactly where broad training exposure helps. The difference between normpath and realpath appears in documentation, in security write-ups about symlink attacks, and in code review patterns that a model with broad exposure can internalize without requiring a rule author to encode it explicitly.
Deserialization: Where Taint Analysis Holds Up
Unsafe deserialization is one of the few vulnerability classes where pattern-matching SAST does not need much assistance. The sinks are specific and documented: ObjectInputStream.readObject() in Java, pickle.loads() in Python, PHP’s unserialize(). The data flow from an HTTP request body to one of these calls is the canonical pattern.
The nuance SAST cannot resolve is that deserialization danger depends on what is available on the classpath. readObject() on a controlled internal message is different from readObject() on an arbitrary client payload when the classpath includes a library with a usable gadget chain. A deserialization flag without gadget chain analysis is technically correct and practically incomplete.
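Why these sinks are treated as unconditionally dangerous can be sketched in Python with a deliberately tame, single-class stand-in for a gadget (real chains, as catalogued by ysoserial, span many classes): __reduce__ lets the payload author choose a callable that pickle will invoke at load time.

```python
import pickle

class Gadget:
    # __reduce__ tells pickle how to rebuild the object; an attacker can
    # abuse it to name an arbitrary callable. eval stands in here for
    # something like os.system.
    def __reduce__(self):
        return (eval, ("6 * 7",))

payload = pickle.dumps(Gadget())
result = pickle.loads(payload)   # runs eval("6 * 7") during deserialization
assert result == 42
```

No constraint on the surrounding application code changes this: the danger is a property of the format, which is why the flag fires on the sink itself.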
Constraint reasoning faces the same wall. Gadget chain analysis requires inspecting every class available to the deserializer and tracing method chains that could produce a code execution primitive. The ysoserial project documents these chains precisely because the analysis is nontrivial enough to require dedicated research effort. Static reasoning over application code alone cannot substitute for it. In this category, constraint reasoning does not add much over taint-based detection; the bottleneck is not false positives from pattern noise, it is the complexity of confirming exploitability at a depth neither approach reaches.
Authorization: The Systemic Property Problem
Authorization bypass vulnerabilities expose a structural limit shared by both SAST and constraint reasoning. Injection is a local code property: a specific data flow with specific constraints. Authorization correctness is a systemic property: the relationship between what a piece of code does and what the application’s security model is supposed to allow.
A route handler that performs a privileged action without requiring authentication is a vulnerability, but detecting it requires knowing what check should have been present. There is no canonical sink for “forgot to call require_auth()”, and no taint flow to trace. SAST rules that detect this pattern work by recognizing conventional idioms, such as the absence of middleware in an Express route chain, and they fail on any deviation from those conventions.
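A hypothetical pair of Flask-style handlers makes the problem concrete: delete_user is syntactically unremarkable, contains no dangerous call and no taint flow, and only the application's security model says the decorator was required.

```python
def require_auth(handler):
    # Minimal auth middleware: reject requests with no authenticated user.
    def wrapped(request):
        if not request.get("user"):
            return {"status": 401}
        return handler(request)
    return wrapped

@require_auth
def view_profile(request):
    return {"status": 200}

def delete_user(request):      # privileged action, decorator forgotten
    return {"status": 200}     # nothing here looks like a sink to flag
```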
Constraint reasoning faces the same problem at a higher level of abstraction. The model can reason about what a code path does. It cannot reason reliably about whether the absence of a check represents a violation of the security model without enough surrounding context to infer what the security model is supposed to be, and that context is only partially expressed in code.
CVE-2021-41773, the Apache httpd path traversal that enabled remote code execution, illustrates the difficulty. The vulnerability existed because path normalization and authorization checking happened against different representations of the same path in two separate subsystems. Each subsystem appeared correct in isolation. The bug was in their interaction, and detecting it required understanding the intended contract between them, an invariant that lives in neither subsystem's code and that function- or module-level static analysis cannot see.
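A heavily simplified sketch of the bug class (not Apache's actual code) shows the shape of the interaction failure: one subsystem checks an encoded representation of the path, another uses the decoded one, and an encoded traversal sequence slips between them.

```python
from urllib.parse import unquote

def serve(encoded_path: str) -> str:
    # Subsystem A: traversal check against the encoded representation.
    if ".." in encoded_path:
        raise PermissionError("traversal rejected")
    # Subsystem B: filesystem use against the decoded representation.
    return unquote(encoded_path)

# %2e%2e passes the check, then decodes to ../ at the point of use.
assert serve("/icons/%2e%2e/etc/passwd") == "/icons/../etc/passwd"
```

Each function-level view is locally defensible; the vulnerability exists only in the pair.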
Race Conditions and TOCTOU: The Temporal Ceiling
Time-of-check-time-of-use (TOCTOU) vulnerabilities mark the clearest ceiling for static analysis of any kind. The classical pattern:
if (access(path, R_OK) == 0) {        /* check */
    /* attacker may swap path via symlink here */
    fd = open(path, O_RDONLY);        /* use */
}
A SAST rule can recognize the check-then-use separation. Constraint reasoning can observe that a gap exists between the check and the use, and reason about whether attacker intervention is theoretically possible. Neither can determine whether the gap is exploitable without knowing the scheduler’s behavior, the filesystem semantics, and whether the attacker can place a symlink in the path during that window.
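The standard mitigation side-steps the temporal question rather than answering it: open first, then validate the descriptor, so the check and the use refer to the same kernel file object and no swap window exists. A sketch in Python (the helper is hypothetical; O_NOFOLLOW is used where the platform supports it):

```python
import os
import stat

def open_regular_file(path: str) -> int:
    # Refuse to follow a symlink at the final component where supported.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags)
    try:
        # fstat inspects the already-opened descriptor, not the path,
        # so a symlink swap after open() changes nothing.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError(f"{path} is not a regular file")
    except Exception:
        os.close(fd)
        raise
    return fd
```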
Confirming a TOCTOU vulnerability requires runtime instrumentation or concurrency-aware analysis. ThreadSanitizer and Helgrind detect races by observing actual thread interleaving during execution. For file-based TOCTOU, testing requires constructing the race condition under controlled execution. There is no substitute for this in static or constraint-based analysis. Codex Security would surface TOCTOU patterns as candidates without being able to confirm them the way it can confirm injection-class findings, and that distinction matters for how findings from the two categories should be treated downstream.
The Practical Division
The pattern that emerges from this taxonomy is not that constraint reasoning is uniformly better than SAST. The two approaches handle the vulnerability landscape differently, with complementary strengths and gaps.
Injection-class bugs, where the vulnerability reduces to a satisfiability question over input values and code-level constraints, are where constraint reasoning adds the most precision. Path traversal cases with API-semantic nuance are also well-suited, provided the model has reliable knowledge of the specific library semantics. Deserialization detection does not require much improvement over what taint analysis already provides; gadget chain analysis is the missing piece and neither approach supplies it. Authorization and concurrency vulnerabilities are structurally resistant to both approaches, for different reasons: authorization because it is a systemic property, concurrency because it is a temporal one.
For a team deciding where Codex Security fits in their tooling stack, the OWASP Web Security Testing Guide provides a useful categorization. Injection and input validation categories map onto constraint reasoning’s strong zone. Authentication, session management, and access control categories remain dependent on manual review and dynamic testing regardless of which automated tool is running. The improvement Codex Security offers is real and concentrated in a specific part of the vulnerability landscape; the coverage gaps are equally real, and they do not dissolve by replacing SAST with AI. Mapping the tool to the right vulnerability classes is the prerequisite for using it well.