The SAST Coverage Gap That Widens Where Developers Are Writing Safer Code
Source: openai
When OpenAI published their explanation of why Codex Security doesn’t include a SAST report, the discussion focused predictably on false positive rates, taint tracking limitations, and whether the validation step works reliably. That framing makes sense for Java shops running CodeQL against Spring applications or Python services with Bandit configured in CI. But it largely misses a dimension that matters considerably for the languages where much new production code is being written: SAST coverage is not uniform across languages, and the gap between where it works well and where it does not is substantial.
The Maturity Ladder
Not all language ecosystems sit at the same point in SAST development. Java is at the top. The JVM’s well-defined semantics, decades of enterprise tooling investment, and rule databases from Fortify, FindBugs (later SpotBugs), SonarQube, and CodeQL mean that Java SAST is deep and continuously maintained. You can run CodeQL against a Spring Boot application and get credible taint analysis for injection classes, deserialization vulnerabilities (CWE-502), and framework-specific issues like OGNL injection in Struts. The rules were written by people who understood Spring’s security idioms specifically, and they have been refined through years of real-world use.
C and C++ sit differently: excellent coverage for memory safety vulnerabilities (buffer overflow, use-after-free, out-of-bounds access) from tools like Coverity and clang’s static analyzer, but weaker on application-layer business logic. The fuzzing ecosystem for C/C++ is also mature: AFL++ and libFuzzer have been actively developed for over a decade and cover the runtime behavior that static analysis misses.
Python and JavaScript occupy middle ground. Semgrep and Bandit provide reasonable Python coverage for common frameworks; ESLint security plugins and Semgrep handle JavaScript. False positive rates run higher than in statically typed languages because dynamic dispatch makes precise data flow analysis harder, and rule quality varies significantly based on which frameworks a project uses.
Go and Rust are where the coverage gap becomes pronounced, and they are precisely the languages that security-conscious teams are adopting to reduce their vulnerability surface.
Go: Reasonable Tooling, Specific Blind Spots
The Go security ecosystem has gosec, which covers standard concerns: SQL injection through string formatting, command injection, hardcoded credentials, and TLS configuration issues. The GitHub Security Lab maintains CodeQL queries for Go. Coverage for mainstream HTTP frameworks and database drivers is workable.
Where Go-specific analysis runs into trouble is with idiomatic patterns that differ meaningfully from the Java and Python idioms where rule databases are richest. Go’s error handling model creates security-relevant failure modes that standard taint analysis does not reach:
```go
// conn is returned even if dial fails
conn, _ := tls.Dial("tcp", host, config)
_, _ = conn.Write(sensitiveData)
```
There is no taint flow here in the classic sense: user-controlled data is not flowing to a dangerous sink. The issue is that ignoring the error from tls.Dial means operating on a nil connection (tls.Dial returns nil on failure), so the subsequent Write panics, and subtler failure modes can leave a degraded connection in place. A rule that flags every ignored error from crypto/tls calls would produce too many false positives in legitimate contexts. The right analysis is semantic: understanding what the specific operation is, what a failed return means for subsequent operations, and whether those operations are security-sensitive.
Go’s concurrency model adds another dimension. Goroutines sharing access to security-sensitive state (session tokens, authentication flags, rate-limiting counters) create data race surfaces that static analysis tools approximate poorly. Go’s -race instrumentation catches these at runtime, but finding them statically across goroutine boundaries requires inter-thread analysis that most SAST tools either skip or handle conservatively enough to be noisy.
Rust: Where SAST Lost Its Original Purpose
Rust is the sharper case. The ownership and borrowing system eliminates the memory safety vulnerability classes (buffer overflow, use-after-free, out-of-bounds access) that SAST tools were largely built to find. Running a memory safety SAST tool against Rust code returns nearly nothing, not because the code is free of problems but because the type system has already enforced those properties at compile time.
Microsoft’s security response data and Google’s Chromium project have both documented that roughly 70% of serious security vulnerabilities in systems software are memory safety bugs. Rust structurally eliminates that class. What remains is different:
unsafe blocks are where the ownership model’s guarantees are suspended. Any analysis of Rust security that does not reason carefully about unsafe blocks is missing the highest-risk region. Tools like cargo-geiger count unsafe usage in dependencies, which is useful for audit purposes. Reasoning about what a specific unsafe block does and whether it could produce an exploitable condition requires semantic understanding that coarse-grained counting does not provide.
Integer arithmetic in security-sensitive contexts is another underserved class. Rust’s debug builds panic on overflow, but release builds use wrapping arithmetic by default. The Rustonomicon is explicit about this. An integer overflow in a calculation that feeds an allocation size or a buffer offset can be an exploitable vulnerability. Whether a specific integer operation produces a dangerous result depends on what happens to the value downstream, which is exactly the kind of constraint reasoning that SAST pattern matching handles poorly.
Deserialization through serde handles untrusted data in essentially every Rust service. Vulnerabilities here are logic-level: recursive structures causing stack exhaustion, oversized allocations, or custom Deserialize implementations with incorrect invariants. The RUSTSEC advisory database tracks these. Rules written for Python’s pickle or Java’s serialization do not transfer; Rust-specific rules for serde deserialization issues remain thin.
Logic and authorization bugs form the remaining surface, and these are the vulnerability classes that SAST pattern matching handles worst across all languages.
What Constraint Reasoning Offers Without a Rule Database
The relevant property of AI-based constraint reasoning is that it does not require language-specific rule databases. A model trained on diverse code and security literature can reason about what a Rust unsafe block does and whether the operations inside it satisfy the safety invariants the surrounding safe code relies on. It can understand that wrapping arithmetic on a value feeding into an allocation size is a different kind of concern than wrapping arithmetic in a checksum. It can recognize whether a custom Deserialize implementation handles pathological input safely by reasoning about what it does, not by matching it against a rule encoding the pattern.
This is what Codex Security’s constraint reasoning approach offers that rule-based SAST structurally cannot: semantic understanding that generalizes across language features without requiring those features to be explicitly encoded by a rule author. The model does not need a specific Rust rule for integer overflow in allocation size calculations. It can reason about whether the constraints that must hold at an allocation call are satisfied given how the size value is derived.
Framework coverage extends the same advantage. The Go microservice ecosystem uses gRPC, Gin, Echo, and patterns that differ from the Java/Spring idioms where SAST rule databases are densest. A constraint reasoning system that understands what Gin’s context binding does to request parameters, and whether that binding satisfies the requirement that only validated, typed values reach a SQL call, does not need a Gin-specific rule. It reasons about semantics.
The Inversion
In the languages where SAST tooling is most mature (Java and C/C++), constraint reasoning supplements a capable baseline. The existing CodeQL queries for Java are thorough and well-maintained; an AI layer that validates findings and cuts false positives improves something that already works reasonably well.
In the languages where constraint reasoning offers the most independent value (Rust, Go, and the broader ecosystem of newer, safety-oriented languages), it steps into a gap where the alternative is not a less precise version of the same thing but thin coverage built around vulnerability classes that the language itself already handles. The security tooling that exists for these languages tends to address the residual surface incompletely, and the rule investment required to close that gap has not materialized at the same scale as it did for Java a decade ago.
The absence of a SAST report in Codex Security means something different depending on where a codebase sits. For a Java team with mature CodeQL configurations and years of tuned suppression rules, substituting AI-driven analysis for SAST involves real trade-offs: determinism, auditability, and compliance attestation all change. For a Go or Rust team, the comparison is between constraint reasoning and a sparse collection of rules that were not built for the specific security surface those languages expose. That is a different trade-off, and the case for accepting it is considerably stronger.
Security tooling investment tends to follow adoption, and adoption of Java and C/C++ was concentrated in a decade when the tooling was being built out. The teams writing new services in Rust and Go are working in languages that deliberately closed one major vulnerability category, then finding that the tooling ecosystem for the remaining categories never fully developed. Constraint reasoning that generalizes from semantic understanding rather than accumulated rules is the approach most likely to catch up.