
What the SARIF Standard Built, and What Codex Security Opts Out Of

Source: openai

When OpenAI explains why Codex Security doesn’t produce a SAST report, the precision argument is clear: constraint reasoning produces validated findings rather than pattern matches, so the report format built around unvalidated patterns is the wrong output. That argument is sound. It also understates the infrastructure cost of opting out, because SAST’s influence over developer security tooling runs much deeper than the findings themselves.

Over the past decade, an entire toolchain grew up around one assumption: security findings arrive as SAST output in a standardized interchange format. Understanding what that infrastructure looks like, and what disappears when you skip the report, is a separate question from whether constraint reasoning is more precise than taint analysis.

What SARIF Actually Is

The Static Analysis Results Interchange Format (SARIF), currently at version 2.1.0, is a JSON-based standard maintained by OASIS. It defines a common representation for static analysis output: findings (called “results” in the spec), their locations in source files, the rule that triggered each finding, severity levels, CWE and OWASP Top 10 taxonomy mappings, suggested fixes, and suppression metadata.

A minimal result looks like this:

{
  "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "Semgrep",
        "rules": [{ "id": "python.lang.security.audit.sqli.formatted-sql-query" }]
      }
    },
    "results": [{
      "ruleId": "python.lang.security.audit.sqli.formatted-sql-query",
      "level": "error",
      "message": { "text": "Potential SQL injection via string formatting" },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "app/users.py", "uriBaseId": "%SRCROOT%" },
          "region": { "startLine": 14, "startColumn": 12 }
        }
      }]
    }]
  }]
}

This format is why Semgrep, CodeQL, Checkmarx, Snyk, ESLint, and dozens of other tools can all plug into the same downstream pipeline. Any tool that speaks SARIF gets the same integrations automatically.
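That uniformity is easy to see in code. A downstream consumer only needs to walk the `runs[].results[]` structure, regardless of which tool produced the file. The sketch below (the helper name is my own, not part of any SARIF library) extracts findings from a document shaped like the minimal example above:

```python
import json

def extract_findings(sarif_text: str) -> list[dict]:
    """Walk runs[].results[] and pull out the fields a pipeline needs,
    independent of which tool produced the SARIF file."""
    doc = json.loads(sarif_text)
    findings = []
    for run in doc.get("runs", []):
        tool = run["tool"]["driver"]["name"]
        for result in run.get("results", []):
            loc = result["locations"][0]["physicalLocation"]
            findings.append({
                "tool": tool,
                "rule": result["ruleId"],
                # the SARIF 2.1.0 spec defaults a missing level to "warning"
                "level": result.get("level", "warning"),
                "file": loc["artifactLocation"]["uri"],
                "line": loc["region"]["startLine"],
            })
    return findings
```

A consumer written this way handles Semgrep and CodeQL output identically, which is the entire point of the format.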

What GitHub Code Scanning Gives You

GitHub Code Scanning is the most widely used consumer of SARIF output. It ingests SARIF files via the github/codeql-action/upload-sarif action and does several things with them that developers interact with daily.

PR annotations: when a new commit introduces code matching a SARIF result, Code Scanning annotates the relevant line directly in the pull request diff. The reviewer sees the finding inline, with rule description and severity, without leaving the PR review UI.

Security dashboard: findings aggregate in the repository’s Security tab with filtering by severity, rule, branch, and state. Historical data makes it possible to see whether vulnerability density is rising or falling across commits.

Branch protection: you can configure rules that block merges if new SARIF findings exceed a severity threshold. This is a hard gate on the merge button, not a notification.

Finding lifecycle tracking: Code Scanning tracks each finding through a set of states: open, dismissed (with a required reason), and auto-closed when a later commit removes the triggering code. This creates an audit trail of how each vulnerability was resolved.

GitHub Advanced Security adds secret scanning and dependency review on top of the same infrastructure. Enterprise organizations can query security alert state across repositories via the REST API and feed it into SIEM systems. All of this is downstream of producing a SARIF file.

Where Codex Security Lands in This Stack

Codex Security produces PRs, not SARIF files. When it finds a validated vulnerability, it opens a pull request with a fix. That integration has real advantages: it shortens the time from finding to fix by skipping the triage step, and it puts the finding in the context where developers already operate rather than a separate dashboard.

But the finding does not appear in GitHub’s security dashboard. It does not generate a PR annotation on the vulnerable line. It does not increment a vulnerability counter that a security team monitors. It does not block a branch protection gate. The finding arrives as a PR from a bot, indistinguishable in the review queue from a dependency update or an automated refactoring.

For a solo developer or a small team, this is probably fine. The PR shows up, you review it, you merge it. The security feedback loop works.

For organizations running formal security programs, the SARIF integration gap has real operational weight. A security manager demonstrating OWASP Top 10 coverage to an auditor does so with SARIF artifacts as attestation. A compliance officer tracking CWE-89 remediation progress over time uses Code Scanning trend lines. A platform security team enforcing “no high-severity findings on PRs merging to main” uses branch protection backed by Code Scanning. These are workflows built on the SARIF layer, and Codex Security currently sits outside it.

The Coverage Measurement Problem

SARIF carries something that an AI-generated PR does not: an explicit accounting of what was checked. A SARIF file from CodeQL contains the list of queries that ran, the languages analyzed, the rules applied, and the CWE categories covered. When a security auditor asks “what is your SQL injection coverage,” you can answer with a SARIF artifact that documents exactly which rules ran and what they found.
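That accounting lives in the run's `tool.driver.rules` array. A sketch of extracting a coverage summary from it, assuming the common convention (used by CodeQL and Semgrep, among others) of encoding CWE mappings as `properties.tags` entries such as `external/cwe/cwe-089`:

```python
def coverage_summary(run: dict) -> dict:
    """Summarize what a SARIF run claims to have checked: the rules
    that were loaded and the CWE categories they map to."""
    rules = run["tool"]["driver"].get("rules", [])
    cwes = set()
    for rule in rules:
        for tag in rule.get("properties", {}).get("tags", []):
            if "cwe" in tag.lower():
                cwes.add(tag)
    return {"rule_count": len(rules), "cwe_tags": sorted(cwes)}
```

A summary like this is what makes the auditor's question answerable with an artifact rather than a judgment call: the rule list documents what ran even when the results array is empty.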

When an AI reasoning system finds nothing, you face a harder epistemic situation. A clean SAST scan means the rules ran and found no matches. A clean AI analysis means the model reasoned about the code and concluded no exploitable vulnerabilities exist, which is not the same guarantee. It could also mean the model encountered code patterns outside its training distribution and made a confident but incorrect assessment.

This is not an argument against AI security analysis. It is an argument that the coverage properties are different and less transparent, and that teams used to auditing SAST coverage need to think differently about what attestation looks like when the tool does not produce a rule-based report.

The IDE Layer

The SARIF ecosystem extends into the editor. VS Code’s SARIF Viewer extension reads SARIF files directly and surfaces findings as diagnostics in the Problems panel, with inline squiggles in the editor and filter views by severity. The GitHub Pull Requests extension pulls Code Scanning annotations into the local editor view of a PR diff.

This means SAST findings can appear while a developer is actively writing code, not only after a CI run completes. By design, that feedback loop is tighter than any PR-oriented tool can achieve, because the PR does not yet exist when the vulnerability is introduced.

AI-based constraint reasoning at the depth required for semantic validation is not a realistic candidate for real-time IDE feedback given current inference costs and latency. That limitation is reasonable to accept given what the approach gains in precision. It does mean the tool operates at a different point in the development cycle, after a PR is opened rather than during authoring, which shapes how much of the vulnerability surface it can cover.

What Would Bridge the Gap

The gap is bridgeable in principle. A tool that produces constraint-reasoning-validated findings could emit SARIF for those findings, omitting the unvalidated noise that makes existing SAST SARIF so hard to act on. A SARIF file containing only confirmed, exploitable vulnerabilities with accurate CWE taxonomy would be a more useful artifact than most SAST output currently is. The downstream toolchain would treat it identically to any other SARIF upload.
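Mechanically, that bridge is small. A validated finding already carries everything a SARIF result needs; the sketch below (my own helper, not anything OpenAI has shipped) wraps a list of confirmed findings into a minimal SARIF 2.1.0 document of the shape shown earlier:

```python
import json

def to_sarif(tool_name: str, findings: list[dict]) -> str:
    """Wrap validated findings into a minimal SARIF 2.1.0 document.
    Each finding dict carries: rule_id, message, file, line."""
    results = []
    rule_ids = []
    for f in findings:
        if f["rule_id"] not in rule_ids:
            rule_ids.append(f["rule_id"])
        results.append({
            "ruleId": f["rule_id"],
            "level": "error",  # every finding here is validated, so "error" fits
            "message": {"text": f["message"]},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": f["file"], "uriBaseId": "%SRCROOT%"},
                "region": {"startLine": f["line"]},
            }}],
        })
    doc = {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name,
                                "rules": [{"id": r} for r in rule_ids]}},
            "results": results,
        }],
    }
    return json.dumps(doc, indent=2)
```

Uploading the resulting file through the same upload-sarif action would put validated findings into the dashboard, annotation, and branch protection machinery with no changes on the consuming side.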

Whether OpenAI builds this integration is an open question. The current framing positions Codex Security as a PR-generating tool, and there is a reasonable argument that the patch PR is the right output artifact for a validation-first system. But SARIF output and patch PRs are not mutually exclusive, and the compliance and dashboard integration value of SARIF is significant enough that the gap seems worth closing for enterprise adoption.

The Layered Answer

The honest picture for most teams is that Codex Security and traditional SAST cover different parts of the security workflow rather than competing for the same slot. SAST running in the IDE or as a pre-commit hook provides early, cheap feedback on common patterns. Code Scanning with SARIF feeds the compliance and security program visibility layer. AI constraint reasoning covers the semantic vulnerabilities that rule-based SAST structurally cannot catch and routes them directly to the fix workflow.

The precision argument for skipping the SAST report is strong. The ecosystem participation question is separate. A tool can be more precise than its alternatives and still sit outside the infrastructure that the rest of an organization depends on for security visibility, compliance attestation, and metric tracking. Knowing where that boundary is helps you plan for both what the tool covers and what adjacent tooling still needs to be in place.
