
Beyond Taint Tracking: The Vulnerability Classes That Require Code Semantics

Source: OpenAI

The vulnerability classes that SAST handles well share a common structure: attacker-controlled data flows through the program and reaches a dangerous operation. SQL injection, XSS, command injection, path traversal. These all have a structural signature that taint analysis can capture. You mark the sources, mark the sinks, track propagation, and report where they converge. The model is not perfect, but it works for a well-defined class of problems.

Many real security vulnerabilities don’t have this structure. They don’t involve tainted data reaching a dangerous sink. They involve correct data used incorrectly, checks that are missing entirely, operations that are individually safe but dangerous in combination, or business logic implemented differently from how it was intended. Pattern matching cannot find these bugs because there is no pattern to match. Finding them requires understanding what the code is supposed to do.

OpenAI’s case for Codex Security’s AI-driven approach is partly about reducing false positives on the bug classes SAST does handle. The more substantive dimension is the bugs it can find that SAST structurally cannot.

Insecure Direct Object Reference

IDOR is one of the most consistently exploited vulnerability classes in web applications. It has appeared in every edition of the OWASP Top 10, most recently folded into Broken Access Control. SAST tools essentially cannot find it.

An IDOR looks like this:

@app.route('/api/invoices/<int:invoice_id>')
def get_invoice(invoice_id):
    invoice = db.query(Invoice).filter_by(id=invoice_id).first()
    return jsonify(invoice.to_dict())

The invoice ID comes from the URL. There is no taint-trackable danger here: a database read is not a dangerous sink in the SAST model. The code is structurally clean. The vulnerability is that the function retrieves an invoice by its ID without checking whether the authenticated user owns that invoice. Any authenticated user can read any invoice by guessing or enumerating IDs.

A SAST tool looking at this function sees a route handler that reads from the database and returns a response. The taint model has nothing to attach to: no unsanitized value reaching a bad operation, no dangerous sink. The tool reports nothing.

Finding this bug requires understanding the authorization model the application is supposed to enforce. You need to know that invoice retrieval should require an ownership check, which means understanding the application’s intent, not just its structure. This is precisely what AI-driven semantic analysis can bring: reasoning about what a function is doing in context against what it should be doing.
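To make the missing check concrete, here is a minimal sketch of the fixed handler. The in-memory dict, the `session_user_id` parameter, and the field names are all illustrative stand-ins for the post's ORM objects and session middleware, not code from the original.

```python
# Illustrative data store standing in for the post's Invoice table.
INVOICES = {
    1: {"id": 1, "owner_id": 100, "total": 250},
    2: {"id": 2, "owner_id": 200, "total": 99},
}

def get_invoice(invoice_id, session_user_id):
    invoice = INVOICES.get(invoice_id)
    # The fix: verify the authenticated user owns the resource before
    # returning it. Treating "missing" and "not yours" identically also
    # avoids confirming to an enumerating attacker that the ID exists.
    if invoice is None or invoice["owner_id"] != session_user_id:
        return None
    return invoice

# User 100 can read their own invoice but not user 200's:
assert get_invoice(1, session_user_id=100) is not None
assert get_invoice(2, session_user_id=100) is None
```

The essential point is that `session_user_id` must come from server-side session state, so the ownership comparison is against an identity the attacker cannot choose.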

Authorization Logic Errors

A related class involves authorization that exists but is wrong. Not missing, but incorrect:

def can_edit_document(user, document):
    return user.role == 'admin' or document.owner_id == user.id

def update_document(user_id, document_id, new_content):
    user = get_user(user_id)
    document = get_document(document_id)
    if can_edit_document(user, document):
        document.content = new_content
        document.save()

This looks like it has authorization. It does. But user_id in update_document comes from the request: perhaps a form field or a JWT claim. If the caller takes user_id from a user-supplied value rather than from the server’s session state, the entire authorization check is bypassed because the attacker can supply their own user_id. The can_edit_document function is correct in isolation; the vulnerability is in how its caller passes the identity parameter.

SAST tools cannot reason about the distinction between “identity derived from authenticated session” and “identity taken from user input.” These are semantically different things that look identical at the level of types and data flow. The parameter is a string or an integer either way. Finding this class of bug requires reasoning about trust levels, not data flow.
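A sketch of the trust distinction, with a toy session store standing in for real session middleware; the names and structures are illustrative, not from the original post.

```python
# Session token -> authenticated user id, maintained server-side.
SESSIONS = {"token-abc": 100}
DOCUMENTS = {7: {"owner_id": 100, "content": "draft"}}

def can_edit_document(user_id, document):
    return document["owner_id"] == user_id

def update_document(session_token, document_id, new_content):
    # Identity is derived from server-side session state, not from a
    # request-supplied user_id that the attacker could set freely.
    user_id = SESSIONS.get(session_token)
    if user_id is None:
        return False
    document = DOCUMENTS[document_id]
    if not can_edit_document(user_id, document):
        return False
    document["content"] = new_content
    return True

assert update_document("token-abc", 7, "v2") is True
assert update_document("forged-token", 7, "evil") is False
```

The types are identical to the vulnerable version; only the provenance of `user_id` changed, which is exactly the property invisible to a taint model.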

Confused Deputy

The confused deputy problem occurs when a privileged component acts on behalf of a less-privileged caller without properly verifying that the requested operation is within the caller’s authority.

public byte[] ReadFileContents(string filename)
{
    // This service runs with elevated filesystem access
    // to support the document processing pipeline
    string fullPath = Path.Combine(DocumentRoot, filename);
    return File.ReadAllBytes(fullPath);
}

Leave aside the path traversal question for a moment. The deeper issue is whether the calling context that provides filename has the authority to read arbitrary files under DocumentRoot. If this method is callable by any authenticated user, and DocumentRoot contains files belonging to other users, the service’s elevated access becomes available to any caller. Whether this is exploitable depends on who can call this method and what the surrounding authorization model looks like, not on the data flow within it.

SAST sees a file read. It may or may not flag it depending on taint configuration. It cannot determine whether the authority delegation is appropriate because that requires understanding the service’s role in the larger system. The trust model is external to the function.
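One way to repair the deputy is to make it check the caller's authority, not just its own. The sketch below uses an in-memory ACL and byte strings in place of the C# service's real filesystem; every name here is a hypothetical illustration.

```python
# Illustrative per-file access control list: path -> callers allowed to read it.
FILE_ACL = {
    "reports/q1.pdf": {"alice"},
    "reports/q2.pdf": {"bob"},
}
FILE_CONTENTS = {
    "reports/q1.pdf": b"q1 data",
    "reports/q2.pdf": b"q2 data",
}

def read_file_contents(caller, filename):
    # The privileged component verifies the *caller's* authority before
    # exercising its own elevated access on the caller's behalf.
    if caller not in FILE_ACL.get(filename, set()):
        raise PermissionError(f"{caller} may not read {filename}")
    return FILE_CONTENTS[filename]

assert read_file_contents("alice", "reports/q1.pdf") == b"q1 data"
```

Whether the check lives in the deputy itself or in the surrounding authorization layer is an architectural choice; what matters is that someone, somewhere, compares the caller's authority against the requested operation.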

Time-of-Check to Time-of-Use

TOCTOU vulnerabilities occur when a security check happens at a different point in time than the operation it is supposed to protect. The classic file system variant in C is well-known:

if (access(path, R_OK) == 0) {
    fd = open(path, O_RDONLY);
    // read from fd
}

The access() call checks permissions; the open() call uses the file. If an attacker can replace path with a symlink pointing to a privileged file between the two calls, the check passes on the original file but the open operates on the replacement.

Some SAST tools have rules for this specific pattern in C because it is famous enough to warrant one. TOCTOU in higher-level code goes uncaught: distributed systems checking a database value before acting on it, web applications checking a user’s account status in middleware and then trusting that status downstream without re-verification, inventory systems checking stock levels before completing a purchase in a race-prone transaction model. The general pattern, a state-check followed by a state-use where the state can change between them, is not expressible as a syntactic rule without generating enormous numbers of false positives across code that happens to read state and then act.
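The inventory variant can be shown in a few lines of Python. The sketch below contrasts a racy check-then-act with an atomic version; a `threading.Lock` stands in for whatever atomicity primitive the real system would use (a database conditional UPDATE, an optimistic-concurrency token). The store and names are illustrative.

```python
import threading

stock = {"sku-1": 1}
sold = []
_lock = threading.Lock()

def purchase_racy(sku):
    if stock[sku] > 0:       # time of check
        stock[sku] -= 1      # time of use: another thread may have
        sold.append(sku)     # decremented in the window between them

def purchase_atomic(sku):
    with _lock:              # check and act in one critical section
        if stock[sku] > 0:
            stock[sku] -= 1
            sold.append(sku)

# Eight concurrent buyers chase one unit; the atomic version never oversells.
threads = [threading.Thread(target=purchase_atomic, args=("sku-1",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert stock["sku-1"] == 0 and len(sold) == 1
```

Syntactically, `purchase_racy` and `purchase_atomic` both "read state then act"; only reasoning about the mutation window distinguishes the vulnerable one, which is why a syntactic rule for this pattern drowns in false positives.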

AI-driven analysis can reason about this class of problem across contexts because it understands what the code is doing: checking a condition and then acting on it, with a window between them where the condition could change. That semantic description applies regardless of the language, framework, or specific operations involved.

Business Logic Flaws

The broadest category is business logic flaws: vulnerabilities that arise from the application doing something that violates its own stated rules, in a way that has security consequences.

Consider an e-commerce application with a coupon system. The rules say each coupon can be applied once per order. The implementation:

def apply_coupon(order_id, coupon_code, user_id):
    coupon = get_coupon(coupon_code)
    if coupon.used_by is None:
        order = get_order(order_id)
        order.discount = coupon.discount_amount
        coupon.used_by = user_id
        db.session.commit()
        return True
    return False

This is vulnerable to a race condition: two concurrent requests can both pass the coupon.used_by is None check before either commits. But beyond that, there is no check that order_id belongs to user_id, so a user can apply a coupon to any order. And there is no validation that the order is in a state where applying a discount makes sense.

None of these are taint-trackable. There is no dangerous sink. There is no injection vector. The vulnerabilities exist entirely in the relationship between the application’s stated business rules and the code that implements them. Finding them requires holding both the rules and the implementation in mind simultaneously, comparing the two, and identifying gaps.

What Changes With Semantic Analysis

The vulnerability classes above are not obscure edge cases. IDOR is consistently among the top findings in bug bounty programs. Authorization logic errors are a regular feature of application security audits. Business logic flaws in payment flows, access control, and state management are among the most impactful bugs found in production systems because they require no technical exploit, just an understanding of the gap between what the application should do and what it does.

None of them are reliably findable with pattern matching. They require, at minimum, understanding of intent: what the code is supposed to enforce, what trust relationships should hold, what state transitions are valid. The framing in OpenAI’s Codex Security post around AI-driven constraint reasoning and validation is meaningful precisely because constraint reasoning can operate at this level. A model reading get_invoice(invoice_id) can ask whether the caller’s identity has been verified against the retrieved resource; a taint rule cannot.

The practical implication is not that AI analysis replaces SAST for the bug classes SAST handles well. A hybrid approach makes sense: SAST for taint-trackable injection categories, AI reasoning for semantic vulnerability classes that require context. Semgrep and CodeQL remain fast, cheap, and reliable within their scope. What changes is what you evaluate a security tool against. A SAST report with zero findings for SQL injection tells you something concrete. A SAST report with zero findings says nothing about whether the application correctly enforces authorization across every code path. For that, you need a tool that can read code the way a security reviewer reads it, asking not just what this does but what it should do, and whether there is a gap between the two.
