
One Category Was Never Enough: How C2Y Plans to Classify Undefined Behavior


The C standard has contained a joke for thirty-five years, and the joke is on the programmer. Since C89, the standard has described certain program behaviors as “undefined,” which technically means the implementation may do anything at all — up to and including, in the canonical phrase, making demons fly out of your nose. That phrase is not in the standard or its rationale; it originated in a 1992 comp.std.c newsgroup thread and became a shibboleth for anyone who had been burned by a compiler doing something unexpected with code that violated the standard.

A new WG14 working paper, N3861, titled “Ghosts and Demons: Undefined Behavior in C2Y”, takes that informal taxonomy seriously. The argument is straightforward: not all undefined behavior is equally dangerous, compilers treat different classes of UB differently in practice, and the standard should acknowledge those distinctions formally. Leaving everything in one bucket has cost the community decades of security vulnerabilities, kernel CVEs, and tooling confusion.

What “Undefined” Has Always Meant

The C standard distinguishes four categories of behavior. Implementation-defined behavior is unspecified behavior that each implementation must document. Unspecified behavior is one of a set of valid outcomes, chosen by the implementation with no documentation requirement. Undefined behavior imposes no requirements whatsoever. And constraint violations require a diagnostic, though the compiler may still go on to translate the program after diagnosing it.

The UB category is the most powerful of these because it grants the compiler an unlimited optimization license. The reasoning behind it was originally pragmatic: C89 was designed to run on hardware with wildly different integer representations, trap behaviors, and memory models. If signed integer overflow is undefined, then a compiler targeting ones’-complement hardware can trap, a compiler targeting two’s-complement hardware can wrap, and both are conforming. The standard accommodates all architectures by promising nothing specific.

This reasoning was defensible in 1989. It became a problem as compilers started using UB not just to accommodate hardware variation, but as a proof system for aggressive optimization.

When Optimization Eats Your Code

The canonical example of demon-class UB is null pointer check elimination. Consider this pattern:

void process(struct node *p) {
    int val = p->value;  // dereference before the check
    if (p == NULL)
        return;
    use(val);
}

With optimizations enabled, GCC and Clang may eliminate the p == NULL branch entirely. The reasoning: if p were null, dereferencing it on the first line would be undefined behavior. The compiler is permitted to assume UB does not occur. Therefore p is non-null, therefore the check is dead code, therefore it can be removed.
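The defensive rewrite is mechanical: perform the check before any dereference, so the compiler never gains the non-null inference. A sketch of the same pattern reshaped into a safe form (`value_or` is an illustrative name, not from the article's example):

```c
#include <stddef.h>

struct node { int value; };

/* Returns the node's value, or `fallback` when p is NULL. */
int value_or(struct node *p, int fallback) {
    if (p == NULL)       /* the check comes first: no dereference precedes */
        return fallback; /* it, so the compiler cannot infer p != NULL     */
    return p->value;     /* dereference only on the proven-non-null path   */
}
```

Because no dereference dominates the comparison, the null check is live code in every conforming compiler, at any optimization level.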

This is not theoretical. The Linux kernel was historically compiled with -fno-delete-null-pointer-checks specifically to prevent this class of optimization. Chris Lattner described several variants of this in his influential 2011 post “What Every C Programmer Should Know About Undefined Behavior.”

Signed integer overflow produces a similar problem:

for (int i = 0; i <= n; i++) {
    // ...
}

The compiler may reason that if i ever overflows, UB occurs; therefore i never overflows; therefore the loop counter can be widened to a pointer-sized type and the comparison simplified. This widening is important for autovectorization. Disabling it with -fwrapv has measurable throughput costs on numerical workloads, which is why the optimization community has consistently resisted making signed overflow defined.
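For code that can tolerate neither the UB nor the blanket cost of -fwrapv, GCC and Clang expose overflow-checked arithmetic directly. A sketch of an increment that reports overflow instead of invoking UB (`checked_inc` is an illustrative helper, not from the paper):

```c
#include <limits.h>
#include <stdbool.h>

/* Adds 1 to *i, reporting failure instead of overflowing.
 * __builtin_add_overflow is a GCC/Clang extension; C23 standardizes
 * the same operation as ckd_add() in <stdckdint.h>. */
bool checked_inc(int *i) {
    int next;
    if (__builtin_add_overflow(*i, 1, &next))
        return false;   /* would have overflowed: *i is left untouched */
    *i = next;
    return true;
}
```

The check typically compiles to a single add-and-branch-on-overflow-flag sequence, which is far cheaper than the vectorization opportunities -fwrapv gives up across a whole translation unit.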

Strict aliasing is a third category. Since C99, section 6.5p7, accessing an object through a pointer of an incompatible type is undefined. GCC at -O2 -fstrict-aliasing (the default) will cache values across stores through unrelated pointer types:

float update(float *f, int *i) {
    *i = 0;
    return *f;  // compiler need not reload; it assumes *f unchanged
}

With strict aliasing active, f and i are assumed not to alias, because float * and int * are incompatible types under C99’s rules. The store through i is invisible to the load of *f. The correct fix is memcpy for type-punning, a union, or GCC’s __attribute__((may_alias)).
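The memcpy route in particular is fully defined: the byte copy makes the dependency visible to the compiler, which on mainstream targets folds it into a single register move. A sketch of reading a float's bit pattern this way (`bit_pattern` is an illustrative name; assumes the usual 32-bit IEEE-754 float):

```c
#include <stdint.h>
#include <string.h>

/* Reads the bit pattern of a float without violating strict aliasing.
 * memcpy expresses the dependency the type-punned load would hide,
 * so no access occurs through an incompatible pointer type. */
uint32_t bit_pattern(float f) {
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "float must be 32 bits");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```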

All three of these are what N3861 proposes to call “demon” UB: the compiler actively uses the occurrence of UB as license to transform code in ways that affect program behavior far from the source of the violation.

Ghosts: The Benign End of the Spectrum

Not all UB is like this. Some UB produces a wrong value locally and nothing else. An uninitialized read on a real system will produce whatever bytes happened to be in the stack frame or register. That is incorrect, but the garbage value is a value. The compiler does not need to eliminate surrounding code to handle it.

This is what N3861 calls “ghost” UB: the violation exists, a bad result occurs, but the blast radius is local. The program may produce incorrect output, but it does not silently delete your security checks.

The paper also proposes a formal intermediate category: “erroneous behavior.” This is behavior that is wrong but must produce a constrained, detectable outcome. The motivating case is uninitialized reads. Under the current standard, reading an uninitialized variable is full UB, meaning the compiler can use that fact to transform code globally. The proposal would reclassify it: the compiler would be required to produce some value (whatever is in memory or registers), but could not use the uninitialized read as optimization license to remove surrounding code. GCC’s -ftrivial-auto-var-init and LLVM’s automatic stack initialization would become conforming implementations of this erroneous behavior, rather than extensions beyond the standard.
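The guarantee the reclassification would provide can be written out by hand today: initialize explicitly, and the worst case becomes a wrong-but-bounded value rather than optimization license. A sketch of that shape (`payload_length` is hypothetical; flags like -ftrivial-auto-var-init=zero impose the same initialization automatically):

```c
/* Under current C, reading `len` on the no-header path would be full UB,
 * and the compiler could transform the caller accordingly. The explicit
 * zero — the effect -ftrivial-auto-var-init=zero applies compiler-wide —
 * yields the constrained outcome "erroneous behavior" would mandate. */
int payload_length(int has_header, int header_len) {
    int len = 0;    /* constrained value instead of optimization license */
    if (has_header)
        len = header_len;
    return len;     /* at worst 0, never a silently deleted check */
}
```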

C23 Set the Stage Without Finishing the Job

C23 (published as ISO/IEC 9899:2024) made the most substantive changes to the UB landscape in decades. It mandated two’s-complement representation for signed integers in section 6.2.6.2, eliminating the historical justification that leaving signed overflow undefined was necessary for portability to ones’-complement and sign-magnitude hardware. Neither of those architectures has been commercially relevant for decades.

This was significant, but it did not change signed overflow’s UB status. The justification shifted from “we need to accommodate exotic hardware” to “we need to preserve optimization licenses for compilers.” The _BitInt(N) type, also new in C23, takes the opposite approach: arithmetic on _BitInt wraps by definition, giving fixed-width integers the defined overflow behavior that int still lacks.

C23 also introduced unreachable(), a macro that explicitly invokes UB at a named location. This is the first time the standard has formally recognized that intentional UB exists, which is conceptually interesting: a programmer uses unreachable() to assert that a code path is dead, and the standard encodes that assertion as a UB invocation that gives the compiler permission to assume the path is dead. It is controlled UB, but UB nonetheless.
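The typical use is a switch that the programmer knows is exhaustive. A sketch, with a pre-C23 fallback spelling since unreachable() only arrived in <stddef.h> with C23 (the enum and function are illustrative):

```c
#include <stddef.h>   /* C23 defines unreachable() here */

#ifndef unreachable
#  define unreachable() __builtin_unreachable()  /* GCC/Clang pre-C23 spelling */
#endif

enum direction { NORTH, SOUTH, EAST, WEST };

/* Maps a direction to its opposite. The unreachable() after the switch
 * asserts, via deliberate UB, that no other enum value ever arrives —
 * which lets the compiler drop the out-of-range handling it would
 * otherwise have to emit for a non-void function. */
enum direction opposite(enum direction d) {
    switch (d) {
    case NORTH: return SOUTH;
    case SOUTH: return NORTH;
    case EAST:  return WEST;
    case WEST:  return EAST;
    }
    unreachable();
}
```

If the assertion is ever wrong, the program has demon-class UB at a known source location, which is precisely the controlled trade the macro encodes.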

The two’s-complement mandate and the unreachable() addition together create the conditions for C2Y’s UB taxonomy work. Once the portability excuse for signed overflow UB is gone, the standard can be more honest about why certain behaviors remain undefined.

What Other Languages Chose Instead

Rust makes the comparison stark. In safe Rust, signed integer overflow panics in debug builds and wraps in release builds; the behavior is documented, mode-specific, and never undefined in the C sense. In unsafe blocks, Rust has its own UB list, but the boundary is explicit and syntactically visible. UB in safe code is a compiler bug. The Unsafe Code Guidelines effort is doing for Rust unsafe what N3861 is trying to do for C: formally cataloging what UB exists and what its semantics are.

Zig takes a mode-based approach that is arguably the closest model to what WG14 is trying to formalize. In Debug and ReleaseSafe modes, signed overflow, out-of-bounds access, null pointer dereference, and reached-unreachable all trap with a stack trace. In ReleaseFast, the checks are removed and the underlying platform behavior is permitted, with the understanding that the programmer has verified correctness externally. Zig also provides @addWithOverflow and related intrinsics for explicit checked arithmetic. The model cleanly separates “what happens when we detect an error” from “what we promise in production.”

C++ has tracked C closely on UB through shared heritage in WG21, and C++23 added std::unreachable() mirroring C23’s addition. C++26 is discussing erroneous behavior semantics for uninitialized reads on the same timeline as C2Y, with cross-pollination between the committees through shared participants in SG12.

The Work That Remains

The erroneous behavior reclassification for uninitialized reads is C2Y’s most concrete UB reform proposal, and it is achievable without breaking the optimization model. What it requires is that compilers stop treating an uninitialized read as proof that the surrounding function can be transformed globally. That is a narrow constraint, and one that safety-critical deployments (which already use -ftrivial-auto-var-init or equivalent) would benefit from seeing standardized.

The signed overflow question is harder, because the optimization stakes are real. The benchmarks that show vectorization regressions under -fwrapv are real. The security community’s argument, that signed overflow as full UB has contributed to more CVEs than the vectorization benefit is worth, is also real. The N3861 taxonomy at least makes that trade-off legible: if signed overflow is formally categorized as demon-class UB, then the security cost of that classification is explicit and debatable, rather than buried in Annex J as one item among hundreds.

C2Y will not ship for years, the formal process is slow, and WG14 papers like N3861 are proposals, not decisions. But the fact that this taxonomy has a document number and a committee audience is progress. For most of its life, the C standard has treated undefined behavior as a single thing. The next version may finally acknowledge that it was always a spectrum.
