From Safety Net to Scalpel: How C Compilers Learned to Exploit Undefined Behavior
Source: lobsters
C’s undefined behavior was not always the threat it is today. The C89 standard defined a large set of behaviors as undefined, and the compilers of that era mostly left them alone. The permission to do anything was there; the inclination to use it for aggressive optimization was not. Understanding how that changed, and why, explains why the C standards committee is only now working seriously on the problem that N3861, “Ghosts and Demons: Undefined Behavior in C2Y”, attempts to address.
Why UB Existed from the Start
The C89 standard was written for hardware diversity that has since vanished. Machines with ones-complement arithmetic, sign-magnitude integer representation, and various trap behaviors were in active production use in 1989. If the standard mandated two’s-complement overflow semantics, compilers for those machines would need to emulate that behavior, adding overhead that contradicted C’s design goal of providing a portable interface to whatever hardware the machine actually had.
The solution was to say nothing. Signed integer overflow: undefined. Pointer arithmetic outside array bounds: undefined. Accessing memory through a pointer of an incompatible type: undefined. Each compiler could let the hardware do what it did naturally, and the standard would not object. For the hardware landscape of 1989, this was a reasonable engineering decision.
The text of C89 was explicit that undefined behavior existed as a permissive category, not an optimization mandate. The standard noted that a conforming implementation might do anything at all, including, in the famous phrasing from comp.std.c, make demons fly out of your nose. That framing was cautionary. It was not an invitation.
The Optimization Shift
Through the 1990s and into the early 2000s, compilers treated UB conservatively. GCC performed intra-procedural optimizations and some inter-procedural analysis, but aggressive exploitation of UB as a proof system for eliminating code paths was not widespread practice. A compiler that saw if (x + 1 > x) would typically emit a comparison; signed overflow was undefined, but the compiler was not yet reasoning systematically about what that implied.
This began to change as compiler technology advanced and optimization benchmarks became competitive. SPEC CPU scores and language shootout results created pressure to extract more performance from the same source code. Interprocedural analysis, loop induction variable analysis, alias analysis, and value range propagation all matured during this period. Each of these techniques became more powerful when the compiler could treat UB as a hard constraint: if the program is correct, this path cannot occur, therefore we may eliminate it.
Clang and LLVM’s arrival sharpened the competition further. LLVM’s intermediate representation was designed from the start with optimization in mind, structured around the assumption that programs are correct, meaning that undefined behaviors are, by definition, impossible in correct code. A fresh codebase, strong alias analysis, and contributors with formal methods backgrounds produced a compiler that exploited UB more systematically than GCC had.
By 2011, the situation was notable enough that Chris Lattner, LLVM’s creator, wrote a three-part series explicitly titled “What Every C Programmer Should Know About Undefined Behavior”. The series documented with specific examples how LLVM used UB as an optimization premise, including cases where safety checks were being eliminated. Lattner argued the behavior was correct per the standard and that the optimizations were valuable. The posts are still the most accessible documentation of the problem, and they mark the point when the broader community understood that compiler behavior had changed in a fundamental way.
What Compilers Actually Do
The concrete effects are worth examining. With optimization enabled (-O2, and in some cases even -O1), GCC and Clang compile this signed overflow check into nothing:
int result = a + b;
if (result < a) {
    return -1; /* intended to detect overflow */
}
The optimizer reasons: if a + b overflowed, that is UB, which cannot occur in a correct program. So result must equal the mathematical sum, and result < a reduces to b < 0. Wherever the compiler can see that b is non-negative, the branch is dead code. The check compiles away silently, with no warning and no diagnostic. Code that was supposed to guard against integer overflow does the opposite, giving the caller false confidence that overflow has been ruled out.
Null pointer check elimination follows the same logic. If a pointer is dereferenced, the compiler may conclude it was non-null at the point of dereference, and use that inference to eliminate null checks elsewhere in the same function. A Linux kernel tun device vulnerability from 2009 worked exactly this way: GCC eliminated a null check because the pointer had already been used before the check. Both GCC and the standard agreed the transformation was correct. The exploitable behavior was the result of the compiler doing its job faithfully.
Strict aliasing produces similar surprises. The rule in C11 section 6.5p7 says an object may be accessed only through an lvalue of a compatible type, with narrow exceptions such as character types. Accessing a float through an int * is undefined behavior:
float f = 3.14f;
int i = *(int *)&f; /* UB: strict aliasing violation */
Because the compiler may assume this never happens, it can reorder loads and stores across incompatible pointer types freely. Serialization code, hardware register access, and protocol parsing code that relies on type-punning through casts violates this rule. The correct replacement is memcpy, which compilers optimize to a register move:
int i;
memcpy(&i, &f, sizeof i); /* defined behavior, same codegen */
But the unsafe version had been working for decades before compilers started exploiting the rule aggressively. The change in compiler behavior was not a specification change; the language had always said this was undefined. The compilers just started caring.
The Defensive Ecosystem
The response from the C community was not to wait for the standard to act. Compiler developers added flags that individually walk back specific UB exploitations, and those flags became standard configuration in security-sensitive and systems software.
The Linux kernel is the most visible example. Its build configuration has carried -fwrapv for years, which tells GCC and Clang to define signed integer overflow as two’s-complement wraparound. It added -fno-strict-aliasing to disable alias optimization based on type incompatibility. It added -fno-delete-null-pointer-checks to prevent null check elimination. More recently, -ftrivial-auto-var-init=zero was adopted to zero-initialize automatic variables and prevent information leaks through uninitialized memory.
Each of these flags is an implicit admission that the standard’s UB category is too broad for practical systems programming. The Linux kernel is not an unusual case; it is one of the few projects large enough that the cumulative effect of UB exploitation became visible and documented. Other projects apply similar flag sets without publicizing them.
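As a concrete sketch, a build that opts out of all of these exploitations at once might look like the following. The flags are real GCC and Clang options (-ftrivial-auto-var-init=zero needs a recent compiler); driver.c is a placeholder filename, and the grouping is this article's, not any particular project's.

```shell
# Defensive flag set in the style of the kernel's configuration.
cc -O2 \
   -fwrapv \
   -fno-strict-aliasing \
   -fno-delete-null-pointer-checks \
   -ftrivial-auto-var-init=zero \
   -c driver.c
```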
The sanitizer tooling grew from the same pressure. Clang’s UBSan inserts runtime checks for signed overflow, null pointer dereference, misaligned access, and dozens of other UB categories. MemorySanitizer tracks memory initialization state and traps on uninitialized reads. AddressSanitizer detects out-of-bounds access and use-after-free through shadow memory.
What is notable is that these sanitizers have been operating, for over a decade, on an implicit model of what C programs should do when they encounter an error. When UBSan traps on signed overflow, it is implementing the judgment that the program should halt rather than continue with a wrong value. That judgment is not in the C standard; it is a design decision by the sanitizer authors. The sanitizers are a community-constructed erroneous behavior model, running entirely outside the standard’s formal framework.
What N3861 Formalizes
The erroneous behavior category that N3861 proposes for C2Y is, in substantial part, a formalization of what this defensive ecosystem has been doing. Erroneous behavior removes the optimizer’s license to use a mistake as an optimization premise, while permitting implementations to handle the mistake however they choose: trap, produce a nondeterministic value, or wrap. This is exactly the range of behaviors that -fwrapv, UBSan, and hardware platform defaults already implement, each as a non-standard extension.
The paper’s ghost and demon taxonomy formalizes something similar: the distinction between UBs that compilers actually exploit aggressively and UBs that remain in the standard as artifacts of 1989 hardware concerns. Compiler writers and kernel engineers have known this distinction empirically for years. The kernel’s flag choices reflect accumulated knowledge about which category each UB falls into in practice. N3861 asks the standard to make that knowledge formal and permanent.
This pattern repeats. The two’s-complement mandate that C23 adopted, following N2412, ratified what compilers had been doing universally for decades before the standard acknowledged it. The C standard tends to document practice rather than lead it.
The Gap That Remains
Until C2Y closes the gap, programmers writing safety-critical C have limited options. They can use non-standard flags and accept that a conforming compiler without those flags may eliminate their safety checks. They can use unsigned arithmetic, which has defined wraparound semantics in C, for all numeric work where overflow is possible. They can use memcpy for type punning and hope the optimizer recognizes the pattern. They can instrument their builds with UBSan and treat trapping as acceptable deployment behavior, despite the sanitizer mode being formally outside the standard’s model.
What they cannot do, under the current standard, is write overflow-detection code using signed integers and trust that the detection will work when it is most needed. The standard does not give them that guarantee. The non-standard flag ecosystems do, at the cost of non-conformance and portability concerns.
C2Y’s timeline puts the next standard revision around 2028-2029. If erroneous behavior is adopted, it will be the first time the standard’s formal model aligns with the implicit model that UBSan has been running for over a decade. Reaching that alignment will not eliminate all of C’s undefined behavior; Annex J contains over two hundred entries, and no single standard cycle will work through all of them. But it would mean that the most commonly exploited UBs, the ones that have produced real CVEs and driven the flag ecosystems in major projects, have been given constrained semantics by the standard itself rather than by compiler extensions.
For C as a language for systems programming, that is the change worth watching.