
The Rule About Uninitialized Memory That No Real Machine Follows

Source: lobsters

Reading an uninitialized local variable is undefined behavior in C. Most programmers know this, and most accept it without examining the justification closely. The justification is a historical artifact from 1989 hardware that no longer exists, applied to modern systems where the actual hardware behavior is completely predictable. WG14 paper N3861, “Ghosts and Demons: Undefined Behavior in C2Y,” addresses this directly with the concept of ghost values. It is not the most dramatic part of the C2Y standardization effort, but it is the part where the standard is most clearly contradicted by current hardware reality.

Why Uninitialized Reads Are UB

The C standard defines uninitialized automatic variables as having “indeterminate value,” and it defines indeterminate value to include the possibility of a “trap representation”: a bit pattern that, when stored in an object of a particular type, causes the hardware to trap when the object is read. On this basis, reading an uninitialized variable could trigger a hardware trap, and the standard marks the behavior undefined to accommodate implementations where this is true.

Trap representations existed on real hardware in 1989. The PDP-11 had a trap representation for floating-point numbers; certain bit patterns in floating-point registers would cause an interrupt when loaded. Some architectures had trap representations for integer types as well, particularly for detecting use of uninitialized data. On those machines, reading a register that had never been written could, in principle, trigger a fault.

On every machine you are likely to run C code on today, none of this is true. x86-64 has no integer trap representations. ARM has no integer trap representations. RISC-V has no integer trap representations. Reading a stack slot that was never written produces whatever bits were left there by the previous call frame. The operation succeeds. The bits are some value. The result is stable for the duration of the access.

The standard still says this is undefined behavior. The trap representation justification is a dead letter; the effect is a rule that exists for hardware that no longer matters.

The Practical Absurdity

The consequences are visible in code any systems programmer writes routinely.

char buf[4096];
ssize_t n = read(fd, buf, sizeof(buf));
if (n < 0) return -1;

// n bytes are initialized; buf[n..4095] are uninitialized.
// This is technically UB:
send(sock, buf, sizeof(buf), 0);

Sending the full buffer, including the uninitialized tail, is undefined behavior by the standard. The send call reads the uninitialized bytes; they are “used,” and indeterminate values have been passed to a function. The standard offers no guarantee about what happens next.

In practice, what happens is: send reads whatever bytes are in the stack memory, serializes them to the network, and the remote host receives them. This is what most programmers intend when they send a fixed-size buffer with a partially-filled payload; the receiver will parse n bytes of meaningful data and discard the rest. The hardware does the obvious thing. The standard says it is undefined.

Memory sanitizers surface this directly. MemorySanitizer tracks initialization state for every byte of memory, propagates that shadow state silently through copies, and reports an error when an uninitialized value reaches a branch, a dereference, or a syscall boundary. That is the right behavior for detecting genuine bugs like branching on an uninitialized value. But it also fires on the send case above, where the behavior is predictable and the intent is clear: at the syscall boundary, the sanitizer cannot distinguish indeterminate values that drove a decision from indeterminate values that are merely copied through.

Ghost Values as the Fix

N3861 introduces “ghost values” to model what uninitialized memory actually is on modern hardware. A ghost value has no concrete bit representation in the abstract machine; it is an indeterminate quantity. But ghost values have a specific, limited set of operations that are well-defined:

  • A ghost value may be copied. The copy is also a ghost value. This is what hardware does when you copy uninitialized memory: the bits travel, but their meaning remains indeterminate.
  • A ghost value may be passed to a function that does not branch on it. Passing uninitialized padding bytes to a serialization function is permitted.
  • Using a ghost value in a branch condition, a comparison, or an arithmetic expression whose result feeds into observable behavior is erroneous behavior: a diagnosable, definite error, but not the full demon-class UB that licenses optimizer transformations.

The distinction between “diagnosable error” and “the optimizer may assume this never happened” is the key change. Under current C, reading an uninitialized variable allows the optimizer to conclude that the execution path is unreachable and eliminate surrounding code. Under the ghost value model, the optimizer cannot draw that conclusion; it must treat the path as live, even if the value in question is indeterminate.

The result looks like this:

int x;           // x is a ghost value
int copy = x;    // copy is also a ghost value; this is fine

if (copy > 0) {  // branching on a ghost value: erroneous behavior
    do_something();
}

memcpy(output, &x, sizeof(x));  // copying a ghost value: fine

The memcpy is permitted under the ghost value model. The branch is flagged as erroneous. This matches the intuition most experienced C programmers already apply: copying uninitialized bytes around is usually a bug to fix eventually, but it is not the kind of bug that destabilizes program execution the way a corrupted stack pointer does.

How LLVM Already Models This

This distinction corresponds exactly to what LLVM has maintained internally for years. LLVM IR represents two kinds of uncertain value: undef, which can take any value at each use but does not mark the path as unreachable, and poison, which propagates through computations and makes any branch on it undefined, allowing the optimizer to treat that path as unreachable.

The PLDI 2017 paper “Taming Undefined Behavior in LLVM” formalized this distinction and demonstrated that conflating the two produced unsound optimization. Subsequent work by Lee et al. at OOPSLA 2018 refined the model further, showing that even the undef/poison split had edge cases that required more careful specification. The central lesson from both papers: you cannot compress all indeterminate or invalid values into a single concept without creating contradictions in the optimizer.

N3861 is proposing to surface the same distinction at the C language level. Ghost values map to a refined version of LLVM’s undef: they can be copied, they can coexist with well-defined computation, and they become an error only when they influence observable output. The standard would be acknowledging what the compiler’s intermediate representation has been doing for a decade.

Cross-Language Parallels

The problem N3861 is solving for C was solved in Rust through a different mechanism. Rust’s MaybeUninit<T> type, stabilized in Rust 1.36, allows holding uninitialized memory as a value without reading it. The only way to get a T out of a MaybeUninit<T> is through an unsafe call to assume_init(), which makes the programmer explicitly assert that the memory has been initialized. Until that call, you can copy the MaybeUninit, pass it around, and write to it; branching on its contents is not permitted without going through unsafe.

This is structurally the ghost value model: there is a representation for “possibly uninitialized,” it supports copy and write operations, and the transition to “initialized” is explicit rather than implicit. Rust enforces this through its type system; C would enforce it through the abstract machine and sanitizer behavior.

C++26 is taking the same concept through the standards process. WG21 paper P2795, “Erroneous Behaviour”, would reclassify uninitialized reads of trivial types as “erroneous behavior” rather than undefined behavior. The implementation must produce some value and must not use the read to prove the path unreachable. N3861 explicitly aligns with this work; the two committees are coordinating so that C and C++ share a consistent model for indeterminate memory, which matters for the large amount of code that mixes both.

What Changes in Practice

If ghost values are adopted in C2Y, the send example from earlier becomes well-defined. Copying uninitialized padding bytes in a struct becomes well-defined. Using memcpy to serialize a partially-initialized buffer becomes well-defined. The standard would acknowledge what hardware already does.

What remains erroneous is any code that branches or computes based on an uninitialized value, and crucially, sanitizers could detect this case precisely. MemorySanitizer could fire on the branch and not on the copy. The signal-to-noise ratio improves; the category of error that remains flagged corresponds to code that is genuinely wrong rather than code that is merely informal.

The distinction also matters for security analysis. A ghost value in a branch creates an information disclosure risk if the uninitialized bytes contain residual stack data from a sensitive operation. That is a real bug worth finding. A ghost value passed as padding to a serialization function is a style issue at most. Giving tools the vocabulary to distinguish these cases is one of the concrete practical benefits of the taxonomy.

C2Y is not expected until 2028 at the earliest. N3861 is a design paper rather than normative text, and the path from design paper to standard wording requires follow-on papers, committee votes, and implementation experience. But the ghost value concept addresses a specific, long-standing mismatch between C’s abstract machine and the hardware that runs it. The 1989 justification has been dead for decades. It is worth giving the concept a name.
