Tagged Pointers, Poor Man's Inheritance, and What C Can't Say

When you read through the walkthrough of Emacs’s tagged pointer scheme, the patterns feel familiar if you have spent time in Rust or Zig. The three techniques it describes (tagged pointers, tagged unions, and struct-embedding inheritance) map directly onto language features that Rust and Zig make first-class. Understanding the mapping clarifies what the C version is doing and what it is giving up.

The Shape of the Problem

Every dynamic language runtime faces the same core problem: a function that accepts a Lisp value might receive an integer, a string, a cons cell, or a buffer, and at runtime the callee needs to know which one it has. C has no built-in solution for this. C++ adds virtual dispatch and vtables. Rust adds generics and trait objects. C leaves you with three tools: tags encoded in pointer bits, tags stored in union members, and the pointer-aliasing rules that govern struct casting. Emacs uses all three in concert.

Tagged Pointers and Rust’s Niche Optimization

Every Lisp value in Emacs is a Lisp_Object, a single word whose underlying type depends on build configuration but is at least as wide as intptr_t. Under the USE_LSB_TAG scheme, documented in src/lisp.h and the default on 64-bit platforms since around Emacs 24, the bottom 3 bits encode the type. Three bits give eight tag values. Symbols, strings, cons cells, vectorlike objects, and floats take one each. Fixnums take two (Lisp_Int0 and Lisp_Int1) that share their low two bits, so only two bits of the word are spent on the fixnum tag and the usable range is 62 bits on a 64-bit host.
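To see why two tags buy fixnums an extra bit, here is a minimal C sketch. The names lisp_word, make_fixnum, and fixnum_value are hypothetical, and the tag values 2 and 6 are assumptions chosen because they share their low two bits; the real definitions in src/lisp.h are more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Lisp_Object; not the Emacs definition. */
typedef intptr_t lisp_word;

/* Assumed tag values: 0b010 and 0b110 differ only in the third bit. */
enum { TAG_INT0 = 2, TAG_INT1 = 6 };
enum { INT_TAG_BITS = 2 };

/* A fixnum is any word whose low two bits are 0b10, whichever of the
   two 3-bit tags that turns out to be. */
static inline int is_fixnum(lisp_word w) {
    return (w & ((1 << INT_TAG_BITS) - 1)) == TAG_INT0;
}

/* Shift through an unsigned type so negative inputs do not hit
   undefined behavior on the left shift. */
static inline lisp_word make_fixnum(intptr_t n) {
    return (lisp_word)(((uintptr_t)n << INT_TAG_BITS) | TAG_INT0);
}

/* Right-shifting a negative value is implementation-defined in C;
   this sketch assumes the arithmetic shift mainstream compilers use. */
static inline intptr_t fixnum_value(lisp_word w) {
    assert(is_fixnum(w));
    return w >> INT_TAG_BITS;
}
```

In this sketch the third bit of the word is already the lowest bit of the integer payload: even values end up under one 3-bit tag and odd values under the other.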

The equivalent structure in Rust would be an enum with a numeric discriminant, except that the Rust compiler does not pack an enum’s discriminant into the low bits of a pointer the way Emacs does. What Rust provides for pointer-carrying types is niche optimization: if a type has a bit pattern that is guaranteed never to be a valid instance, the compiler uses that niche to encode the discriminant of a wrapping enum. The canonical example is Option<NonNull<T>>, which compiles to a single word, using null as the None representation. Emacs’s tag bits are a manual version of the same idea: heap objects are aligned to 8 bytes, so the bottom three bits of every object pointer are guaranteed to be zero, and Emacs reuses those otherwise impossible bit patterns to carry the type. The difference is that the programmer maintains the invariant through disciplined use of macros rather than through compiler enforcement.

Every access to a Lisp_Object goes through accessors like XCONS, XSTRING, and XSYMBOL, which subtract the type tag and cast the resulting integer to a typed pointer. The caller must already know the object’s type before calling the accessor. Passing a symbol to XCONS produces a garbage pointer: the wrong tag is subtracted, and nothing at the resulting address is a cons cell. There is no static check.

In Rust, a match on an enum must be exhaustive; the only way to skip variants is a wildcard arm, which is explicit in the source. If you write code that should only handle cons cells, the compiler enforces handling or suppression of every other variant. The Emacs equivalent is a manually written guard, such as CHECK_CONS (x) or an if (!CONSP(x)) signal_error(...) branch, at every function boundary where the type matters.
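The unchecked accessor and the guard that has to precede it can be sketched in a few lines of C. This is an illustration under stated assumptions, not the Emacs source: lisp_word, tag_ptr, xcons_unchecked, and cons_or_null are hypothetical names, and the tag values are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t lisp_word;              /* hypothetical Lisp_Object */

/* Illustrative tag values, not necessarily those in src/lisp.h. */
enum tag { TAG_SYMBOL = 0, TAG_CONS = 3, TAG_STRING = 4 };
#define TAG_MASK ((uintptr_t)7)

/* Force 8-byte alignment so the low three bits of every cell's
   address are zero and free to hold a tag. */
struct cons { _Alignas(8) lisp_word car; lisp_word cdr; };

static lisp_word tag_ptr(void *p, enum tag t) {
    assert(((uintptr_t)p & TAG_MASK) == 0);   /* alignment invariant */
    return (uintptr_t)p + (uintptr_t)t;
}

static enum tag tag_of(lisp_word w) {
    return (enum tag)(w & TAG_MASK);
}

/* Like the accessor described above: subtract the tag and cast.
   Nothing here verifies that w really carries TAG_CONS. */
static struct cons *xcons_unchecked(lisp_word w) {
    return (struct cons *)(w - TAG_CONS);
}

/* The guard the caller is responsible for: test the tag first.
   Returns NULL on a mismatch instead of signaling a Lisp error. */
static struct cons *cons_or_null(lisp_word w) {
    return tag_of(w) == TAG_CONS ? xcons_unchecked(w) : NULL;
}
```

Calling xcons_unchecked on a word tagged TAG_STRING compiles without complaint and returns an address one byte past the real object, which is the failure mode the guard exists to prevent.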

Tagged Unions: The Lisp_Misc Case

When 3 bits give you 8 categories but you need more than eight object types, you have two options: widen the primary tag or encode a secondary discriminant inside the allocated object. Emacs historically used both.

The Lisp_Misc tag covered markers, overlays, save-values, and finalizers. Each pointed to a union Lisp_Misc, whose first member was struct Lisp_Misc_Any:

struct Lisp_Misc_Any {
  ENUM_BF (Lisp_Misc_Type) type : 16;
  bool gcmarkbit : 1;
  unsigned spacer : 15;
};

The dispatch ran in two levels. The tag bits in the Lisp_Object word identified the object as Lisp_Misc, then the type field in Lisp_Misc_Any identified the specific subtype. Every type check for a marker or overlay therefore took two steps: a bit test on the Lisp_Object word itself, which needs no memory access, followed by one load from the object header to read the subtype discriminant.
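A rough C sketch of that two-level check follows. The struct and function names are hypothetical, the payload fields are invented, and TAG_MISC = 1 is an assumed value; only the shape of the header mirrors the struct shown above.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t lisp_word;          /* hypothetical Lisp_Object */
enum { TAG_MISC = 1, TAG_MASK = 7 };  /* assumed tag value */

enum misc_type { MISC_MARKER, MISC_OVERLAY, MISC_SAVE_VALUE, MISC_FINALIZER };

/* Every member of the union starts with the same three bit-fields,
   so the subtype can be read through any of them. */
struct misc_any { unsigned type : 16; unsigned gcmarkbit : 1; unsigned spacer : 15; };

struct misc_marker {
    unsigned type : 16; unsigned gcmarkbit : 1; unsigned spacer : 15;
    _Alignas(8) int64_t charpos;      /* invented payload field */
};

struct misc_overlay {
    unsigned type : 16; unsigned gcmarkbit : 1; unsigned spacer : 15;
    _Alignas(8) int64_t start;        /* invented payload fields */
    int64_t end;
};

union misc {
    struct misc_any any;
    struct misc_marker marker;
    struct misc_overlay overlay;
};

static int is_misc_type(lisp_word w, enum misc_type wanted) {
    /* Level 1: inspect bits of the word itself, no memory access. */
    if ((w & TAG_MASK) != TAG_MISC)
        return 0;
    /* Level 2: strip the tag and load the subtype from the header. */
    const union misc *m = (const union misc *)(w - TAG_MISC);
    return m->any.type == (unsigned)wanted;
}
```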

Zig handles this natively with union(enum), a union tagged by an enum discriminant:

const MiscValue = union(enum) {
    marker: *Marker,
    overlay: *Overlay,
    save_value: SaveValue,
    finalizer: *Finalizer,
};

Switching on a MiscValue is exhaustive unless the switch has an else branch. Adding a new variant and omitting it from such a switch is a compile error. In Emacs C, adding a Lisp_Misc_Type variant requires auditing every switch (XMISCTYPE(x)) statement by hand, across a codebase with hundreds of files.

This fragility accumulated over decades. The Lisp_Misc type tag was eliminated in Emacs 27, where the slot was renamed Lisp_Type_Unused0 in the enum. Markers and overlays moved into the vectorlike hierarchy. The change removed a separate allocator code path, simplified GC traversal by eliminating the Lisp_Misc case entirely, and reduced the number of places in the codebase where the complete set of misc subtypes needed to be kept consistent.

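The lesson is not specific to Emacs. Any catch-all category in a manually maintained dispatch system accumulates the same debt: the type grows as new subtypes get added, every existing switch statement needs updating, and the language itself does not require the compiler to tell you what you missed. The most C offers is an optional warning such as GCC’s -Wswitch, which only helps for switches written without a default case.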

Poor Man’s Inheritance and the First-Member Cast

The vectorlike hierarchy, where buffers, windows, frames, processes, hash tables, compiled functions, and thread objects all live, is where the third pattern operates. Every pseudovector begins with a vectorlike_header:

struct buffer {
  union vectorlike_header header;  /* must be first */
  Lisp_Object name_;
  Lisp_Object filename_;
  /* ... */
};

The C standard (C11 §6.7.2.1p15) guarantees that a pointer to a struct, suitably converted, points to its first member, and vice versa. Any pseudovector can therefore be cast to struct Lisp_Vector * to read the header without knowing the object’s concrete type. The upper bits of header.size encode the pvec_type discriminant: PVEC_BUFFER, PVEC_WINDOW, PVEC_FRAME, and about twenty others. The lower bits encode the count of Lisp_Object slots, which lets the garbage collector scan most objects’ live references without a type-specific traversal function.
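The header trick can be sketched in C as follows. The bit positions, masks, and helper names here are assumptions made for illustration; the real constants in src/lisp.h differ, and the struct fields are cut down to the minimum.

```c
#include <assert.h>
#include <stddef.h>

typedef void *lisp_word;               /* hypothetical Lisp_Object */

enum pvec_type { PVEC_NORMAL_VECTOR, PVEC_BUFFER, PVEC_WINDOW, PVEC_FRAME };

/* Assumed layout of the size word: flag bit, type field, slot count. */
#define PVEC_FLAG       ((ptrdiff_t)1 << 30)
#define PVEC_TYPE_SHIFT 24
#define PVEC_TYPE_MASK  ((ptrdiff_t)0x3f << PVEC_TYPE_SHIFT)
#define PVEC_SLOT_MASK  ((ptrdiff_t)0xfff)

struct header { ptrdiff_t size; };

/* Generic view: a header followed directly by Lisp-valued slots. */
struct vector { struct header header; lisp_word contents[]; };

struct buffer {
    struct header header;              /* must be first */
    lisp_word name;
    lisp_word filename;
    long point;                        /* non-Lisp data, never scanned */
};

static ptrdiff_t pvec_size(enum pvec_type type, ptrdiff_t lisp_slots) {
    assert(lisp_slots >= 0 && lisp_slots <= PVEC_SLOT_MASK);
    return PVEC_FLAG | ((ptrdiff_t)type << PVEC_TYPE_SHIFT) | lisp_slots;
}

/* One load at offset 0, then a mask and compare. */
static int has_pvec_type(const void *obj, enum pvec_type type) {
    const struct header *h = obj;      /* first-member conversion */
    return (h->size & (PVEC_FLAG | PVEC_TYPE_MASK))
        == (PVEC_FLAG | ((ptrdiff_t)type << PVEC_TYPE_SHIFT));
}

static ptrdiff_t lisp_slot_count(const void *obj) {
    const struct header *h = obj;
    return h->size & PVEC_SLOT_MASK;
}

/* A stand-in for a GC scan: walk the slots through the generic view,
   relying on the convention that they directly follow the header. */
static ptrdiff_t count_live_slots(const void *obj) {
    const struct vector *v = obj;
    ptrdiff_t live = 0;
    for (ptrdiff_t i = 0; i < lisp_slot_count(obj); i++)
        if (v->contents[i] != NULL)
            live++;
    return live;
}
```

The scan never learns that it is looking at a buffer. It only trusts the slot count, which is why a header in the wrong position or a wrong count corrupts traversal silently.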

The Rust equivalent for this kind of polymorphic type dispatch is a trait object (dyn Trait). A trait object is a fat pointer: one pointer to the data and one to a vtable containing method pointers, the type’s size and alignment, and its drop function. Reading a pvec_type in Emacs costs one load at a fixed offset followed by a mask and compare. Asking a trait object what concrete type it holds goes through the vtable instead: load a function pointer from the vtable, then make an indirect call. For Emacs’s hot type-checking predicates (bufferp, windowp, framep), the struct-embedding approach avoids that indirect call entirely.

C++ has CRTP (Curiously Recurring Template Pattern) as a compile-time alternative: a base template parameterized by the derived type generates specialized code for each concrete class without vtable indirection. CRTP eliminates runtime dispatch overhead but requires that the concrete type be known at compile time. Emacs cannot use this because the type of any Lisp_Object is a runtime property, and because C does not have templates.

What the Three Patterns Add Up To

The patterns described in the Emacs internals series compose into a complete runtime type system built from C primitives: pointer alignment guarantees, bitfield arithmetic, and struct layout rules. Tagged pointers handle coarse type identification with no allocation overhead per value. Tagged unions allow secondary dispatch within a category, at the cost of manual exhaustiveness. Struct-embedding provides shared generic access to heterogeneous objects without vtable indirection, at the cost of a strict first-member convention maintained across hundreds of struct definitions.

Rust and Zig provide each of these as first-class features: niche-optimized layouts for pointer types, exhaustive union(enum) for tagged unions, and trait objects or comptime interfaces for polymorphic dispatch. The C versions achieve the same runtime behavior through programmer convention and code review discipline rather than compiler enforcement.

The difference shows up in what happens when the conventions break. A pvec_type header placed in the wrong struct field, a new Lisp_Misc_Type added without updating every switch, a caller that forgets to type-check before calling XCONS: these are runtime bugs in C and compile errors in Rust or Zig. The patterns are sound; the problem is that C has no mechanism to verify the soundness at build time.

For a runtime that predates C99, runs on every major platform, and treats GC traversal cost as a real constraint, the design is coherent and appropriate. What tracing through this codebase makes clear is that the tagged pointer, the tagged union, and the first-member cast are not workarounds for C’s limitations. They are a considered design, built at a time when the language features that would make them safer did not exist yet.