How Emacs Fits an Entire Object System Into 64 Bits

Every value in Emacs Lisp (an integer, a string, a buffer, a cons cell) is represented by a single machine word: either the value itself or a tagged pointer to it. The entire type system collapses into 64 bits. Understanding how that works requires digging into src/lisp.h, and this walkthrough of Emacs internals makes a good entry point. The code it exposes is not arcane for its own sake; it is a clean example of a design problem that shows up in every dynamic language runtime: how do you encode type information without wasting a whole word per value?

The Fundamental Layout

Lisp_Object is the universal value type in Emacs Lisp. At runtime it is just an EMACS_INT, a machine word. The trick is that the lowest 3 bits are reserved as a type tag rather than part of the address or integer value. Emacs calls the number of tag bits GCTYPEBITS, and on all modern platforms it is 3, giving 8 possible top-level types:

enum Lisp_Type
{
    Lisp_Symbol       = 0,
    Lisp_Type_Unused0 = 1,   /* formerly Lisp_Misc, removed in Emacs 27 */
    Lisp_Int0         = 2,
    Lisp_Int1         = 6,
    Lisp_String       = 4,
    Lisp_Vectorlike   = 5,
    Lisp_Cons         = 3,
    Lisp_Float        = 7,
};

Fixnums use two tag values, Lisp_Int0 (binary 010) and Lisp_Int1 (binary 110), which tag even and odd integers respectively. Both end in binary 10, so only the low two bits act as the tag and the third bit is really the lowest bit of the integer. This gives integers 62 usable bits on a 64-bit machine rather than 61. For heap-allocated types, the remaining 61 bits are a pointer: heap structs are aligned to at least 8 bytes, so the low 3 bits of their addresses are always zero before tagging and can hold the tag.
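
The fixnum layout can be sketched in a few lines of C. This is a minimal illustration and not the code from lisp.h: the names lisp_word, make_fixnum, fixnum_value, and the TAG_* constants are hypothetical, and the sketch assumes a compiler whose right shift of a negative signed integer is arithmetic, as it is on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the real definitions live in src/lisp.h. */
typedef intptr_t lisp_word;

enum { TAG_BITS = 3, INT_TAG_BITS = 2 };   /* 3 tag bits, 2 of them used by fixnums */
enum { TAG_INT0 = 2, TAG_INT1 = 6 };       /* binary 010 and 110 */

/* The top-level type tag is the low three bits of the word. */
static inline int word_tag(lisp_word w) {
    return (int)(w & ((1 << TAG_BITS) - 1));
}

/* A fixnum has binary 10 in its low two bits. */
static inline int is_fixnum(lisp_word w) {
    return (w & 3) == TAG_INT0;
}

/* Shift the integer up by two and set the low bits to binary 10.
   Bit 2 of the result is the integer's own lowest bit, which is why
   even values read as TAG_INT0 and odd values as TAG_INT1. */
static inline lisp_word make_fixnum(intptr_t n) {
    return (lisp_word)(((uintptr_t)n << INT_TAG_BITS) | TAG_INT0);
}

/* Assumes an arithmetic right shift, which restores value and sign. */
static inline intptr_t fixnum_value(lisp_word w) {
    assert(is_fixnum(w));
    return w >> INT_TAG_BITS;
}
```

With these definitions, make_fixnum(4) carries tag 2 and make_fixnum(5) carries tag 6, while fixnum_value recovers the original integer in both cases.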

This scheme is called USE_LSB_TAG in the Emacs source, and it has been the default on 64-bit platforms since around Emacs 24. Earlier builds stored the tag in the high bits of the word and masked those bits off to recover the pointer; LSB tagging moves the tag to the other end of the word.

The Subtraction Trick

Extracting a pointer from a tagged word has two obvious approaches. You can mask off the tag bits:

ptr = word & ~7;   /* clear low 3 bits */

Or you can subtract the tag value:

ptr = word - tag;  /* e.g., word - 3 for Lisp_Cons */

Emacs uses the subtraction form, encapsulated in the XUNTAG macro:

#define XUNTAG(a, type, ctype) \
  ((ctype *) ((uintptr_t)(a) - (type)))

#define XCONS(a)   XUNTAG(a, Lisp_Cons,       struct Lisp_Cons)
#define XSTRING(a) XUNTAG(a, Lisp_String,     struct Lisp_String)
#define XVECTOR(a) XUNTAG(a, Lisp_Vectorlike, struct Lisp_Vector)

The difference matters on x86-64. Taken in isolation, both forms are a single instruction: an AND with a sign-extended immediate, or a SUB or LEA with a small constant. The advantage of subtraction is that it folds into the addressing mode of the memory access that follows. Reading a field at offset 8 of a cons tagged with 3 becomes one load from [reg + 5], with no separate untagging step, whereas a mask cannot be absorbed into a displacement. Subtraction also makes the construction/destruction symmetry explicit: make_lisp_ptr adds the tag, XUNTAG subtracts it.
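
A small sketch makes the symmetry concrete. The names here (struct cons, tag_cons, untag_cons, make_cons, cons_cdr) are hypothetical stand-ins rather than Emacs identifiers, and the sketch assumes malloc returns memory aligned to at least 8 bytes, which the assert checks at run time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Lisp_Cons and struct Lisp_Cons. */
enum { TAG_CONS = 3, TAG_MASK = 7 };
struct cons { intptr_t car; intptr_t cdr; };

/* Construction adds the tag. The pointer must be 8-byte aligned so
   that its low 3 bits start out as zero. */
static inline uintptr_t tag_cons(struct cons *p) {
    assert(((uintptr_t)p & TAG_MASK) == 0);
    return (uintptr_t)p + TAG_CONS;
}

/* Destruction subtracts the same constant. */
static inline struct cons *untag_cons(uintptr_t w) {
    return (struct cons *)(w - TAG_CONS);
}

/* Allocate a cell and return it tagged, or 0 if malloc fails. */
static inline uintptr_t make_cons(intptr_t car, intptr_t cdr) {
    struct cons *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->car = car;
    p->cdr = cdr;
    return tag_cons(p);
}

/* Field access: the address is w - TAG_CONS + offsetof(struct cons, cdr),
   so a compiler is free to fold both constants into one displacement. */
static inline intptr_t cons_cdr(uintptr_t w) {
    return untag_cons(w)->cdr;
}
```

For a correctly tagged word, subtracting the tag and masking the low bits give the same pointer; the two only diverge when the word carries the wrong tag, in which case subtraction yields a misaligned address instead of silently producing a plausible one.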

Poor Man’s Inheritance via Common Initial Sequence

Three bits give you 8 type slots. Emacs has far more than 8 object types: buffers, windows, frames, processes, hash tables, compiled functions, character tables, terminals, threads, and more. All of these share a single tag value: Lisp_Vectorlike.

The disambiguation happens at the struct level. Every pseudovector, which is what Emacs calls the collection of types sharing Lisp_Vectorlike, begins with the same first field:

struct Lisp_Vector {
    union vectorlike_header header;
    Lisp_Object contents[FLEXIBLE_ARRAY_MEMBER];
};

struct buffer {
    union vectorlike_header header;  /* must be first, same layout */
    Lisp_Object name_;
    Lisp_Object filename_;
    /* ... dozens more Lisp_Object fields ... */
};

struct window {
    union vectorlike_header header;  /* same again */
    Lisp_Object frame;
    Lisp_Object next;
    /* ... */
};

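Because header is always the first field and always the same type, any pseudovector pointer can be converted to struct Lisp_Vector * to read the header. The firmest guarantee behind this is C's rule that a pointer to a struct, suitably converted, points to its first member (C11 6.7.2.1), so every pseudovector's header sits at offset zero with an identical layout. The common-initial-sequence rule (C11 6.5.2.3) expresses the same idea but formally covers only structs that are members of a visible union, which is why the idiom is usually described in those terms even where no union is declared.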

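The header.size field carries the second-level tag: when its PSEUDOVECTOR_FLAG bit is set, a bit field above the low PSEUDOVECTOR_AREA_BITS bits holds a pvec_type value: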

enum pvec_type {
    PVEC_NORMAL_VECTOR,
    PVEC_BUFFER,
    PVEC_WINDOW,
    PVEC_FRAME,
    PVEC_PROCESS,
    PVEC_HASH_TABLE,
    PVEC_COMPILED,
    PVEC_THREAD,
    PVEC_MARKER,   /* moved here from Lisp_Misc in Emacs 27 */
    PVEC_OVERLAY,  /* same */
    /* ... ~20 more ... */
};

So type dispatch happens at two levels: the 3-bit Lisp_Type tag tells you whether you have a symbol, a fixnum, a string, a cons, a float, or something vectorlike; if vectorlike, the pvec_type in the header tells you the specific object type. The check for a buffer looks like this:

#define BUFFERP(x)  PSEUDOVECTORP(x, PVEC_BUFFER)
#define PSEUDOVECTORP(x, code)                           \
  (VECTORLIKEP (x)                                       \
   && (((XVECTOR (x)->header.size                        \
         & (PSEUDOVECTOR_FLAG | PVEC_TYPE_MASK))         \
        == (PSEUDOVECTOR_FLAG | ((code) << PSEUDOVECTOR_AREA_BITS)))))

This is the pattern the article calls “poor man’s inheritance,” and it is a legitimate systems programming technique. You get polymorphic type checking without vtables, virtual dispatch, or C++. The cost is that you have to be disciplined about always keeping header as the first field in every new pseudovector type you add.
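
The header-first discipline is easy to reproduce outside Emacs. The sketch below is a toy version under stated assumptions: struct header, enum kind, PSEUDO_FLAG, KIND_SHIFT, and has_kind are all hypothetical names, and the bit positions are chosen for readability rather than copied from lisp.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical second-level type codes, standing in for pvec_type. */
enum kind { KIND_NORMAL_VECTOR, KIND_BUFFER, KIND_WINDOW };

/* Toy bit layout (not the one in lisp.h): one flag bit plus a 6-bit kind. */
#define PSEUDO_FLAG ((ptrdiff_t)1 << 30)
#define KIND_SHIFT  24
#define KIND_MASK   ((ptrdiff_t)0x3f << KIND_SHIFT)

struct header { ptrdiff_t size; };

/* Every "subtype" starts with the same header as its first member. */
struct toy_buffer { struct header header; const char *name; };
struct toy_window { struct header header; int width; };

static inline void init_header(struct header *h, enum kind k) {
    assert((int)k >= 0 && (int)k <= 0x3f);
    h->size = PSEUDO_FLAG | ((ptrdiff_t)k << KIND_SHIFT);
}

/* Works for any struct whose first member is a struct header, because
   a pointer to a struct also points to its first member. */
static inline bool has_kind(const void *obj, enum kind k) {
    const struct header *h = obj;
    return (h->size & (PSEUDO_FLAG | KIND_MASK))
           == (PSEUDO_FLAG | ((ptrdiff_t)k << KIND_SHIFT));
}
```

A caller holding only a generic pointer can ask has_kind(p, KIND_BUFFER) before converting to struct toy_buffer *, which mirrors the role BUFFERP plays before XBUFFER. A header whose flag bit is clear matches no kind at all, just as a plain vector fails every pseudovector check.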

How This Compares to Other Runtimes

Emacs’s approach sits in a distinct position relative to the two main alternatives used by other dynamic runtimes.

NaN boxing, used by LuaJIT 2 and JavaScriptCore, encodes the entire value universe inside a 64-bit IEEE 754 double. When the exponent bits are all 1 and the mantissa is nonzero, the value is a NaN, and most of the 52 mantissa bits are free payload: after a few bits are spent on type tags, roughly 47 remain for a pointer or integer. Every bit pattern outside the reserved NaN space is an ordinary float; the reserved NaNs encode everything else. The payoff is that floats are immediate, zero-allocation values. The cost is that pointers are limited to ~47 bits, which covers typical user-space address ranges on x86-64 and ARM64 today but leaves no room if the address space expands.
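
For contrast with the Emacs scheme, here is a toy NaN-boxing encoder. It is a sketch under stated assumptions and matches neither LuaJIT's nor JavaScriptCore's actual layout: the NANBOX_* constants and function names are invented for illustration, the payload is 48 bits wide, and it assumes that any NaN produced by arithmetic has its upper 16 bits equal to 0x7FF8 or 0xFFF8, so the 0xFFFC prefix never collides with one. Real runtimes canonicalize incoming NaNs to enforce that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t boxed;

/* Toy layout: upper 16 bits select the type, lower 48 bits are payload. */
#define NANBOX_TAG_MASK UINT64_C(0xFFFF000000000000)
#define NANBOX_POINTER  UINT64_C(0xFFFC000000000000)  /* a quiet NaN pattern */
#define NANBOX_PAYLOAD  UINT64_C(0x0000FFFFFFFFFFFF)

_Static_assert(sizeof(double) == sizeof(boxed), "double must be 64 bits");

/* Doubles are stored as their own bit pattern: no allocation, no tag. */
static inline boxed box_double(double d) {
    boxed b;
    memcpy(&b, &d, sizeof b);
    return b;
}

static inline double unbox_double(boxed b) {
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

static inline bool is_pointer(boxed b) {
    return (b & NANBOX_TAG_MASK) == NANBOX_POINTER;
}

/* Pointers hide in the NaN payload, so they must fit in 48 bits. */
static inline boxed box_pointer(void *p) {
    uint64_t bits = (uint64_t)(uintptr_t)p;
    assert((bits & ~NANBOX_PAYLOAD) == 0);
    return NANBOX_POINTER | bits;
}

static inline void *unbox_pointer(boxed b) {
    assert(is_pointer(b));
    return (void *)(uintptr_t)(b & NANBOX_PAYLOAD);
}
```

The trade is visible in box_pointer: the assert is the address-width limit the text describes, and it is the price of box_double being a plain bit copy.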

Emacs goes the other direction: pointers are full-width (61 bits after the tag), but every float in Emacs Lisp requires a heap allocation. Each float value created at run time, including the result of every floating-point operation, allocates a struct Lisp_Float that holds a single double. If you write code that generates many floats in a hot loop, you are GC-pressuring yourself in a way that NaN-boxing runtimes avoid entirely.

SBCL’s two-level scheme (lowtag + widetag) is architecturally nearest to Emacs. SBCL uses the low bits of each word (3 on 32-bit targets, 4 on 64-bit) as a lowtag distinguishing fixnums, cons pointers, function pointers, instance pointers, and other pointers. For heap objects, a widetag in the first word of the object encodes the specific type: bignum, ratio, double-float, simple-vector, symbol, and so on. That is structurally very close to Emacs’s Lisp_Type + pvec_type layering, and not by coincidence: both inherit from the same tradition of Lisp implementation on stock hardware.

V8 uses a different partition. Smis (small integers) have bit 0 clear; HeapObject pointers have bit 0 set. A Smi carries 31 bits of payload, or 32 bits on 64-bit builds without pointer compression, where the integer occupies the upper half of the word. Floats are heap-allocated HeapNumbers, same as Emacs, unless TurboFan unboxes them after profiling. V8’s pointer compression (since 2020) confines the heap to a 4 GB cage and stores pointers as 32-bit offsets, halving the size of every tagged slot in arrays and objects.

Where the Pattern Comes From

Tagged Lisp objects are not a modern invention. The lineage runs back to the first Lisp on the IBM 704 (around 1960), where car and cdr literally extracted the address and decrement fields of a 36-bit machine word. There was no explicit tag; types were implicit in usage.

Explicit tagging emerged on the PDP-10. Maclisp at MIT used the high bits of 36-bit words for type information. The MIT Lisp Machine project (begun in 1973) took the most aggressive approach: every memory word carried type-tag bits alongside its data, and the hardware could dispatch on them as part of executing an instruction. The Symbolics 3600 series refined this into a full tagged architecture. Dynamic dispatch was essentially free.

When Lisp moved to stock hardware (VAX, 68000, x86), software schemes replaced hardware tagging. The low-bit approach became standard because heap allocators naturally align objects to power-of-two boundaries, guaranteeing free bits at the bottom of every pointer. CMUCL and SBCL formalized the two-level scheme. Emacs arrived at the same structure over decades of its own evolution.

The Shape of the Trade-off

Emacs’s representation is conservative and legible. The tag bits are in predictable positions, the pointer recovery is arithmetic, and the pseudovector hierarchy maps cleanly onto C struct layout rules. The GC can identify live pointers by checking a tag and then scan an object by reading the slot count from the header. Nothing requires runtime code generation or JIT support.

The cost shows up at the edges. Floats allocate. The fixnum range, while 62 bits, requires range checks when interfacing with C int or long. The pvec_type dispatch adds an extra memory read for every pseudovector type check.
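
The range check at the C boundary is short but easy to forget. The sketch below uses hypothetical names (FIXNUM_TAG_BITS, FIXNUM_MAX, FIXNUM_MIN, fits_fixnum, try_make_fixnum) and assumes the two-bit fixnum tag described earlier; it shows the shape of the check, not the macros Emacs actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; assumes two tag bits are taken from the word. */
enum { FIXNUM_TAG_BITS = 2 };
#define FIXNUM_MAX (INTPTR_MAX >> FIXNUM_TAG_BITS)
#define FIXNUM_MIN (-1 - FIXNUM_MAX)

/* A C integer is representable only if it survives losing two bits. */
static inline bool fits_fixnum(intmax_t n) {
    return FIXNUM_MIN <= n && n <= FIXNUM_MAX;
}

/* Tag n if it fits and return true; otherwise leave *out untouched so
   the caller can fall back to another representation or signal an error. */
static inline bool try_make_fixnum(intmax_t n, intptr_t *out) {
    assert(out != 0);
    if (!fits_fixnum(n))
        return false;
    *out = (intptr_t)(((uintptr_t)n << FIXNUM_TAG_BITS) | 2);
    return true;
}
```

On a 64-bit build this rejects anything outside the 62-bit signed range, so a value such as INTPTR_MAX that is perfectly valid in C cannot be passed through as a fixnum.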

For an editor that runs for decades and must be portable to every Unix variant and Windows, those trade-offs are reasonable. The implementation in lisp.h rewards reading: it is a clear example of fitting a full object model into a machine word with nothing exotic, just alignment, arithmetic, and sixty years of accumulated convention.