NaN Boxing, Smi Tags, and Why Emacs Made the Choice It Did

Source: lobsters

The Emacs Internals series walks through Emacs’s tagged pointer scheme methodically: three low-order bits encode the type, heap pointers fill the upper bits, and a two-level system handles the overflow of types beyond what three bits can represent. This is a solid introduction to one specific approach. What the article does not address is that this approach is one of at least three fundamentally distinct ways to solve the same problem, and the trade-offs between them explain a lot about why JavaScript engines, Lisp runtimes, and general-purpose language VMs ended up looking so different from each other.

The Problem

Every dynamically typed language runtime must answer the same question: given a 64-bit machine word, how do you know what type of value it contains? Storing the type in a separate field costs memory and a second load per operation. Fitting type information into the value word itself is the goal, but there are three significantly different ways to accomplish it, and each one makes a different bet about what the runtime will be doing most of the time.

LSB Tagging: Emacs and SBCL

Emacs reserves the three low-order bits of every Lisp_Object for a type tag:

enum Lisp_Type {
    Lisp_Symbol       = 0,
    Lisp_Int0         = 2,
    Lisp_Cons         = 3,
    Lisp_String       = 4,
    Lisp_Vectorlike   = 5,
    Lisp_Int1         = 6,
    Lisp_Float        = 7,
};

This works because 64-bit platforms align heap allocations to at least 8 bytes, leaving the low 3 bits of every valid pointer zero before the tag is added. Integers use two of the eight tag slots, one for even fixnums and one for odd, so the distinguishing tag bit doubles as the integer's own low bit and fixnums get 62 usable bits. Floats take the remaining slot, but a float-tagged word points to a heap-allocated Lisp_Float struct rather than encoding the double directly.
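The two-tag fixnum encoding can be sketched as follows; the helper names here are illustrative stand-ins, not Emacs's actual macros:

```c
#include <stdint.h>

/* Illustrative sketch, not Emacs's real macros. The fixnum occupies
   bits 2..63; the low two bits are always 0b10, so bit 2 (the
   fixnum's own low bit) selects the Lisp_Int0 (even) or Lisp_Int1
   (odd) tag automatically. */
typedef int64_t Lisp_Object;

Lisp_Object make_fixnum(int64_t n) {
    return (Lisp_Object)((uint64_t)n << 2) | 2;   /* ...xxxx10 */
}

int64_t fixnum_value(Lisp_Object o) {
    return o >> 2;   /* arithmetic right shift restores the sign */
}

int tag_of(Lisp_Object o) { return (int)(o & 7); }
```

Here make_fixnum(4) carries tag 2 (Lisp_Int0) and make_fixnum(5) carries tag 6 (Lisp_Int1); the two-tag split costs nothing because it reuses the integer's own parity bit.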

Pointer recovery subtracts the tag constant rather than masking it off:

#define XUNTAG(a, type, ctype) ((ctype *) ((uintptr_t)(a) - (type)))
#define XCONS(a)   XUNTAG(a, Lisp_Cons,   struct Lisp_Cons)
#define XSTRING(a) XUNTAG(a, Lisp_String, struct Lisp_String)

Subtraction compiles identically to masking on x86-64 (both produce a single address computation instruction), but makes the encode-decode relationship explicit: make_lisp_ptr adds the tag, XUNTAG subtracts it.
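The round trip can be sketched like this, with illustrative names standing in for make_lisp_ptr and XUNTAG:

```c
#include <assert.h>
#include <stdint.h>

enum { Lisp_Cons_tag = 3 };   /* matches the Lisp_Cons value above */

/* Tag by adding the type constant into the guaranteed-zero low bits;
   untag by subtracting it back. Names are illustrative. */
uintptr_t tag_pointer(void *p, int tag) {
    assert(((uintptr_t)p & 7) == 0);   /* 8-byte alignment guarantee */
    return (uintptr_t)p + (uintptr_t)tag;
}

void *untag_pointer(uintptr_t a, int tag) {
    return (void *)(a - (uintptr_t)tag);
}
```

Because the low 3 bits of the pointer are zero, adding the tag and OR-ing it in produce the same word; subtraction then exactly inverts the addition.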

The central cost of LSB tagging is float allocation. A floating-point literal in Emacs Lisp allocates a heap object. The tag space is too narrow to hold a 64-bit double directly, so floats become GC-managed objects. For a text editor, this is acceptable. For a numerical computing runtime, it would be a serious problem.

SBCL takes the same approach: a lowtag classifies pointers, and a secondary widetag byte in the object header handles subtype dispatch. Both designs descend from the same intellectual lineage, decades of Common Lisp implementation work on stock hardware.

NaN Boxing: JavaScriptCore and LuaJIT

IEEE 754 defines a quiet NaN as any 64-bit value where the exponent bits are all ones and the quiet bit (bit 51) is set. That leaves 51 bits of NaN payload that a conforming implementation can use freely. JavaScriptCore, LuaJIT, and SpiderMonkey at various points have exploited this to encode types and values without any allocation.

The scheme works as follows: a genuine double-precision float is stored as-is, occupying its natural IEEE 754 bit pattern. A non-float value is stored as a quiet NaN with the type encoded in some payload bits and the actual value or heap pointer in the remainder:

IEEE 754 quiet NaN layout:
[sign:1][exponent:11 = all 1s][quiet:1][payload:51]

JavaScriptCore encoding (simplified):
  Double:       standard IEEE 754 (not a canonical NaN)
  Integer:      NaN bits | type tag | 32-bit value in low bits
  Heap pointer: NaN bits | type tag | 47-bit pointer
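A toy version of the encoding, with made-up mask and tag constants rather than JavaScriptCore's actual layout:

```c
#include <stdint.h>
#include <string.h>

/* Toy NaN boxing, not JavaScriptCore's real constants. QNAN_MASK sets
   the 11 exponent bits, the quiet bit, and one extra discriminator
   bit, so genuine hardware-produced NaNs (0x7ff8...) still read as
   doubles. */
#define QNAN_MASK UINT64_C(0x7ffc000000000000)
#define TAG_INT   UINT64_C(0x0002000000000000)   /* illustrative */

typedef uint64_t Value;

Value box_double(double d) {
    Value v;
    memcpy(&v, &d, sizeof v);   /* a real double is stored as-is */
    return v;
}

double unbox_double(Value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

int is_double(Value v) { return (v & QNAN_MASK) != QNAN_MASK; }

Value box_int(int32_t i) {
    return QNAN_MASK | TAG_INT | (uint32_t)i;   /* value in low 32 bits */
}

int32_t unbox_int(Value v) { return (int32_t)(v & UINT64_C(0xffffffff)); }
```

The memcpy idiom is the portable way to reinterpret bits without violating aliasing rules; a real engine inlines all of this to a handful of mask-and-compare instructions.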

The payoff is significant: floats are entirely free. No allocation, no GC tracing, no heap pressure. A program that does heavy floating-point arithmetic works on native doubles without touching the allocator. For JavaScript, where 1.5 + 2.5 is the norm and floats are the default number type, this is a substantial win.

The constraint is equally significant: pointers are limited to the payload space after reserving the NaN and type tag bits, which leaves roughly 47-48 bits for addresses. On current x86-64 Linux and macOS, user-space virtual addresses fit in 47 bits, so the scheme works. On platforms with larger address spaces it fails.

x86-64 with five-level paging (LA57) extends user-space virtual addresses to 57 bits, and ARM64's large-address extensions reach 52. Pointer authentication on ARM64 additionally consumes upper address bits that would otherwise be available for NaN payload tricks. Any platform that pushes beyond 47 bits of user-space addressing breaks NaN boxing unless you sacrifice type tag bits, which reduces the number of encodable types.

LuaJIT’s Mike Pall documented the trade-off directly when describing the encoding: it assumes a 47-bit address space ceiling and makes that assumption a hard dependency of the representation.

V8: The Smi Compromise

V8 uses a single bit rather than three, distinguishing exactly two universes: Small Integer (Smi) and HeapObject.

Smi:        [31-bit integer value][0]   -- low bit is 0
HeapObject: [pointer address...  ][1]   -- low bit is 1
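A sketch of the 31-bit Smi case (V8's kSmiTag is 0, so the low bit is clear for integers; helper names are illustrative):

```c
#include <stdint.h>

/* Smi sketch for the 31-bit layout: the integer lives in the upper
   bits with the low bit 0; any word with the low bit set is a heap
   reference. Names are illustrative, not V8's. */
typedef int32_t TaggedWord;

int is_smi(TaggedWord w) { return (w & 1) == 0; }

TaggedWord smi_encode(int32_t n) {
    /* caller must guarantee n fits in 31 bits */
    return (TaggedWord)((uint32_t)n << 1);
}

int32_t smi_decode(TaggedWord w) {
    return w >> 1;   /* arithmetic shift restores the sign */
}
```

Putting the zero bit on the integer side means small-integer arithmetic can often operate on tagged words directly: the sum of two Smis is already a correctly tagged Smi.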

Type information for heap objects comes from the object’s hidden class (map), a separate metadata object that describes each object’s shape and type. This pushes type dispatch to a secondary lookup rather than encoding it directly in the value word.

Since 2020, V8 has added pointer compression: heap pointers are stored as 32-bit offsets into a 4 GB managed heap cage rather than full 64-bit addresses. This reduces heap memory consumption for typical JavaScript workloads by roughly 40%, at the cost of limiting the managed heap to 4 GB per isolate. The two techniques compose: a compressed word still uses its low bit to separate Smis from heap references, holding either a 31-bit integer or a cage offset in the remaining bits.

Floats in V8 are heap-allocated HeapNumber objects in the general case. TurboFan, V8's optimizing compiler, unboxes floats to native registers within compiled hot paths, but at language boundaries and in polymorphic code, floats allocate.

OCaml’s Minimal Approach

OCaml uses exactly one bit. Bit 0 set means integer immediate; bit 0 clear means heap pointer. Block types live in a tag byte in the block header. The static type system handles most dispatch at compile time, so the runtime tag is primarily for the garbage collector: it needs to know which words in a block are pointers to trace, and the single bit answers that question without any further lookup.

OCaml integers are therefore 63-bit. The overhead is minimal and the scheme is maximally simple. Floats are heap-allocated by default, but the compiler applies unboxing extensively in typed contexts. A float array stores unboxed doubles. A polymorphic container boxes them.
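The encoding makes integer arithmetic nearly free; addition of two tagged integers, for example, needs only a single correction. A sketch in C (OCaml itself hides all of this behind the compiler):

```c
#include <stdint.h>

/* OCaml-style immediates: value n is stored as 2n + 1, low bit set. */
int64_t ocaml_of_int(int64_t n) { return (n << 1) | 1; }
int64_t int_of_ocaml(int64_t v) { return v >> 1; }

/* (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1, so tagged addition is one
   machine add plus a decrement -- no untagging needed. */
int64_t ocaml_add(int64_t x, int64_t y) { return x + y - 1; }
```

This is why the one-bit tax on integer width is usually invisible in practice: most arithmetic never leaves the tagged representation.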

Why Emacs Could Not Have Used NaN Boxing

Historically, NaN boxing postdates Emacs’s core design. The representation in lisp.h predates IEEE 754’s widespread adoption and the later discovery that NaN payloads were exploitable for type encoding. But setting history aside, there are structural reasons why NaN boxing would be wrong for Emacs even today.

Emacs’s garbage collector is precise, and its core mark loop processes every Lisp_Object field in every live object. Identifying whether a word is a pointer requires a fast test. The LSB test is two instructions: mask the low 3 bits, compare against the set of pointer tag values. The NaN boxing test must first confirm that the exponent bits and the quiet bit are all set, then decode payload bits for the specific type. For a GC that examines thousands of words per collection cycle, that difference accumulates.
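The difference shows up directly when the two tests are written out; the LSB tag values come from the Emacs enum above, while the NaN constants are illustrative:

```c
#include <stdint.h>

#define QNAN_MASK UINT64_C(0x7ffc000000000000)
#define TAG_MASK  UINT64_C(0x0003000000000000)   /* illustrative */
#define TAG_PTR   UINT64_C(0x0001000000000000)   /* illustrative */

/* LSB scheme: everything except the two fixnum tags (2 and 6)
   refers to a heap object the GC must trace. */
int lsb_is_pointer(uint64_t w) {
    unsigned tag = w & 7;
    return tag != 2 && tag != 6;
}

/* NaN boxing: first rule out genuine doubles, then decode the
   payload tag to see whether this boxed value is a pointer. */
int nan_is_pointer(uint64_t w) {
    if ((w & QNAN_MASK) != QNAN_MASK)
        return 0;   /* a real double, nothing to trace */
    return (w & TAG_MASK) == TAG_PTR;
}
```

The LSB test is branch-free; the NaN test is a two-stage decode. Per word the gap is small, but the mark loop runs it over the entire live heap.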

More importantly, Emacs’s uniform GC traversal loop depends on the slot count stored in vectorlike_header.size and on the guarantee that all Lisp_Object fields in a pseudovector are contiguous and come before any raw C data. That invariant does not interact well with NaN boxing’s assumption that every value word, regardless of position in the object, can be decoded by the same NaN test. The two designs have different assumptions about what a “value word” means in context.

Where the Three Designs Converge

All three approaches exist because 64 bits is simultaneously large enough to hold any useful value and too narrow to hold a value plus a separate type tag. The tag has to fit in the same word as the value or the pointer, and each design makes a different bet about what to optimize.

LSB tagging with 3 bits wins when GC correctness, portability across address space sizes, and simplicity of implementation matter more than float allocation cost. NaN boxing wins when the language’s dominant numeric type is floating-point and the target platforms can guarantee a 47-bit address ceiling. Smi encoding paired with pointer compression wins when heap memory density is the primary concern for a modern runtime with a JIT that can unbox most floats away in hot code.

Emacs’s choice was appropriate in the early 1980s, when 32-bit pointers made the trade-off space entirely different, and it remains appropriate now for a precise GC over a mixed-type Lisp heap with a growing inventory of pseudovector types. SBCL independently converged on the same structure through a separate line of Common Lisp implementation work, which suggests the design sits at a stable point in the trade-off space even if it is not the only one.
