
The Lisp Runtime That Grew an Editor: Inside GNU Emacs's C Core

Source: lobsters

Emacs is commonly understood as a text editor extended by Lisp. The source code shows something more precise: a Lisp runtime implemented in C, whose primary application happens to be a text editor. A recent post in the Emacs Internals series opens with this framing, and it is worth following through on what it means at the implementation level.

Two layers, one executable

The GNU Emacs source tree has two significant directories: src/ (~150,000 lines of C) and lisp/ (~1.5 million lines of Emacs Lisp). The C layer implements a Lisp interpreter: memory allocation, garbage collection, a reader and printer, a bytecode VM, roughly 1,500 primitive functions, and OS interfaces. The Lisp layer implements everything else: every major mode, the package manager, Org-mode, the minibuffer, Gnus, Eshell. Third-party packages such as Magit are not part of the tree, but they are written against exactly the same layer.

The design principle is that anything requiring OS access or performance goes in C, and everything else goes in Lisp. This is the same division found in Guile, Racket, and early Lisp machine operating systems. The editor is the application; the runtime is the platform.

Lisp_Object: everything is a tagged word

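The central data structure in src/lisp.h is Lisp_Object. In the simplest build configuration it is a plain signed integer at least as wide as a pointer; other configurations wrap the same word in a pointer or struct type so the compiler can catch type mix-ups. Every Lisp value, regardless of type, passes through the system as one of these words. With low-bit tagging, the usual choice on 64-bit systems, the low bits encode the type, exploiting the fact that heap-allocated objects are aligned and have natural zero bits at the bottom.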

On a 64-bit system, a fixnum (small integer) is stored directly in the word, shifted left to leave room for the tag bits. No allocation happens; arithmetic on fixnums is arithmetic on machine integers. Cons cells, strings, symbols, and vectors are heap-allocated; their Lisp_Object representations are tagged pointers, with the tag bits stripped before dereferencing.

/* Extracting values from a Lisp_Object */
#define XFIXNUM(a)  ((a) >> INTTYPEBITS)              /* fixnum: shift out tag */
#define XCONS(a)    ((struct Lisp_Cons *) XPNTR(a))   /* heap: strip tag, cast */
#define XSTRING(a)  ((struct Lisp_String *) XPNTR(a))
#define XSYMBOL(a)  ((struct Lisp_Symbol *) XPNTR(a))

/* Type predicates */
#define CONSP(x)    (XTYPE(x) == Lisp_Cons)
#define SYMBOLP(x)  (XTYPE(x) == Lisp_Symbol)
#define FIXNUMP(x)  (((x) & ((1 << INTTYPEBITS) - 1)) == Lisp_Int0)  /* simplified: assumes low-bit tags */
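To make the tagging scheme concrete, here is a minimal standalone sketch. It is not Emacs's actual layout: the three-bit tag width, the tag values, and every toy_ name are assumptions chosen for illustration. It shows a small integer living directly in the word and a heap pointer carrying its type in the low bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy encoding, not the real Emacs one: 3 low tag bits. */
typedef intptr_t toy_object;

enum { TOY_TAG_BITS = 3 };
#define TOY_TAG_MASK ((uintptr_t) 7)

enum toy_tag { TOY_FIXNUM = 0, TOY_CONS = 1 };

struct toy_cons { toy_object car, cdr; };

/* Fixnum: the value lives in the word itself, shifted past the tag bits. */
static toy_object toy_make_fixnum (intptr_t n)
{
  return (toy_object) (((uintptr_t) n << TOY_TAG_BITS) | TOY_FIXNUM);
}

/* Assumes >> on a negative value is an arithmetic shift, as it is on
   mainstream compilers; the C standard leaves it implementation-defined. */
static intptr_t toy_xfixnum (toy_object o)
{
  return o >> TOY_TAG_BITS;
}

static enum toy_tag toy_xtype (toy_object o)
{
  return (enum toy_tag) ((uintptr_t) o & TOY_TAG_MASK);
}

/* Heap object: allocate, then store the tag in the pointer's low bits. */
static toy_object toy_cons (toy_object car, toy_object cdr)
{
  struct toy_cons *c = malloc (sizeof *c);
  if (!c)
    abort ();
  /* The scheme only works if alignment leaves the tag bits zero. */
  assert (((uintptr_t) c & TOY_TAG_MASK) == 0);
  c->car = car;
  c->cdr = cdr;
  return (toy_object) ((uintptr_t) c | TOY_CONS);
}

/* Strip the tag before dereferencing. */
static struct toy_cons *toy_xcons (toy_object o)
{
  return (struct toy_cons *) ((uintptr_t) o & ~TOY_TAG_MASK);
}
```

Calling toy_xfixnum (toy_make_fixnum (-7)) gives back -7 without any allocation, while toy_cons returns a word whose tag is TOY_CONS and whose remaining bits are the address of the cell.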

The symbol struct layout reflects the multiple roles a symbol can play:

struct Lisp_Symbol {
    union {
        Lisp_Object value;                       /* regular variable */
        struct Lisp_Symbol *alias;               /* variable alias */
        struct Lisp_Buffer_Local_Value *blv;     /* buffer-local */
        union Lisp_Fwd *fwd;                     /* forward to C variable */
    } val;
    Lisp_Object function;
    Lisp_Object plist;
    Lisp_Object name;
    struct Lisp_Symbol *next;   /* obarray hash chain */
};

That fwd union member is how gc-cons-threshold, fill-column, and hundreds of other Lisp-visible settings can live in C storage (an integer, a boolean, a Lisp_Object, or a slot in the current buffer's struct) while still looking like ordinary variables from Lisp. Reading or writing one of these variables from Lisp dispatches through the forwarding pointer directly to that C location, with no intermediate copy.
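The forwarding idea can be sketched in a few lines of standalone C. This is a deliberately reduced model, not the Emacs definitions: the toy_ names, the two-way redirect enum, and the initial value 100 are all assumptions. It shows one accessor pair serving both an ordinary variable and one whose storage is a C global.

```c
#include <stdint.h>

/* Hypothetical miniature of a symbol whose value may live in a C variable. */
enum toy_redirect { TOY_PLAINVAL, TOY_FORWARDED };

struct toy_symbol {
  enum toy_redirect redirect;
  union {
    intptr_t value;   /* ordinary variable: value stored in the symbol */
    intptr_t *fwd;    /* forwarded variable: address of a C global */
  } val;
};

/* Reads go through the forwarding pointer when there is one. */
static intptr_t toy_symbol_value (const struct toy_symbol *s)
{
  return s->redirect == TOY_FORWARDED ? *s->val.fwd : s->val.value;
}

/* Writes land in the same place reads come from. */
static void toy_set_symbol_value (struct toy_symbol *s, intptr_t v)
{
  if (s->redirect == TOY_FORWARDED)
    *s->val.fwd = v;
  else
    s->val.value = v;
}

/* A C global that C code uses directly (the initial value is arbitrary),
   plus the symbol that exposes it to the toy "Lisp" side. */
static intptr_t toy_gc_threshold = 100;
static struct toy_symbol toy_gc_threshold_sym
  = { TOY_FORWARDED, { .fwd = &toy_gc_threshold } };
```

After toy_set_symbol_value (&toy_gc_threshold_sym, 5000), C code reading toy_gc_threshold sees 5000 immediately, because there is only one copy of the value.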

DEFUN: the bridge between C and Lisp

The mechanism that makes C functions callable from Lisp is the DEFUN macro in lisp.h. It takes a Lisp name, a C function name, a struct name, argument count bounds, an interactive specification, and a docstring, and expands into both a C function definition and a Lisp_Subr struct:

DEFUN ("cons", Fcons, Scons, 2, 2, 0,
       doc: /* Create a new cons cell with CAR and CDR as components. */)
  (Lisp_Object car, Lisp_Object cdr)
{
    /* allocate cons cell, set car and cdr, return it */
}

During startup, each C module calls a syms_of_*() function that passes the Scons struct to defsubr(), which interns the symbol cons into the global obarray and sets its function cell to the subr. From that point on, (cons 1 2) in Lisp dispatches directly to Fcons in C.
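The registration step can be illustrated with a small standalone sketch. It assumes a flat array in place of the obarray and a fixed two-argument calling convention; toy_defsubr, toy_syms_of_arith, and the other toy_ names are hypothetical stand-ins, not the real functions.

```c
#include <stddef.h>
#include <string.h>

typedef long toy_value;

/* Hypothetical analogue of a subr record: name, arity bounds, entry point. */
struct toy_subr {
  const char *name;
  int min_args, max_args;
  toy_value (*fn) (toy_value, toy_value);
};

enum { TOY_TABLE_SIZE = 64 };
static const struct toy_subr *toy_table[TOY_TABLE_SIZE]; /* obarray stand-in */
static size_t toy_table_len;

/* Like defsubr in spirit: make the primitive findable by its Lisp name. */
static int toy_defsubr (const struct toy_subr *s)
{
  if (toy_table_len == TOY_TABLE_SIZE)
    return -1;
  toy_table[toy_table_len++] = s;
  return 0;
}

static const struct toy_subr *toy_lookup (const char *name)
{
  for (size_t i = 0; i < toy_table_len; i++)
    if (strcmp (toy_table[i]->name, name) == 0)
      return toy_table[i];
  return NULL;
}

/* The primitive and its record; a DEFUN-style macro would emit both. */
static toy_value toy_Fadd (toy_value a, toy_value b) { return a + b; }
static const struct toy_subr toy_Sadd = { "add", 2, 2, toy_Fadd };

/* Stand-in for a syms_of_*() function, run once at startup. */
static void toy_syms_of_arith (void)
{
  toy_defsubr (&toy_Sadd);
}

/* Call by name with two arguments; -1 if unknown or the arity is wrong. */
static int toy_funcall2 (const char *name, toy_value a, toy_value b,
                         toy_value *out)
{
  const struct toy_subr *s = toy_lookup (name);
  if (!s || s->min_args > 2 || s->max_args < 2)
    return -1;
  *out = s->fn (a, b);
  return 0;
}
```

Before toy_syms_of_arith runs, a call to "add" fails the lookup; afterwards it reaches toy_Fadd through the function pointer in the record, which is the shape of the dispatch described above.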

Special forms such as and, if, and let use the UNEVALLED argument specification, meaning they receive their argument list as a raw unevaluated cons chain and handle evaluation order themselves:

DEFUN ("if", Fif, Sif, 2, UNEVALLED, 0, doc: /* ... */)
  (Lisp_Object args)
{
    Lisp_Object cond = eval_sub (XCAR (args));
    if (!NILP (cond))
        return eval_sub (Fcar (XCDR (args)));
    else
        return Fprogn (XCDR (XCDR (args)));
}

This is essentially the complete implementation of if. The evaluator (eval_sub in src/eval.c) dispatches to this C function when it sees a list headed by the if symbol, and the C function decides what to evaluate. All of the roughly 25 special forms in Emacs Lisp are implemented this way.
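The difference between an ordinary function and a special form is easiest to see in a toy evaluator. The sketch below is an assumption-laden miniature: expressions are a small tagged struct rather than cons chains, 0 plays the role of nil, and all toy_ names are invented. The TOY_COUNT node exists only to make it observable which arguments were evaluated.

```c
#include <stddef.h>

/* Hypothetical expression tree: a number, or a form with up to 3 args. */
enum toy_kind { TOY_NUM, TOY_COUNT, TOY_ADD, TOY_IF };

struct toy_expr {
  enum toy_kind kind;
  long num;                        /* payload of TOY_NUM and TOY_COUNT */
  const struct toy_expr *args[3];  /* arguments of forms; unused are NULL */
};

static int toy_evals;  /* how many TOY_COUNT nodes have been evaluated */

static long toy_eval (const struct toy_expr *e);

/* Ordinary function: it only ever sees already-evaluated values. */
static long toy_add (long a, long b) { return a + b; }

/* Special form: it receives the raw argument expressions and decides
   which of them to evaluate.  Here 0 plays the role of nil. */
static long toy_if (const struct toy_expr *const *args)
{
  if (toy_eval (args[0]) != 0)
    return toy_eval (args[1]);
  return args[2] ? toy_eval (args[2]) : 0;
}

static long toy_eval (const struct toy_expr *e)
{
  switch (e->kind)
    {
    case TOY_NUM:
      return e->num;
    case TOY_COUNT:
      toy_evals++;
      return e->num;
    case TOY_ADD:  /* evaluate every argument, then call */
      return toy_add (toy_eval (e->args[0]), toy_eval (e->args[1]));
    case TOY_IF:   /* hand the unevaluated arguments to the special form */
      return toy_if (e->args);
    }
  return 0;
}
```

Evaluating an if node whose condition is non-zero bumps toy_evals once, because the else branch is never touched; evaluating an add node over the same two branches bumps it twice.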

Bootstrap: from temacs to a dumped image

The build process is itself a Lisp bootstrapping exercise. Compiling the C sources produces temacs, a bare Lisp interpreter with no standard library loaded. It has the primitives registered by DEFUN but nothing else; it cannot run as an editor.

The build then invokes:

temacs --batch --load loadup.el

loadup.el (in lisp/) loads the Emacs Lisp standard library in careful dependency order. byte-run.el and subr.el come near the front because between them they define defun, defmacro, when, unless, and the caar/cadr family; everything else depends on these. (let is not among them: as a special form, it is already provided by the C core.) After these, roughly a hundred more files load in sequence, building up the complete editor environment.

Since Emacs 27, after loadup.el finishes, the entire Lisp heap is serialized to an emacs.pdmp file using the portable dumper (src/pdumper.c). Before that, the mechanism was unexec, which literally copied the process memory into a new executable binary. It was deeply system-specific, requiring different code for every OS and linker combination, and was one of the longest-running portability headaches in the codebase.

When you launch emacs, the runtime maps emacs.pdmp back into memory with mmap and relocates pointers. The full standard library is available immediately without re-evaluating a line of Lisp. Startup time is dominated by loading user configuration and the dump file, not by parsing the standard library.
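The map-and-relocate step can be sketched on a toy heap. The real dumper's image format and relocation tables are far more involved; this sketch assumes a single node type, uses malloc plus memcpy as a stand-in for mmap, and stores each link as a byte offset from the start of the image. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node.  In the dumped image, `next` is a byte offset from
   the image start (0 means end of list; the head sits at offset 0 and is
   never a link target).  After relocation, `next` is a real address. */
struct toy_node {
  long value;
  uintptr_t next;
};

/* Dump: write `count` values as a chain of nodes linked by offsets.
   Returns the image size in bytes, or 0 if it does not fit. */
static size_t toy_dump (unsigned char *image, size_t cap,
                        const long *values, size_t count)
{
  size_t need = count * sizeof (struct toy_node);
  if (need > cap)
    return 0;
  for (size_t i = 0; i < count; i++)
    {
      struct toy_node n;
      n.value = values[i];
      n.next = (i + 1 < count) ? (i + 1) * sizeof n : 0;
      memcpy (image + i * sizeof n, &n, sizeof n);
    }
  return need;
}

/* Load: place the image at whatever address the allocator hands back,
   then turn every stored offset into a pointer by adding that base. */
static struct toy_node *toy_load (const unsigned char *image, size_t size)
{
  size_t count = size / sizeof (struct toy_node);
  if (count == 0)
    return NULL;
  struct toy_node *nodes = malloc (count * sizeof *nodes);
  if (!nodes)
    return NULL;
  memcpy (nodes, image, count * sizeof *nodes);
  for (size_t i = 0; i < count; i++)
    if (nodes[i].next != 0)
      nodes[i].next += (uintptr_t) nodes;
  return nodes;
}

/* Walk the relocated list through ordinary pointers. */
static long toy_sum (const struct toy_node *n)
{
  long total = 0;
  for (; n; n = (const struct toy_node *) n->next)
    total += n->value;
  return total;
}
```

The point of the design is that loading costs one copy (or mapping) plus one pass over the links, independent of how much work it originally took to build the structure.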

The garbage collector

Emacs uses a mark-and-sweep collector, not a generational or copying GC. Different object types use different allocation pools: cons cells come from cons_blocks (1023 cells per block, blocks linked in a chain), symbols from symbol_blocks, strings from string_blocks with separately managed character data buffers, and vectors from a free-list organized by size class.

The mark phase walks all GC roots (the C stack, all symbol values, all buffer contents, the specpdl binding stack) and marks reachable objects. The sweep phase reclaims unmarked objects to free lists. String character data has a separate compaction step.
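A minimal mark-and-sweep pass over one fixed block of cells might look like the following. It is a sketch under simplifying assumptions: a single eight-cell pool, explicit root pointers passed in by the caller instead of stack scanning, and mark bits stored in the cells themselves. None of the toy_ names come from the Emacs source.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_POOL_SIZE = 8 };

struct toy_cell {
  struct toy_cell *car, *cdr;  /* references to other cells, or NULL */
  bool in_use;
  bool marked;
};

static struct toy_cell toy_pool[TOY_POOL_SIZE];  /* one fixed block */
static struct toy_cell *toy_free_list;           /* threaded through cdr */

/* Put every cell in the block on the free list. */
static void toy_gc_init (void)
{
  toy_free_list = NULL;
  for (size_t i = 0; i < TOY_POOL_SIZE; i++)
    {
      toy_pool[i] = (struct toy_cell) { NULL, toy_free_list, false, false };
      toy_free_list = &toy_pool[i];
    }
}

/* Pop a cell off the free list; NULL when the block is exhausted
   (a real allocator would collect or add a block at this point). */
static struct toy_cell *toy_alloc (struct toy_cell *car, struct toy_cell *cdr)
{
  struct toy_cell *c = toy_free_list;
  if (!c)
    return NULL;
  toy_free_list = c->cdr;
  *c = (struct toy_cell) { car, cdr, true, false };
  return c;
}

/* Mark phase: flag everything reachable from c.  The marked check also
   stops the walk on cycles. */
static void toy_mark (struct toy_cell *c)
{
  while (c && !c->marked)
    {
      c->marked = true;
      toy_mark (c->car);  /* recurse down the car */
      c = c->cdr;         /* iterate along the cdr */
    }
}

/* Full collection: mark from the roots, then sweep the whole block.
   Returns the number of cells reclaimed. */
static size_t toy_collect (struct toy_cell **roots, size_t nroots)
{
  for (size_t i = 0; i < nroots; i++)
    toy_mark (roots[i]);

  size_t freed = 0;
  for (size_t i = 0; i < TOY_POOL_SIZE; i++)
    {
      struct toy_cell *c = &toy_pool[i];
      if (c->in_use && !c->marked)
        {
          *c = (struct toy_cell) { NULL, toy_free_list, false, false };
          toy_free_list = c;
          freed++;
        }
      c->marked = false;  /* reset for the next cycle */
    }
  return freed;
}
```

Note that the sweep loop visits every cell in the pool regardless of how few are garbage, which is the property that makes each cycle a full-heap operation.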

The lack of generations means every GC cycle is a full-heap scan. For long-running Emacs sessions with large heaps, this produces occasional multi-millisecond pauses. The Evolution of Emacs Lisp paper (Monnier and Sperber, HOPL 2020) notes this as one of the persistent runtime limitations. Users work around it by raising gc-cons-threshold during startup to defer collection until the heap is fully loaded:

(setq gc-cons-threshold most-positive-fixnum) ; defer GC during init
;; ... load packages ...
(setq gc-cons-threshold 16777216)             ; restore to 16MB after

This technique is unnecessary in systems with generational collectors, where short-lived allocations are collected cheaply in a young-generation sweep without touching the rest of the heap.

Dynamic binding as a design artifact

Emacs Lisp uses dynamic binding by default, where variable lookup walks a runtime binding stack rather than a lexical closure. The implementation is the specpdl, a stack of {symbol, old_value} pairs:

void specbind (Lisp_Object symbol, Lisp_Object value) {
    specpdl_ptr->symbol    = symbol;
    specpdl_ptr->old_value = SYMBOL_VALUE (symbol);
    specpdl_ptr++;
    SET_SYMBOL_VALUE (symbol, value);
}

let pushes bindings; unbind_to restores them on exit, including non-local exits that unwind through unwind-protect. This is the natural implementation given a simple stack-based interpreter with no closures. Lexical binding was added in Emacs 24, enabled per file with a ;;; -*- lexical-binding: t -*- cookie, and works by having the bytecode compiler capture free variables in closure slots rather than relying on the specpdl. The two modes coexist in the same runtime: variables declared special with defvar stay dynamically bound even in lexical-binding files, which is why packages can still use dynamic binding where they want dynamic dispatch.
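The save-and-restore discipline can be exercised end to end in a standalone sketch. It assumes a fixed-size stack indexed by a counter instead of a growable one addressed by pointer, and plain long values instead of Lisp objects; the toy_ names and the initial value 70 are illustrative.

```c
#include <stddef.h>

struct toy_var { const char *name; long value; };

struct toy_binding { struct toy_var *symbol; long old_value; };

enum { TOY_SPECPDL_SIZE = 32 };
static struct toy_binding toy_specpdl[TOY_SPECPDL_SIZE];
static size_t toy_specpdl_count;  /* index of the next free slot */

/* Save the current value on the stack, then install the new one.
   Returns 0, or -1 if the stack is full. */
static int toy_specbind (struct toy_var *sym, long value)
{
  if (toy_specpdl_count == TOY_SPECPDL_SIZE)
    return -1;
  toy_specpdl[toy_specpdl_count++] = (struct toy_binding) { sym, sym->value };
  sym->value = value;
  return 0;
}

/* Pop back down to `count`, restoring saved values in reverse order. */
static void toy_unbind_to (size_t count)
{
  while (toy_specpdl_count > count)
    {
      struct toy_binding *b = &toy_specpdl[--toy_specpdl_count];
      b->symbol->value = b->old_value;
    }
}

/* A callee that reads the variable sees whichever binding is live. */
static struct toy_var toy_fill_column = { "fill-column", 70 };
static long toy_read_fill_column (void) { return toy_fill_column.value; }

/* Roughly (let ((fill-column n)) (toy-read-fill-column)). */
static long toy_with_fill_column (long n)
{
  size_t count = toy_specpdl_count;
  toy_specbind (&toy_fill_column, n);
  long seen = toy_read_fill_column ();
  toy_unbind_to (count);
  return seen;
}
```

toy_with_fill_column (100) returns 100 even though toy_read_fill_column takes no arguments, and the variable is back to its previous value once the call returns. That visibility to callees is exactly what lexical binding removes.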

The absence of an API boundary

Emacs extensions run inside the same Lisp runtime as the editor. There is no extension host process, no JSON-RPC protocol, no sandbox. VS Code extensions run in a separate Node.js process and communicate with the editor over a defined protocol; they cannot touch the editor’s internal data structures directly. An Emacs package can redefine self-insert-command, patch gc-cons-threshold, walk the display list, or replace the bytecode evaluator. Nothing prevents it.

This is not primarily a security posture; it is a design philosophy carried forward from Lisp machines, where the distinction between the programming environment and the running application was intentionally blurred. The cost is that a buggy package can corrupt editor state in ways that are difficult to diagnose. The benefit is that there is no capability that falls outside the extension model, because the extension model is the full language. This is why things like Magit and Org-mode can exist as packages: they are not constrained by an API surface, so they can reach as deep into the editor’s data model as they need to.

Where it goes from here

Emacs 28 added a native compiler (src/comp.c on the C side, driven from Lisp) that uses libgccjit to compile Emacs Lisp, by way of bytecode and an SSA-form intermediate representation, down to native machine code stored in .eln files. For compute-intensive Lisp, this produces 2-10x speedups over bytecode. The garbage collector is unchanged, so allocation behavior and pause latency are unchanged too; the native compiler helps CPU-bound work only.

The Guile Emacs project has for decades proposed running Emacs on top of GNU Guile, replacing the Emacs Lisp engine with Guile's, which would bring tail call optimization, first-class continuations, and a different garbage collector. The project has never been merged into mainline, largely because the existing Emacs Lisp codebase is enormous, the compatibility surface is vast, and Guile Emacs has historically been significantly slower on real workloads despite the theoretical advantages of the runtime.

So the runtime described in src/lisp.h, tag bits and all, continues to be what runs under every emacs process in the world. Understanding it as a Lisp runtime rather than a scriptable editor changes what questions are worth asking: not “why is this editor so strange” but “what does this Lisp machine choose to run as its main application, and why.”
