The C Runtime That Happens to Ship an Editor: Inside GNU Emacs's Lisp Engine
Source: lobsters
There is a framing shift that changes how you read Emacs source code: Emacs is not a text editor that happens to be extensible in Lisp. It is a Lisp runtime that ships with a text editor as its primary bundled application. This recent exploration of Emacs internals makes the point clearly, but the full picture is worth spelling out in detail, because the implementation choices downstream of that framing are genuinely interesting.
What the C Layer Actually Is
The GNU Emacs source tree contains roughly 300,000 lines of C in src/. Most of that code is not editor logic. It implements a Lisp runtime: a reader, an evaluator, a byte-code virtual machine, a memory allocator, a garbage collector, a symbol table, and a set of primitives that Lisp code calls into for I/O, process management, and display. Buffers, windows, and keymaps exist in C only as primitive data types; the editor behavior built on them, meaning the commands, modes, and user interface, is implemented in approximately 1.5 million lines of Emacs Lisp that run on top of that C core.
The split is intentional and old. Richard Stallman designed GNU Emacs in 1984 around a C kernel hosting a Lisp interpreter, drawing on MACLISP and the earlier TECO-based Emacs implementations. The goal was an editor you could reprogram entirely at runtime without recompiling. The consequence is that the C code was always meant to be minimal: enough runtime to run Lisp well, not enough to implement editing itself.
Lisp_Object: The Foundation
Every value in the Emacs Lisp runtime is a Lisp_Object, defined in src/lisp.h. On a 64-bit system:
typedef EMACS_INT Lisp_Object;
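This is just a machine word. (Recent versions declare it through a pointer-typed Lisp_Word rather than EMACS_INT, but it is still exactly one word wide.) Type information is encoded in the low-order bits using a technique called tagged pointers. With USE_LSB_TAG (the default on all modern platforms), the three lowest bits carry a type tag, leaving 61 bits for the payload.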
The tag values map to the core types:
/* Simplified from src/lisp.h, values as with USE_LSB_TAG */
enum Lisp_Type {
  Lisp_Symbol       = 0,
  Lisp_Type_Unused0 = 1,
  Lisp_Int0         = 2,  /* fixnum */
  Lisp_Cons         = 3,
  Lisp_String       = 4,
  Lisp_Vectorlike   = 5,  /* vectors, hash tables, buffers, windows, frames */
  Lisp_Int1         = 6,  /* fixnum */
  Lisp_Float        = 7,
};
Fixnums use two tags, Lisp_Int0 and Lisp_Int1. Both end in binary 10, so a fixnum is identified by its two low bits alone, which leaves 62 bits for the integer value, stored directly in the word without heap allocation. Everything else stores a tagged pointer to a heap-allocated C struct.
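To make the encoding concrete, here is a minimal sketch of low-bit tagging on a 64-bit word. It assumes the USE_LSB_TAG values from src/lisp.h (Lisp_Int0 = 2, Lisp_Int1 = 6, Lisp_Cons = 3) and 8-byte-aligned heap objects. All toy_ names are hypothetical stand-ins, not the real XFIXNUM or make_fixnum macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real definitions in src/lisp.h.  */
typedef int64_t toy_word;

enum { TOY_TAG_BITS = 3, TOY_INT_TAG_BITS = 2 };
enum toy_type { TOY_SYMBOL = 0, TOY_INT0 = 2, TOY_CONS = 3, TOY_STRING = 4,
                TOY_VECTORLIKE = 5, TOY_INT1 = 6, TOY_FLOAT = 7 };

/* The type tag is the low three bits of the word.  */
static inline enum toy_type toy_tag (toy_word w)
{
  return (enum toy_type) (w & ((1 << TOY_TAG_BITS) - 1));
}

/* Both fixnum tags end in binary 10, so two bits identify a fixnum.  */
static inline int toy_fixnump (toy_word w)
{
  return (w & ((1 << TOY_INT_TAG_BITS) - 1)) == TOY_INT0;
}

/* Shift the value left two bits and set the low bits to 10.  The shift
   is done in unsigned arithmetic to avoid shifting a negative value.  */
static inline toy_word toy_make_fixnum (int64_t n)
{
  return (toy_word) (((uint64_t) n << TOY_INT_TAG_BITS) | TOY_INT0);
}

/* Recover the signed value.  Right-shifting a negative number is
   implementation-defined in C; this assumes the usual sign extension.  */
static inline int64_t toy_fixnum_value (toy_word w)
{
  return w >> TOY_INT_TAG_BITS;
}

/* An 8-byte-aligned address has three zero low bits, free for the tag.  */
static inline toy_word toy_tag_pointer (void *p, enum toy_type t)
{
  assert (((uintptr_t) p & 7) == 0);
  return (toy_word) ((uintptr_t) p + t);
}

static inline void *toy_untag_pointer (toy_word w)
{
  return (void *) ((uintptr_t) w - toy_tag (w));
}
```

Note that an odd fixnum ends up with tag 6 and an even one with tag 2: the third bit of the "tag" is really the lowest bit of the integer.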
The heap structs are straightforward:
struct Lisp_Cons {
  union {
    struct { Lisp_Object car, cdr; } s;
    /* ... alignment padding ... */
  } u;
};
struct Lisp_Symbol {
  bool_bf gcmarkbit : 1;
  unsigned interned : 2;
  unsigned trapped_write : 2;
  /* ... */
  union { Lisp_Object value; struct Lisp_Symbol *alias; } val;
  Lisp_Object function;
  Lisp_Object plist;
  Lisp_Object name;
  struct Lisp_Symbol *next; /* hash chain */
};
Vectorlike objects, which include not just vectors but also hash tables, char-tables, subprocesses, frames, windows, and buffers, share a common header whose ptrdiff_t size field carries a pseudovector flag bit and a subtype code in addition to the length. This is how bufferp, windowp, framep, and similar predicates distinguish among objects that are all tagged Lisp_Vectorlike at the word level.
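The predicate logic can be sketched as follows. The bit positions and subtype codes here are assumptions chosen for illustration, not the actual constants from src/lisp.h, and the sketch assumes a 64-bit ptrdiff_t.

```c
#include <assert.h>
#include <stddef.h>

_Static_assert (sizeof (ptrdiff_t) >= 8, "sketch assumes a 64-bit ptrdiff_t");

/* Hypothetical subtype codes and bit layout.  */
enum toy_pvec_type { TOY_PVEC_NORMAL_VECTOR = 0, TOY_PVEC_BUFFER,
                     TOY_PVEC_WINDOW, TOY_PVEC_FRAME };

enum { TOY_TYPE_SHIFT = 24, TOY_TYPE_BITS = 6 };
#define TOY_PSEUDOVECTOR_FLAG ((ptrdiff_t) 1 << 62)
#define TOY_TYPE_MASK \
  ((ptrdiff_t) ((1 << TOY_TYPE_BITS) - 1) << TOY_TYPE_SHIFT)

struct toy_vectorlike_header { ptrdiff_t size; };

/* Build a header for a pseudovector of subtype TYPE whose first
   LISP_SLOTS fields hold Lisp values.  */
static struct toy_vectorlike_header
toy_make_header (enum toy_pvec_type type, ptrdiff_t lisp_slots)
{
  assert (0 <= lisp_slots && lisp_slots < ((ptrdiff_t) 1 << TOY_TYPE_SHIFT));
  struct toy_vectorlike_header h;
  h.size = (TOY_PSEUDOVECTOR_FLAG
            | ((ptrdiff_t) type << TOY_TYPE_SHIFT)
            | lisp_slots);
  return h;
}

/* True if H describes a pseudovector of exactly subtype TYPE.  A plain
   vector has the flag bit clear, so it never matches.  */
static int
toy_pseudovector_typep (const struct toy_vectorlike_header *h,
                        enum toy_pvec_type type)
{
  return ((h->size & (TOY_PSEUDOVECTOR_FLAG | TOY_TYPE_MASK))
          == (TOY_PSEUDOVECTOR_FLAG | ((ptrdiff_t) type << TOY_TYPE_SHIFT)));
}

static int toy_bufferp (const struct toy_vectorlike_header *h)
{
  return toy_pseudovector_typep (h, TOY_PVEC_BUFFER);
}
```

A single masked comparison answers the type question without any per-type vtable or separate type word.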
The Evaluator
src/eval.c is the heart of the runtime. The main entry point is Feval:
Lisp_Object
Feval (Lisp_Object form, Lisp_Object lexical)
The dispatch logic is simple. If form is a symbol, look up its value: the lexical environment is checked first when lexical-binding is enabled, and otherwise the symbol’s value cell holds the current dynamic binding. This is also how nil and t evaluate to themselves, since they are symbols bound to themselves. If form is any other non-cons object, return it unchanged, because numbers, strings, and vectors are self-evaluating. If it is a cons, treat the car as the function position and evaluate accordingly.
Function calls go through Ffuncall. The function object can be:
- A subr: a C function pointer wrapped in a Lisp_Subr struct. Called directly after argument evaluation.
- An interpreted closure: a (closure ENV ARGS BODY...) list in releases before Emacs 30. apply_lambda binds the argument list and evaluates the body.
- A byte-compiled function: a Lisp_Vectorlike with subtype PVEC_COMPILED. Dispatched to the byte-code interpreter in src/bytecode.c.
- A native-compiled function (Emacs 28+): represented as a subr whose Lisp_Subr records the native compilation unit (a PVEC_NATIVE_COMP_UNIT object) it was loaded from. Dispatched directly to machine code.
Special forms, things like if, let, let*, while, progn, quote, and setq, are handled as subrs with UNEVALLED argument conventions, meaning they receive their argument list unevaluated and handle evaluation themselves.
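The dispatch described in this section can be sketched as a toy evaluator. This is an illustration under heavy simplification, not the code in src/eval.c: every name is hypothetical, NULL stands in for nil, there is no lexical environment, and objects come from a fixed pool.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_INT, TOY_SYM, TOY_CONS, TOY_SUBR };

struct toy_obj;
typedef struct toy_obj *(*toy_fn) (struct toy_obj *args);

struct toy_obj
{
  enum toy_kind kind;
  long num;                    /* TOY_INT: the integer value */
  struct toy_obj *value;       /* TOY_SYM: value cell */
  struct toy_obj *function;    /* TOY_SYM: function cell */
  struct toy_obj *car, *cdr;   /* TOY_CONS */
  toy_fn fn;                   /* TOY_SUBR: C implementation */
  int unevalled;               /* TOY_SUBR: nonzero for special forms */
};

/* A fixed pool keeps the sketch free of memory management.  */
static struct toy_obj toy_pool[128];
static int toy_pool_used;

static struct toy_obj *
toy_new (enum toy_kind kind, long num, struct toy_obj *car, struct toy_obj *cdr)
{
  assert (toy_pool_used < 128);
  struct toy_obj *o = &toy_pool[toy_pool_used++];
  o->kind = kind; o->num = num; o->car = car; o->cdr = cdr;
  return o;
}

static struct toy_obj *toy_int (long n)
{ return toy_new (TOY_INT, n, NULL, NULL); }

static struct toy_obj *toy_cons (struct toy_obj *car, struct toy_obj *cdr)
{ return toy_new (TOY_CONS, 0, car, cdr); }

/* Make a symbol whose function cell holds a subr.  */
static struct toy_obj *toy_defsubr (toy_fn fn, int unevalled)
{
  struct toy_obj *subr = toy_new (TOY_SUBR, 0, NULL, NULL);
  subr->fn = fn;
  subr->unevalled = unevalled;
  struct toy_obj *sym = toy_new (TOY_SYM, 0, NULL, NULL);
  sym->function = subr;
  return sym;
}

static struct toy_obj *toy_eval (struct toy_obj *form);

/* Evaluate each argument left to right into a fresh list.  */
static struct toy_obj *toy_eval_list (struct toy_obj *args)
{
  if (args == NULL)
    return NULL;
  struct toy_obj *first = toy_eval (args->car);
  return toy_cons (first, toy_eval_list (args->cdr));
}

static struct toy_obj *toy_eval (struct toy_obj *form)
{
  if (form == NULL)
    return NULL;                      /* nil evaluates to itself */
  if (form->kind == TOY_SYM)
    return form->value;               /* variable reference */
  if (form->kind != TOY_CONS)
    return form;                      /* self-evaluating atom */
  struct toy_obj *head = form->car;   /* function position */
  assert (head != NULL && head->kind == TOY_SYM && head->function != NULL);
  struct toy_obj *subr = head->function;
  if (subr->unevalled)
    return subr->fn (form->cdr);      /* special form: raw argument list */
  return subr->fn (toy_eval_list (form->cdr));
}

/* An ordinary subr: receives already-evaluated integer arguments.  */
static struct toy_obj *toy_plus (struct toy_obj *args)
{
  long sum = 0;
  for (; args != NULL; args = args->cdr)
    sum += args->car->num;
  return toy_int (sum);
}

/* A special form: evaluates the condition, then only one branch.  */
static struct toy_obj *toy_if (struct toy_obj *args)
{
  struct toy_obj *cond = toy_eval (args->car);
  struct toy_obj *then_form = args->cdr->car;
  struct toy_obj *else_form = args->cdr->cdr ? args->cdr->cdr->car : NULL;
  return toy_eval (cond != NULL ? then_form : else_form);
}
```

Registering toy_if with the unevalled flag set and toy_plus without it is the whole difference between a special form and an ordinary function in this model.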
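The dynamic binding stack is called the specpdl. It is a C array of union specbinding entries that grows as let forms bind dynamic variables and as unwind-protect registers cleanup forms. Binding is shallow: the new value goes into the symbol’s value cell and the old value is saved in the specpdl entry. condition-case handlers live on a separate handler list, and each handler records the specpdl depth to unwind to. Unwinding on non-local exit is done by unbind_to, which walks the specpdl backward, restoring old variable values and running cleanup thunks. This is the mechanism behind nearly all of Emacs’s error handling and resource cleanup semantics.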
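A miniature of this stack, assuming integer-valued variables and argument-free cleanup functions, might look like the following. The toy_ names are hypothetical and only loosely mirror specbind, record_unwind_protect, and unbind_to.

```c
#include <assert.h>
#include <stddef.h>

struct toy_var { int value; };

enum toy_entry_kind { TOY_LET, TOY_UNWIND };

struct toy_entry
{
  enum toy_entry_kind kind;
  struct toy_var *var;        /* TOY_LET: the rebound variable */
  int old_value;              /* TOY_LET: value to restore */
  void (*cleanup) (void);     /* TOY_UNWIND: thunk to run */
};

enum { TOY_SPECPDL_MAX = 32 };
static struct toy_entry toy_specpdl[TOY_SPECPDL_MAX];
static int toy_specpdl_top;

/* The current depth, saved by callers before they bind anything.  */
static int toy_specpdl_index (void) { return toy_specpdl_top; }

/* Shallow binding: save the old value on the stack, then store the
   new value in the variable itself.  */
static void toy_specbind (struct toy_var *var, int new_value)
{
  assert (toy_specpdl_top < TOY_SPECPDL_MAX);
  struct toy_entry *e = &toy_specpdl[toy_specpdl_top++];
  e->kind = TOY_LET;
  e->var = var;
  e->old_value = var->value;
  var->value = new_value;
}

/* Register a cleanup to run when the stack unwinds past this point.  */
static void toy_record_unwind (void (*cleanup) (void))
{
  assert (toy_specpdl_top < TOY_SPECPDL_MAX);
  struct toy_entry *e = &toy_specpdl[toy_specpdl_top++];
  e->kind = TOY_UNWIND;
  e->cleanup = cleanup;
}

/* Pop entries down to DEPTH, newest first, undoing each one.  */
static void toy_unbind_to (int depth)
{
  assert (0 <= depth && depth <= toy_specpdl_top);
  while (toy_specpdl_top > depth)
    {
      struct toy_entry *e = &toy_specpdl[--toy_specpdl_top];
      if (e->kind == TOY_LET)
        e->var->value = e->old_value;
      else
        e->cleanup ();
    }
}

/* A sample cleanup thunk used to observe unwinding.  */
static int toy_cleanup_count;
static void toy_count_cleanup (void) { toy_cleanup_count++; }
```

Because values are restored newest first, nested rebindings of the same variable unwind correctly without the stack needing to know about each other.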
Garbage Collection
The garbage collector lives in src/alloc.c and uses a stop-the-world mark-and-sweep algorithm. It has not changed fundamentally in decades, which is both a stability story and a latency story.
Allocation is done from type-specific pools. Conses, floats, and symbols are allocated from large fixed-size blocks; a free list threads through the unused cells. Small vectors are carved out of fixed-size vector blocks and recycled through size-bucketed free lists, while large vectors are allocated individually with malloc and kept on a separate large_vectors linked list. Strings use an sblock scheme where string data is packed into large heap segments.
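The block-plus-free-list idea for fixed-size cells can be sketched like this. The block size of four cells is deliberately tiny so the behavior is easy to follow, and all names are hypothetical rather than taken from src/alloc.c.

```c
#include <assert.h>
#include <stdlib.h>

enum { TOY_BLOCK_CELLS = 4 };

/* A free cell reuses its own storage for the free-list link.  */
union toy_cell
{
  struct { void *car, *cdr; } cons;
  union toy_cell *next_free;
};

struct toy_block
{
  union toy_cell cells[TOY_BLOCK_CELLS];
  struct toy_block *next;          /* all blocks, for a later sweep */
};

static struct toy_block *toy_blocks;
static union toy_cell *toy_free_list;
static int toy_block_count;

/* Hand out one cell, grabbing a whole new block only when the free
   list is empty.  Returns NULL if malloc fails.  */
static union toy_cell *toy_alloc_cell (void)
{
  if (toy_free_list == NULL)
    {
      struct toy_block *b = malloc (sizeof *b);
      if (b == NULL)
        return NULL;
      b->next = toy_blocks;
      toy_blocks = b;
      toy_block_count++;
      for (int i = 0; i < TOY_BLOCK_CELLS; i++)
        {
          b->cells[i].next_free = toy_free_list;
          toy_free_list = &b->cells[i];
        }
    }
  union toy_cell *cell = toy_free_list;
  toy_free_list = cell->next_free;
  return cell;
}

/* Return a cell to the free list; the next allocation reuses it.  */
static void toy_free_cell (union toy_cell *cell)
{
  assert (cell != NULL);
  cell->next_free = toy_free_list;
  toy_free_list = cell;
}
```

Allocation and release are both a couple of pointer moves, which is why this layout suits objects as numerous as conses.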
Mark phase starts from the GC roots: the obarray symbol table, all stack-allocated Lisp_Object values visible to C, live specpdl entries, and several global variables. Older Emacs code used explicit GCPRO/UNGCPRO macros to register stack roots; modern Emacs has removed them in favor of conservative stack scanning, where the GC walks the C stack looking for values that look like tagged Emacs pointers.
mark_object traverses each reachable object, setting a mark bit in the object’s header or in a per-block bitmap, and recursing into any Lisp_Object fields it contains.
Sweep phase walks all allocated blocks and returns unmarked cells to their free lists. Unmarked strings have their data freed from sblocks. Unmarked vectors are reclaimed and added back to the bucketed free lists.
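The two phases can be sketched over a tiny heap of cons cells. This is a simplified model with hypothetical names: roots are passed in explicitly instead of being found by stack scanning, and the mark bit lives in the cell itself.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_HEAP_CELLS = 8 };

struct toy_gc_cons
{
  struct toy_gc_cons *car, *cdr;   /* NULL plays the role of nil */
  unsigned char marked;
  unsigned char in_use;
};

static struct toy_gc_cons toy_heap[TOY_HEAP_CELLS];
static struct toy_gc_cons *toy_gc_free;   /* threaded through cdr */

static void toy_heap_init (void)
{
  toy_gc_free = NULL;
  for (int i = 0; i < TOY_HEAP_CELLS; i++)
    {
      toy_heap[i].in_use = toy_heap[i].marked = 0;
      toy_heap[i].car = NULL;
      toy_heap[i].cdr = toy_gc_free;
      toy_gc_free = &toy_heap[i];
    }
}

static struct toy_gc_cons *
toy_gc_cons (struct toy_gc_cons *car, struct toy_gc_cons *cdr)
{
  assert (toy_gc_free != NULL);   /* a real allocator would collect or grow */
  struct toy_gc_cons *c = toy_gc_free;
  toy_gc_free = c->cdr;
  c->car = car;
  c->cdr = cdr;
  c->in_use = 1;
  return c;
}

/* Mark everything reachable from OBJ.  The marked check stops cycles;
   looping on the cdr keeps recursion depth bounded for long lists.  */
static void toy_mark (struct toy_gc_cons *obj)
{
  while (obj != NULL && !obj->marked)
    {
      obj->marked = 1;
      toy_mark (obj->car);
      obj = obj->cdr;
    }
}

/* Return unmarked cells to the free list and clear all mark bits.
   Returns the number of cells reclaimed.  */
static int toy_sweep (void)
{
  int freed = 0;
  for (int i = 0; i < TOY_HEAP_CELLS; i++)
    {
      struct toy_gc_cons *c = &toy_heap[i];
      if (c->in_use && !c->marked)
        {
          c->in_use = 0;
          c->car = NULL;
          c->cdr = toy_gc_free;
          toy_gc_free = c;
          freed++;
        }
      c->marked = 0;
    }
  return freed;
}

static int toy_collect (struct toy_gc_cons **roots, int nroots)
{
  for (int i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  return toy_sweep ();
}
```

Nothing moves during collection: a surviving cell keeps its address, which is what lets C code hold raw pointers across a GC.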
The GC is triggered when the amount of memory allocated since the last collection exceeds gc-cons-threshold, a user-visible variable that defaults to 800,000 bytes on 64-bit builds and that many configurations raise substantially; gc-cons-percentage can raise the effective threshold further on large heaps. The non-moving design means pointers are never relocated, which simplifies the interaction between C code and heap objects but limits the collector’s ability to reduce fragmentation.
For comparison, GNU Guile, a Scheme implementation designed for embedding in C applications, moved to the BDW (Boehm-Demers-Weiser) conservative collector, and its developers have been working toward a new collector built on Whippet. SBCL uses a generational copying GC with separate nursery and tenured spaces. Emacs’s collector is simpler and more predictable but pays in pause times on large heaps.
The Byte-Code VM and Native Compilation
Emacs Lisp has a byte-compiler that transforms source into a compact bytecode format stored in Lisp_Vectorlike objects with subtype PVEC_COMPILED. The bytecode interpreter in src/bytecode.c is a straightforward stack machine. Compiled .elc files load much faster than source and execute several times faster, largely by avoiding repeated symbol lookups.
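A stack machine of this kind fits in a few dozen lines. The opcodes below are invented for illustration and are unrelated to the real instruction set in src/bytecode.c. The constants vector stands in for the one attached to each compiled function, which is where references are resolved once at compile time instead of on every evaluation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine.  */
enum toy_op { TOY_OP_CONST, TOY_OP_ADD, TOY_OP_MUL, TOY_OP_DUP,
              TOY_OP_RETURN };

enum { TOY_STACK_MAX = 16 };

/* Execute LEN bytes of CODE against CONSTANTS and return the value on
   top of the stack when TOY_OP_RETURN is reached.  */
static long
toy_exec (const unsigned char *code, size_t len, const long *constants)
{
  long stack[TOY_STACK_MAX];
  int sp = 0;
  size_t pc = 0;

  while (pc < len)
    {
      switch (code[pc++])
        {
        case TOY_OP_CONST:   /* operand byte indexes the constants vector */
          assert (pc < len && sp < TOY_STACK_MAX);
          stack[sp++] = constants[code[pc++]];
          break;
        case TOY_OP_ADD:
          assert (sp >= 2);
          stack[sp - 2] += stack[sp - 1];
          sp--;
          break;
        case TOY_OP_MUL:
          assert (sp >= 2);
          stack[sp - 2] *= stack[sp - 1];
          sp--;
          break;
        case TOY_OP_DUP:
          assert (sp >= 1 && sp < TOY_STACK_MAX);
          stack[sp] = stack[sp - 1];
          sp++;
          break;
        case TOY_OP_RETURN:
          assert (sp >= 1);
          return stack[sp - 1];
        default:
          assert (!"unknown opcode");
        }
    }
  assert (!"code ended without TOY_OP_RETURN");
  return 0;
}
```

For example, the expression (* (+ 2 3) 4) would compile to CONST 0, CONST 1, ADD, CONST 2, MUL, RETURN with the constants vector {2, 3, 4}.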
Emacs 28 added native compilation via libgccjit, contributed primarily by Andrea Corallo. The native compiler, implemented in comp.el, takes the byte compiler’s intermediate output, translates it through an SSA-based IR called LIMPLE, and hands the result to libgccjit, which emits native machine code. Despite the library’s name, compilation happens ahead of time or asynchronously in the background rather than as a tracing JIT, and the result is stored as .eln files. CPU-bound Emacs Lisp code is often reported to run 2x to 10x faster under native compilation, though compile time and the libgccjit dependency are non-trivial costs.
The architecture here is layered: source Lisp is interpreted by Feval, compiled Lisp runs on the bytecode VM, and natively compiled Lisp calls directly into machine code, all through the same Ffuncall dispatch path. Switching between these modes is transparent to Lisp code.
How This Differs from Other Lisp Runtimes
Emacs Lisp is a Lisp-2: each symbol has separate value and function cells, as the Lisp_Symbol struct above shows, so a variable and a function can share a name without colliding. Common Lisp makes the same choice, while Scheme is a Lisp-1 with a single namespace. Calling a function held in a variable therefore requires funcall or apply. You can pass a named function as 'my-function or #'my-function; both work because funcall resolves a symbol through its function cell, but #' additionally tells the byte compiler that the symbol names a function. This is not a minor syntactic difference; it reflects different ideas about whether functions should live in the same namespace as ordinary values.
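The Lisp_Symbol layout shown earlier has separate value and function slots, and a small sketch shows what that means in practice. The struct and helpers below are hypothetical simplifications with integer values and one-argument functions.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*toy_unary_fn) (long);

/* Hypothetical symbol with separate value and function cells.  */
struct toy_sym
{
  const char *name;
  long value;              /* what a variable reference sees */
  toy_unary_fn function;   /* what a call in function position sees */
};

/* Evaluating the symbol as a variable reads only the value cell.  */
static long toy_symbol_value (const struct toy_sym *s)
{
  return s->value;
}

/* Calling through the symbol reads only the function cell.  Real
   Emacs would signal void-function instead of asserting.  */
static long toy_funcall (const struct toy_sym *s, long arg)
{
  assert (s->function != NULL);
  return s->function (arg);
}

static long toy_double (long n) { return 2 * n; }
```

Rebinding the value cell leaves calls through the same symbol untouched, which is why a local variable named list does not break calls to the list function in Emacs Lisp.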
Emacs Lisp was dynamically scoped by default from the beginning. Lexical binding was added as an opt-in per-file setting in Emacs 24 (2012), via the ;;; -*- lexical-binding: t -*- file variable. Modern Emacs Lisp code uses lexical binding, which enables closures and allows the compiler to generate better code, but the default remains dynamic for backward compatibility. Scheme and modern Common Lisp are lexically scoped by default.
Emacs Lisp has no tail-call optimization. Deep recursion eventually hits max-lisp-eval-depth (default 1600) and signals an error. Scheme requires proper tail recursion by specification. Common Lisp does not require it, though SBCL eliminates tail calls under its usual optimization settings. Emacs does not, and interpreted calls consume C stack frames in the evaluator and in Ffuncall.
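The depth limit amounts to a counter that is bumped on entry and restored on exit. A minimal sketch, with hypothetical names and a return value of -1 standing in for a signalled error:

```c
#include <assert.h>

enum { TOY_MAX_EVAL_DEPTH = 1600 };
static int toy_eval_depth;

/* Count N down to zero recursively and return how many steps were
   taken, or -1 if the recursion would exceed the depth limit.  Every
   call occupies one level until it returns, tail position or not.  */
static long toy_recurse (long n)
{
  assert (n >= 0);
  if (++toy_eval_depth > TOY_MAX_EVAL_DEPTH)
    {
      toy_eval_depth--;
      return -1;                 /* stand-in for signalling an error */
    }
  long rest = (n == 0) ? 0 : toy_recurse (n - 1);
  toy_eval_depth--;
  if (rest < 0)
    return -1;                   /* propagate the failure outward */
  return (n == 0) ? 0 : rest + 1;
}
```

With a limit of 1600, toy_recurse (1599) succeeds because it needs exactly 1600 nested calls, and toy_recurse (1600) fails. A loop doing the same work would never touch the counter, which is why idiomatic Emacs Lisp favors iteration for long sequences.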
The condition system is condition-case and signal rather than the full Common Lisp restart system. There is no equivalent of invoke-restart; once you are in a handler, the stack between the signal and the handler has already been unwound. This simplifies the implementation but removes the ability to recover and resume execution from the point of an error.
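The unwinding behavior can be sketched with setjmp and longjmp, which is also the style of mechanism a C runtime naturally reaches for. Everything here is a hypothetical miniature: errors are plain integers, and there is no cleanup stack.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* One entry per active handler, innermost first.  */
struct toy_handler
{
  jmp_buf jmp;
  struct toy_handler *next;
};

static struct toy_handler *toy_handlers;
static int toy_pending_error;      /* static, so safe across longjmp */
static int toy_steps_completed;

/* Transfer control to the innermost handler.  Never returns, and the
   frames between here and the handler are abandoned.  */
static void toy_signal (int error_code)
{
  struct toy_handler *h = toy_handlers;
  assert (h != NULL);              /* real Emacs has a top-level handler */
  toy_handlers = h->next;          /* pop before jumping */
  toy_pending_error = error_code;
  longjmp (h->jmp, 1);
}

/* Run BODY with a handler installed.  Returns 0 on normal completion
   or the signalled error code.  */
static int toy_condition_case (void (*body) (void))
{
  struct toy_handler h;
  h.next = toy_handlers;
  toy_handlers = &h;
  if (setjmp (h.jmp) == 0)
    {
      body ();
      toy_handlers = h.next;       /* normal exit: pop the handler */
      return 0;
    }
  return toy_pending_error;        /* reached via longjmp */
}

static void toy_risky_body (void)
{
  toy_steps_completed = 1;
  toy_signal (42);
  toy_steps_completed = 2;         /* never reached */
}

static void toy_quiet_body (void)
{
  toy_steps_completed = 3;
}
```

By the time toy_condition_case returns the error code, the frame of toy_risky_body is gone, so there is nothing left to resume. A restart system would have to run the handler before that jump happens.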
Why the Framing Matters
Understanding Emacs as a Lisp runtime changes what you look for when reading its code. The 300K lines of C are not the editor; they are the substrate. When you write an Emacs package, you are writing an application for a specific Lisp runtime that happens to already include a sophisticated text editing application in its standard library. The C code in src/ is not the interesting part of Emacs the editor; it is the interesting part of Emacs the runtime.
This also explains some of Emacs’s unusual properties as software: why it can be extended so deeply at runtime, why M-x eval-expression is so powerful, why you can redefine core editor behavior without restarting the process. The C layer is intentionally minimal and stable. The Lisp layer is intentionally malleable.
Emacs 29 continued this trajectory by shipping tree-sitter integration and built-in SQLite support, following the native JSON primitives added in Emacs 27. Each of these additions follows the same pattern: a small set of C primitives exposed as Lisp-callable functions, with higher-level behavior implemented in Lisp on top.
For anyone curious about language runtime implementation, the Emacs source is worth reading. It is not the most modern design, and it predates a lot of what we now know about GC and JIT compilation. But it is a working, heavily-used Lisp runtime with 40 years of production history, and the distance between Feval in src/eval.c and the editor you see on screen is mostly just Lisp all the way down.