Emacs Is a C Runtime That Happens to Ship an Editor

Most people encounter Emacs as a text editor with unusual configurability. The more accurate description, as explored in a recent deep-dive into Emacs internals, is that Emacs is a Lisp runtime written in C, and the editing environment is an application running inside that runtime. The distinction matters, because it explains design decisions that otherwise look like historical accidents.

The Two Executables

When you build GNU Emacs from source, the build system produces two different executables at different stages. The first is temacs, sometimes called the “bare impure” Emacs. It is a C binary that contains the Lisp interpreter, memory allocator, garbage collector, and the C-implemented primitive functions, including the C-level data structures for buffers and windows, but no Lisp code has been loaded into it yet. It can evaluate Lisp expressions, but it has none of the Lisp-defined commands, modes, or key bindings that make it recognizable as a text editor.

The second stage loads Lisp. temacs evaluates a file called lisp/loadup.el, which loads the standard library piece by piece: the core functions, the Lisp side of buffer, window, and frame handling, the default keymaps, and eventually the full editing environment. The result is then serialized to disk.

Historically this serialization used a technique called unexec, which took a snapshot of the running process’s memory and wrote it out as a new executable (an ELF binary on most Unix systems). That approach was fragile and platform-specific. Emacs 27 replaced it with pdump (the portable dumper), which serializes the Lisp heap to a .pdmp file that is memory-mapped at startup. What you run is therefore still a C runtime plus a checkpointed Lisp image, with the image loaded when the process starts.

Lisp_Object and Tagged Pointers

The entire runtime is built around a single type: Lisp_Object. In the 64-bit build, this is a signed 64-bit integer. The low 3 bits serve as a type tag. Because heap allocations are aligned to at least 8 bytes, those 3 bits in any real pointer are always zero, which means the runtime can steal them for type discrimination without losing information.

/* Simplified from src/lisp.h. Tag values are those of the usual
   low-bit-tag (USE_LSB_TAG) configuration. */
typedef intptr_t Lisp_Object;

enum Lisp_Type
  {
    Lisp_Symbol     = 0,
    Lisp_Int0       = 2,  /* fixnums own two tags (2 and 6), so only the  */
    Lisp_Int1       = 6,  /* low 2 bits are fixed; value is shifted by 2  */
    Lisp_Cons       = 3,
    Lisp_String     = 4,
    Lisp_Vectorlike = 5,  /* sub-typed by pvec_type in the header */
    Lisp_Float      = 7
  };

Fixnums (small integers) are encoded directly in the word: the integer is shifted left by two bits and the low two bits hold the fixnum tag, so no heap allocation is needed for integers that fit in 62 bits. Bignums, added in Emacs 27, are heap-allocated vectorlike objects that wrap a GMP integer.
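The tagging idea can be tried out in isolation. What follows is a minimal sketch, not the code in src/lisp.h: every toy_ name is hypothetical, the tag numbers are chosen for illustration, and the sketch assumes that malloc returns 8-byte-aligned memory and that fixnums spend two low bits on their tag.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy model of a tagged machine word. All names and tag numbers are
   hypothetical; they are not the definitions used by Emacs. */
typedef intptr_t toy_word;

enum { TOY_TAG_BITS = 3, TOY_TAG_MASK = (1 << TOY_TAG_BITS) - 1 };
enum toy_tag { TOY_SYMBOL = 0, TOY_FIXNUM0 = 2, TOY_CONS = 3, TOY_FIXNUM1 = 6 };

struct toy_cons { toy_word car, cdr; };

/* Fixnums: the low two bits are 0b10, the value lives in the rest.
   The shift is done on an unsigned type to avoid undefined behavior. */
toy_word toy_make_fixnum(intptr_t n) {
    return (toy_word)(((uintptr_t)n << 2) | 2u);
}

int toy_is_fixnum(toy_word w) { return (w & 3) == 2; }

/* (w - 2) is an exact multiple of 4, so the division loses nothing. */
intptr_t toy_fixnum_value(toy_word w) { return (w - 2) / 4; }

/* Pointers: an 8-byte-aligned address has three zero low bits,
   so a 3-bit tag can be OR-ed in and masked off again. */
toy_word toy_tag_ptr(void *p, enum toy_tag tag) {
    return (toy_word)((uintptr_t)p | (uintptr_t)tag);
}

enum toy_tag toy_tag_of(toy_word w) { return (enum toy_tag)(w & TOY_TAG_MASK); }

void *toy_untag(toy_word w) {
    return (void *)((uintptr_t)w & ~(uintptr_t)TOY_TAG_MASK);
}

toy_word toy_make_cons(toy_word car, toy_word cdr) {
    struct toy_cons *c = malloc(sizeof *c);
    /* Assumption: the allocator hands back 8-byte-aligned memory. */
    if (c == NULL || ((uintptr_t)c & TOY_TAG_MASK) != 0)
        abort();
    c->car = car;
    c->cdr = cdr;
    return toy_tag_ptr(c, TOY_CONS);
}
```

Checking whether a word is a fixnum or a cons is a mask and a compare, with no memory access at all, which is the main attraction of the scheme.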

All heap types that do not fit the primary tag scheme (vectors, hash tables, buffers, windows, frames, compiled functions) use a Lisp_Vectorlike representation. The object header contains a pvec_type field that identifies the specific subtype. This is how Emacs represents fundamentally different things like buffer and hash-table through a single tag value while still distinguishing them at runtime.
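A small sketch shows how one shared header can distinguish otherwise unrelated structs. The toy_ names and the subtype list are hypothetical, and the sketch uses a plain struct field for readability; the real header layout in Emacs differs.

```c
#include <stddef.h>

/* Hypothetical subtype codes, loosely modeled on the pvec_type idea. */
enum toy_pvec_type {
    TOY_PVEC_NORMAL_VECTOR,
    TOY_PVEC_BUFFER,
    TOY_PVEC_HASH_TABLE
};

/* Common header that starts every vectorlike object in this sketch. */
struct toy_vectorlike_header {
    enum toy_pvec_type pvec_type;
    size_t size;
};

struct toy_buffer {
    struct toy_vectorlike_header header;
    const char *name;
};

struct toy_hash_table {
    struct toy_vectorlike_header header;
    size_t count;
};

/* A pointer to a struct may be converted to a pointer to its first
   member, so any vectorlike object can be inspected via its header. */
int toy_bufferp(const void *obj) {
    const struct toy_vectorlike_header *h = obj;
    return h->pvec_type == TOY_PVEC_BUFFER;
}

const char *toy_type_name(const void *obj) {
    const struct toy_vectorlike_header *h = obj;
    switch (h->pvec_type) {
    case TOY_PVEC_BUFFER:     return "buffer";
    case TOY_PVEC_HASH_TABLE: return "hash-table";
    default:                  return "vector";
    }
}
```

The primary tag only says “this is some vectorlike object”; the second-level check against the header is what a predicate such as bufferp has to perform.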

DEFUN: How C Functions Become Lisp Primitives

The bridge between the C layer and Emacs Lisp is the DEFUN macro. Every Lisp primitive that is implemented in C is declared using this macro, which generates both the C function and a struct Lisp_Subr descriptor that registers the function with the interpreter.

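/* Simplified from src/alloc.c. CONS_MAKE is a placeholder for the real
   allocation logic, not an actual Emacs macro. */
DEFUN ("cons", Fcons, Scons, 2, 2, 0,
       doc: /* Create a new cons cell with CAR and CDR.  */)
  (Lisp_Object car, Lisp_Object cdr)
{
  register Lisp_Object val;
  CONS_MAKE (val, car, cdr);  /* take a cell from the cons pool and fill it */
  return val;
}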

The macro takes the Lisp name as a string, the C function name (prefixed with F by convention), the name of the static descriptor struct (prefixed with S), the minimum and maximum argument counts, an interactive specification, and a docstring. During startup, main calls each source file’s syms_of_X() function, which in turn calls defsubr(&Scons) and similar registrations to intern the symbol and attach the subr to it.

Special forms like if and let use UNEVALLED as the max-args value, signaling to the evaluator that it should pass the unevaluated argument list rather than evaluating arguments before calling the C function.

This design means that adding a new Lisp primitive is a matter of writing a C function with the right signature, using DEFUN, and adding a defsubr call to the relevant syms_of_ function. The Lisp programmer sees little difference between a function implemented in C and one implemented in Lisp, except that the C version is generally faster. A primitive can even be rebound from Lisp with fset or defalias, but C code that calls Fcons directly bypasses the symbol’s function cell and never sees the replacement.
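The declare-then-register pattern is easy to imitate outside Emacs. The sketch below is a toy under stated assumptions: TOY_DEFUN, toy_defsubr, toy_funcall, and the fixed two-argument signature are all hypothetical stand-ins, and a flat array plays the role of the obarray.

```c
#include <stddef.h>
#include <string.h>

typedef long toy_obj;  /* stand-in for Lisp_Object */

/* Descriptor that ties a Lisp-visible name to a C function. */
struct toy_subr {
    const char *lisp_name;
    toy_obj (*fn)(toy_obj, toy_obj);
    int min_args, max_args;
};

/* Declares the C function and its descriptor together, in the spirit
   of DEFUN. The function body follows the macro invocation. */
#define TOY_DEFUN(lname, fnname, sname, minargs, maxargs)          \
    toy_obj fnname(toy_obj, toy_obj);                              \
    struct toy_subr sname = { lname, fnname, minargs, maxargs };   \
    toy_obj fnname

TOY_DEFUN("toy-add", Ftoy_add, Stoy_add, 2, 2)(toy_obj a, toy_obj b)
{
    return a + b;
}

/* A tiny table standing in for the symbol table. */
static struct toy_subr *toy_table[16];
static size_t toy_count;

void toy_defsubr(struct toy_subr *s) {
    if (toy_count < sizeof toy_table / sizeof toy_table[0])
        toy_table[toy_count++] = s;
}

/* One registration function per "source file", called at startup. */
void toy_syms_of_arith(void) {
    toy_defsubr(&Stoy_add);
}

/* Look the name up and check the argument count before calling.
   Returns 0 on success, -1 for an unknown name or a bad count. */
int toy_funcall(const char *name, int nargs, toy_obj a, toy_obj b, toy_obj *out) {
    for (size_t i = 0; i < toy_count; i++) {
        struct toy_subr *s = toy_table[i];
        if (strcmp(s->lisp_name, name) == 0) {
            if (nargs < s->min_args || nargs > s->max_args)
                return -1;
            *out = s->fn(a, b);
            return 0;
        }
    }
    return -1;
}
```

The point of the descriptor is that the evaluator never needs to know a C function’s name at compile time; it only needs the table entry that registration created.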

The Garbage Collector

Emacs uses a stop-the-world mark-and-sweep collector implemented in alloc.c. It is not sophisticated by modern standards, but it is reliable and well-understood.

Memory is organized into type-specific pools. Cons cells are allocated from block chains; each block holds a fixed number of cons cells. Strings use a two-part scheme: a Lisp_String header is allocated from a block, but the actual character data lives in a separate slab. Symbols get their own block allocator. Vectors and vectorlikes use a dedicated vector heap that is scanned separately.
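A block-chain allocator with a free list fits in a few dozen lines. This is a sketch under simplifying assumptions: the toy_ names are hypothetical, cells hold plain integers, and the block size is tiny so the behavior is easy to observe.

```c
#include <stdlib.h>

enum { TOY_CELLS_PER_BLOCK = 4 };  /* tiny on purpose */

struct toy_cell {
    long car;
    /* A free cell reuses its cdr slot as the free-list link. */
    union { long cdr; struct toy_cell *next_free; } u;
};

struct toy_block {
    struct toy_cell cells[TOY_CELLS_PER_BLOCK];
    struct toy_block *next;
};

static struct toy_block *toy_blocks;      /* chain of all blocks, newest first */
static struct toy_cell *toy_free_list;    /* cells handed back for reuse */
static int toy_block_index = TOY_CELLS_PER_BLOCK;  /* next unused slot in newest block */
static int toy_block_count;

struct toy_cell *toy_alloc_cell(long car, long cdr) {
    struct toy_cell *c;
    if (toy_free_list != NULL) {
        /* Prefer a recycled cell. */
        c = toy_free_list;
        toy_free_list = c->u.next_free;
    } else {
        if (toy_block_index == TOY_CELLS_PER_BLOCK) {
            /* Newest block is full (or none exists yet): chain a new one. */
            struct toy_block *b = malloc(sizeof *b);
            if (b == NULL)
                abort();
            b->next = toy_blocks;
            toy_blocks = b;
            toy_block_index = 0;
            toy_block_count++;
        }
        c = &toy_blocks->cells[toy_block_index++];
    }
    c->car = car;
    c->u.cdr = cdr;
    return c;
}

/* In a real collector this is what the sweep phase does to dead cells. */
void toy_free_cell(struct toy_cell *c) {
    c->u.next_free = toy_free_list;
    toy_free_list = c;
}

int toy_blocks_in_use(void) { return toy_block_count; }
```

Because every cell in a pool has the same size, allocation is a pointer bump or a list pop, and freeing is a list push; no general-purpose malloc bookkeeping is needed per object.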

GC is triggered when consing_since_gc exceeds gc_cons_threshold (defaulting to 800,000 bytes). The mark phase starts from a fixed set of roots: the specpdl (the combined binding and call stack), the C stack (scanned conservatively, so anything that looks like a pointer to a Lisp object keeps that object alive), C-level global variables that hold Lisp objects, and the obarray (the symbol table). It traverses the live object graph and sets mark bits. The sweep phase then walks every pool and returns unmarked objects to their free lists.
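The trigger, mark, and sweep steps can be sketched over a fixed pool of cells. Everything here is a simplified assumption rather than the logic of alloc.c: the gc_ names are hypothetical, the threshold counts cells instead of bytes, roots are registered explicitly, and collection only happens when the caller asks for it at a safe point.

```c
#include <stddef.h>

enum { GC_POOL_SIZE = 8, GC_THRESHOLD = 4, GC_MAX_ROOTS = 4 };

struct gc_cell {
    struct gc_cell *car, *cdr;  /* NULL plays the role of nil */
    int marked;
    int in_use;
};

static struct gc_cell gc_pool[GC_POOL_SIZE];
static struct gc_cell *gc_roots[GC_MAX_ROOTS];
static size_t gc_root_count;
static int gc_allocs_since_gc;

void gc_push_root(struct gc_cell *c) {
    if (gc_root_count < GC_MAX_ROOTS)
        gc_roots[gc_root_count++] = c;
}

/* Returns a fresh cell, or NULL if the pool is exhausted. */
struct gc_cell *gc_cons(struct gc_cell *car, struct gc_cell *cdr) {
    for (size_t i = 0; i < GC_POOL_SIZE; i++) {
        if (!gc_pool[i].in_use) {
            gc_pool[i].car = car;
            gc_pool[i].cdr = cdr;
            gc_pool[i].marked = 0;
            gc_pool[i].in_use = 1;
            gc_allocs_since_gc++;
            return &gc_pool[i];
        }
    }
    return NULL;
}

/* Mark phase: recurse on car, loop on cdr, stop at already-marked cells. */
static void gc_mark(struct gc_cell *c) {
    while (c != NULL && !c->marked) {
        c->marked = 1;
        gc_mark(c->car);
        c = c->cdr;
    }
}

/* Full collection; returns the number of cells reclaimed. */
int gc_collect(void) {
    int freed = 0;
    for (size_t i = 0; i < gc_root_count; i++)
        gc_mark(gc_roots[i]);
    /* Sweep phase: walk the whole pool, reclaim unmarked cells,
       and clear mark bits for the next cycle. */
    for (size_t i = 0; i < GC_POOL_SIZE; i++) {
        if (gc_pool[i].in_use && !gc_pool[i].marked) {
            gc_pool[i].in_use = 0;
            freed++;
        }
        gc_pool[i].marked = 0;
    }
    gc_allocs_since_gc = 0;
    return freed;
}

/* Called at safe points; returns 1 if a collection ran, 0 otherwise. */
int gc_maybe_collect(void) {
    if (gc_allocs_since_gc < GC_THRESHOLD)
        return 0;
    gc_collect();
    return 1;
}

int gc_live_count(void) {
    int n = 0;
    for (size_t i = 0; i < GC_POOL_SIZE; i++)
        n += gc_pool[i].in_use;
    return n;
}
```

The sweep touches every cell in the pool whether it is live or not, which is why the pause of a collector like this grows with heap size rather than with the amount of garbage.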

An incremental, generational collector based on the MPS (Memory Pool System) library is under development on a separate branch, but as of Emacs 30 it has not shipped in a release. Released versions still use the classic stop-the-world design, though the direction is set.

Comparison With Other C-Hosted Runtimes

Lua takes a structurally similar approach but with different choices throughout. The reference implementation stores every value as a TValue, a small struct that pairs a value union with a separate type tag, so numbers and booleans need no heap allocation. LuaJIT goes further and uses NaN-boxing: values are stored as IEEE 754 doubles, and the NaN bit patterns that would otherwise go unused are repurposed to carry tagged pointers and other types, which keeps numeric-heavy code compact. Lua’s garbage collector has been incremental since Lua 5.1 and gained a generational mode in 5.4, putting it well ahead of Emacs in GC sophistication.
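NaN-boxing, the technique mentioned above, is compact enough to sketch directly. This is an illustration under assumptions, not code from any Lua implementation: the nb_ names and the exact bit layout are hypothetical, and the sketch assumes pointers fit in the low 50 bits of a 64-bit word, which holds for user-space addresses on common x86-64 and AArch64 systems.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t nb_value;

/* Exponent all ones plus the two top mantissa bits: a quiet NaN pattern
   that ordinary arithmetic does not produce with both bits set. */
#define NB_QNAN    UINT64_C(0x7ffc000000000000)
/* The sign bit on top of NB_QNAN marks a boxed pointer in this sketch. */
#define NB_PTR_TAG UINT64_C(0x8000000000000000)

/* memcpy is the portable way to reinterpret the bits of a double. */
nb_value nb_from_double(double d) {
    nb_value v;
    memcpy(&v, &d, sizeof v);
    return v;
}

double nb_to_double(nb_value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Anything that is not our reserved NaN pattern is a plain double. */
int nb_is_double(nb_value v) {
    return (v & NB_QNAN) != NB_QNAN;
}

nb_value nb_from_ptr(void *p) {
    return NB_PTR_TAG | NB_QNAN | (uint64_t)(uintptr_t)p;
}

int nb_is_ptr(nb_value v) {
    return (v & (NB_QNAN | NB_PTR_TAG)) == (NB_QNAN | NB_PTR_TAG);
}

void *nb_to_ptr(nb_value v) {
    return (void *)(uintptr_t)(v & ~(NB_QNAN | NB_PTR_TAG));
}
```

Compared with low-bit tagging, the trade is reversed: doubles are stored unboxed and cost nothing to use, while pointers pay a mask on every access.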

Guile, the GNU project’s official extension language and an embeddable Scheme, is architecturally closer to Emacs Lisp in its C-runtime design but adds proper tail-call optimization, first-class continuations, and an optimizing compiler that targets its own virtual machine. Guile was at one point intended to replace Emacs Lisp as Emacs’s extension language, and there was a multi-year effort to run Emacs on Guile (the guile-emacs project). It stalled, in large part because of the sheer volume of Emacs Lisp code that assumed dynamic binding and other Emacs-specific semantics.

CPython uses a more object-oriented dispatch model. Every value is a PyObject* with an embedded ob_type pointer to a type object that contains function pointers for all operations. There are no tag bits; type identity is determined by pointer comparison. This is simpler to extend from C but means every value is a heap pointer, requiring more indirection than Emacs’s direct integer encoding for small values.
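The type-pointer model can be sketched in the same toy style. The names below are hypothetical and only loosely modeled on PyObject and ob_type; a real type object carries far more slots than the single hash function shown here.

```c
#include <stddef.h>

typedef struct toy_pyobject toy_pyobject;

/* A type object: a name plus function pointers for operations. */
typedef struct toy_pytype {
    const char *tp_name;
    long (*tp_hash)(const toy_pyobject *);
} toy_pytype;

/* Every value starts with a pointer to its type object. */
struct toy_pyobject {
    const toy_pytype *ob_type;
};

typedef struct { toy_pyobject ob_base; long value; } toy_pyint;
typedef struct { toy_pyobject ob_base; const char *chars; } toy_pystr;

static long toy_int_hash(const toy_pyobject *o) {
    return ((const toy_pyint *)o)->value;
}

/* Deliberately naive: the sum of the character codes. */
static long toy_str_hash(const toy_pyobject *o) {
    long h = 0;
    for (const char *p = ((const toy_pystr *)o)->chars; *p != '\0'; p++)
        h += (unsigned char)*p;
    return h;
}

const toy_pytype toy_int_type = { "int", toy_int_hash };
const toy_pytype toy_str_type = { "str", toy_str_hash };

/* Type identity is a pointer comparison against the type object. */
int toy_is_int(const toy_pyobject *o) {
    return o->ob_type == &toy_int_type;
}

/* Generic operations dispatch through the type's function pointers. */
long toy_hash(const toy_pyobject *o) {
    return o->ob_type->tp_hash(o);
}
```

Adding a new type means filling in one more type object, with no tag space to run out of; the cost is that even a small integer has to be reached through a pointer.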

The common thread across all of these is that the scripting language is a C library first and a user-facing language second. The host C code defines the object model, the memory management rules, and the primitive operations. The scripting layer builds on top of those foundations.

Why the Architecture Matters

The consequence of this design is that Emacs has no hard boundary between “editor code” and “extension code.” A buffer is a C-level struct buffer, but it is also a first-class Lisp object. The display engine is partly written in C and partly in Lisp, with the boundary determined by performance requirements rather than architectural necessity.

This is what makes Emacs extensible in a way that most editors with scripting layers are not. In editors that expose an API for scripting, the extension author can only do what the API allows. In Emacs, extension code runs in the same runtime as the editor itself and can manipulate the same objects. The tradeoff is that extension code can also break the editor in ways that an API boundary would prevent.

Emacs Lisp lacks tail-call optimization, continuations, and hygienic macros. It used dynamic binding by default for most of its history (lexical binding became opt-in in Emacs 24 via ;;; -*- lexical-binding: t -*-). These are genuine limitations compared to modern Lisps. But the native compiler added in Emacs 28, which uses libgccjit to compile Lisp to native machine code, has narrowed the performance gap considerably for compute-heavy Lisp code.

The architecture is old and carries the decisions of the 1970s MacLisp tradition. It is also coherent in a way that more modern editor architectures often are not, because everything in the system, from syntax highlighting to the minibuffer to the GC threshold, is reachable from the same Lisp environment. Understanding temacs and DEFUN and Lisp_Object is not just trivia about Emacs internals; it is the explanation for why Emacs behaves the way it does at every level.