How GNU Emacs Builds a Lisp Runtime in C: DEFUN, Lisp_Object, and the Bootstrap
Source: lobsters
The common description of Emacs is that it is “infinitely extensible” or “programmable.” That framing undersells the architecture. GNU Emacs is a Lisp interpreter implemented in C, and the text editor is the application running on top of that runtime.
A recent post in the “Emacs Internal” series opens with this exact observation and begins tracing the C source to support it. The claim is correct and worth pursuing further. Once you accept that Emacs is a Lisp runtime, the architecture of the src/ directory stops looking like a monolithic C application and starts looking like something deliberate.
The Structure of the C Code
The GNU Emacs source repository has two main subtrees: src/, which is C, and lisp/, which is Emacs Lisp. The C code is roughly 130,000 lines. The Lisp code is roughly 1.5 million. The architecture is visible in those numbers.
src/ contains what has to be in C: the evaluator (eval.c), the reader (lread.c), the garbage collector (alloc.c), file and process I/O, the display engine, and the interface to the operating system, plus a set of performance-critical primitives. Almost everything else, including most editing commands, the major modes, the package manager, and the version control integration, lives in lisp/.
The design goal, explicit in early GNU Emacs documentation and in interviews with Richard Stallman, was that the C layer should be as thin as possible. Only what cannot be written in Lisp for performance or system-access reasons belongs in C. Everything else should be Lisp, so users can read and modify it without recompiling.
Lisp_Object: The Universal Type
The central data structure in the C code is Lisp_Object, defined in lisp.h. On a 64-bit system, it is an integer wide enough to hold a pointer. The bottom few bits encode the type tag; the remaining bits carry either a pointer to a heap-allocated structure or an immediate value.
/* Simplified from lisp.h.  The real header uses different tag numbers
   and reserves two tag values for fixnums.  */
typedef intptr_t Lisp_Object;

enum Lisp_Type {
  Lisp_Symbol = 0,
  Lisp_Cons = 1,
  Lisp_String = 2,
  Lisp_Vectorlike = 3, /* vectors, buffers, windows, frames, ... */
  Lisp_Float = 4,
  Lisp_Int = 5,
};

/* Extract the type tag */
#define XTYPE(a) ((enum Lisp_Type) ((EMACS_UINT)(a) & 0x7))

/* Extract a fixnum value (stored inline, no allocation) */
#define XINT(a) ((EMACS_INT)(a) >> 3)

/* Extract a pointer to a cons cell */
#define XCONS(a) ((struct Lisp_Cons *) ((EMACS_UINT)(a) & ~0x7))
Small integers (fixnums) are stored directly in the Lisp_Object value, with the integer shifted left and the tag bits occupying the low positions. No heap allocation, no pointer indirection. This is the classic tagged-pointer technique. V8 uses a similar scheme for its small integers (Smis); LuaJIT uses a related but distinct trick, NaN-boxing, which hides pointers and tags inside the unused bit patterns of IEEE 754 doubles.
Every value flowing through the Lisp evaluator, whether from user code or produced by a C function, is a Lisp_Object. The C code and the Lisp code share exactly one value type. That single shared type is what makes the boundary between them permeable in both directions.
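The tagging scheme can be tried out in isolation. What follows is a minimal sketch, assuming the simplified three-bit layout from the listing above rather than the real lisp.h encoding; every toy_ name is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy encoding: 3 low tag bits, as in the simplified
   listing.  Real Emacs uses different tag numbers.  */
typedef intptr_t toy_object;

enum toy_type { TOY_SYMBOL = 0, TOY_CONS = 1, TOY_INT = 5 };

#define TOY_TAG_MASK ((intptr_t) 7)

/* Force 8-byte alignment so the low 3 bits of a cell address are 0.  */
struct toy_cons { _Alignas (8) toy_object car; toy_object cdr; };

static enum toy_type
toy_type_of (toy_object o)
{
  return (enum toy_type) (o & TOY_TAG_MASK);
}

/* Box an integer: scale by 8 (a 3-bit left shift) and add the tag.
   Multiplying avoids left-shifting a negative value.  Values that do
   not fit in the remaining bits are not handled in this sketch.  */
static toy_object
toy_make_int (intptr_t n)
{
  return n * 8 + TOY_INT;
}

/* Unbox an integer: remove the tag, then undo the scaling.  */
static intptr_t
toy_xint (toy_object o)
{
  assert (toy_type_of (o) == TOY_INT);
  return (o - TOY_INT) / 8;
}

/* Box a pointer: the address already has zero low bits, so OR in the tag.  */
static toy_object
toy_make_cons (struct toy_cons *cell)
{
  intptr_t bits = (intptr_t) cell;
  assert ((bits & TOY_TAG_MASK) == 0);
  return bits | TOY_CONS;
}

/* Unbox a pointer: mask the tag bits off again.  */
static struct toy_cons *
toy_xcons (toy_object o)
{
  assert (toy_type_of (o) == TOY_CONS);
  return (struct toy_cons *) (o & ~TOY_TAG_MASK);
}
```

The integer never touches the heap, and the cons pointer costs one mask operation to recover. That is the whole trick; the real header adds more types and more care about word sizes.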
DEFUN: Bridging C and Lisp
The macro that connects the C runtime to the Lisp world is DEFUN. Every built-in Lisp function is defined using this macro. The definition of car:
DEFUN ("car", Fcar, Scar, 1, 1, 0,
       doc: /* Return the car of LIST. If arg is nil, return nil. */)
  (Lisp_Object list)
{
  return CAR (list);
}
The arguments to DEFUN are: the Lisp-visible name as a string, the C function name (prefixed by convention with F), a static struct name holding metadata (prefixed with S), minimum argument count, maximum argument count, an interactive spec, and a docstring.
The macro expands to roughly:
/* The C function */
Lisp_Object Fcar (Lisp_Object list) { return CAR (list); }

/* A static struct describing the function to the Lisp system */
static struct Lisp_Subr Scar = {
  .header = { .size = PVEC_SUBR },
  .function = { .a1 = Fcar },
  .min_args = 1,
  .max_args = 1,
  .symbol_name = "car",
  /* Shown inline for clarity.  In the real build the doc: comment is
     extracted into a separate DOC file rather than compiled in.  */
  .doc = "Return the car of LIST...",
};
During startup, syms_of_data() and similar initialization functions call defsubr(&Scar), which interns the symbol car in the obarray, Emacs’s global symbol table, and stores the Lisp_Subr in that symbol’s function cell. After that, a call to car in any Lisp expression resolves to the C function Fcar.
The F/S prefix convention runs consistently across src/*.c. Feval is the C function that evaluates a form; Ffuncall calls a function with already-evaluated arguments; Fcons allocates a cons cell; Fmessage displays a message in the echo area and logs it to the *Messages* buffer. Once you know the convention, reading the C source becomes considerably less opaque.
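The define-then-register pattern is easy to reproduce in miniature. The sketch below is an assumption-laden toy, not Emacs code: TOY_DEFUN, toy_defsubr, toy_syms_of_data, and toy_lookup are hypothetical names, functions take a single long instead of Lisp_Object arguments, and the "obarray" is a flat array searched linearly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor, loosely modeled on the role of Lisp_Subr.  */
struct toy_subr {
  const char *symbol_name;
  long (*function) (long);
  int min_args, max_args;
};

/* A DEFUN-like macro: declare the C function, define its descriptor,
   then leave the function header open for the parameter list and body.  */
#define TOY_DEFUN(lname, fnname, sname, minargs, maxargs)   \
  static long fnname (long);                                \
  static struct toy_subr sname =                            \
    { lname, fnname, minargs, maxargs };                    \
  static long fnname

TOY_DEFUN ("1+", Ftoy_add1, Stoy_add1, 1, 1) (long n)
{
  return n + 1;
}

TOY_DEFUN ("abs", Ftoy_abs, Stoy_abs, 1, 1) (long n)
{
  return n < 0 ? -n : n;
}

/* Toy symbol table: a fixed array of registered descriptors.  */
#define TOY_OBARRAY_SIZE 16
static struct toy_subr *toy_obarray[TOY_OBARRAY_SIZE];
static size_t toy_obarray_used;

/* Register one descriptor so it can be found by its Lisp-visible name.  */
static void
toy_defsubr (struct toy_subr *subr)
{
  assert (toy_obarray_used < TOY_OBARRAY_SIZE);
  toy_obarray[toy_obarray_used++] = subr;
}

/* Startup hook: one call per primitive defined in this file.  */
static void
toy_syms_of_data (void)
{
  toy_defsubr (&Stoy_add1);
  toy_defsubr (&Stoy_abs);
}

/* Resolve a name to its descriptor, or NULL if nothing is registered.  */
static struct toy_subr *
toy_lookup (const char *name)
{
  for (size_t i = 0; i < toy_obarray_used; i++)
    if (strcmp (toy_obarray[i]->symbol_name, name) == 0)
      return toy_obarray[i];
  return NULL;
}
```

After toy_syms_of_data() runs, toy_lookup("1+")->function(41) calls Ftoy_add1 and yields 42. The name string is the only link between the two worlds, which is why the real macro takes both a Lisp name and a C name.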
Special forms like if, let, cond, and quote use max_args = UNEVALLED, which signals to the evaluator that arguments should not be evaluated before the function receives them. This is how (if condition then else) avoids evaluating both branches, and it is implemented using the same DEFUN mechanism as ordinary functions.
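A small model makes the difference concrete. In this sketch a "form" is reduced to a value plus a counter that records how often it was evaluated, so the effect of skipping evaluation is observable. TOY_UNEVALLED and all other toy_ names are hypothetical stand-ins, not identifiers from eval.c.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNEVALLED (-1)   /* hypothetical sentinel, playing the role of UNEVALLED */

/* A stand-in for a Lisp form: evaluating it bumps a counter.  */
struct toy_form {
  long value;
  int times_evaluated;
};

static long
toy_eval (struct toy_form *form)
{
  form->times_evaluated++;
  return form->value;
}

struct toy_subr {
  const char *name;
  int max_args;                                /* TOY_UNEVALLED marks a special form */
  long (*evalled) (long *args);                /* ordinary function: receives values */
  long (*unevalled) (struct toy_form **args);  /* special form: receives raw forms */
};

/* Ordinary function: both arguments arrive already evaluated.  */
static long
toy_plus (long *args)
{
  return args[0] + args[1];
}

/* Special form (if COND THEN ELSE): evaluates COND, then one branch only.  */
static long
toy_if (struct toy_form **args)
{
  return toy_eval (args[0]) != 0 ? toy_eval (args[1]) : toy_eval (args[2]);
}

static const struct toy_subr toy_plus_subr = { "+", 2, toy_plus, NULL };
static const struct toy_subr toy_if_subr = { "if", TOY_UNEVALLED, NULL, toy_if };

/* The dispatcher checks max_args before deciding whether to evaluate.  */
static long
toy_call (const struct toy_subr *subr, struct toy_form **args, int nargs)
{
  if (subr->max_args == TOY_UNEVALLED)
    return subr->unevalled (args);

  long values[8];
  assert (nargs <= 8 && nargs <= subr->max_args);
  for (int i = 0; i < nargs; i++)
    values[i] = toy_eval (args[i]);
  return subr->evalled (values);
}
```

Calling toy_call with toy_if_subr and a true condition leaves the else form with a times_evaluated count of zero. The special form and the ordinary function share one descriptor type and one dispatcher; only the max_args value differs.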
The Evaluator
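eval.c contains the core of the interpreter. Feval takes a Lisp form as a Lisp_Object and returns its value. In the real source most of the work is delegated to a helper, eval_sub; what follows is a simplified sketch of the combined logic: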
Lisp_Object
Feval (Lisp_Object form, Lisp_Object lexical)
{
  if (!CONSP (form))
    {
      if (SYMBOLP (form))
        return Fsymbol_value (form); /* variable lookup */
      return form; /* self-evaluating */
    }

  Lisp_Object fun = XCAR (form);
  Lisp_Object args = XCDR (form);
  fun = Findirect_function (fun, Qt);

  if (SUBRP (fun))
    return call_subr (fun, args, lexical);
  else if (COMPILEDP (fun))
    return exec_byte_code (fun, args);
  else
    return apply_lambda (fun, args, lexical);
}
The dispatch is on the type of the function object: a Lisp_Subr (a C function defined via DEFUN), a compiled byte-code vector, or an interpreted lambda represented as a cons cell. The evaluator is small enough to read in an afternoon; the complexity lives in the individual cases and in lread.c, which parses textual Lisp source into Lisp_Object trees before evaluation ever begins.
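The same three-way shape (self-evaluating atom, variable lookup, function call dispatched on the kind of function) fits in a toy that compiles on its own. This is a sketch under heavy assumptions: values are plain long integers, primitives take exactly two arguments, lambdas take exactly one, and byte-code is left out. None of the names below exist in Emacs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { K_INT, K_SYMBOL, K_CONS };

struct obj {
  enum kind kind;
  long i;                       /* K_INT payload */
  const char *name;             /* K_SYMBOL payload */
  const struct obj *car, *cdr;  /* K_CONS payload; NULL cdr ends a list */
};

struct env { const char *name; long value; const struct env *next; };

enum fun_kind { FUN_SUBR, FUN_LAMBDA };

struct fun {
  const char *name;
  enum fun_kind kind;
  long (*subr) (long, long);    /* FUN_SUBR: primitive written in C */
  const char *param;            /* FUN_LAMBDA: its single parameter */
  const struct obj *body;       /* FUN_LAMBDA: its body form */
};

static long toy_add (long a, long b) { return a + b; }
static long toy_mul (long a, long b) { return a * b; }

/* The body (* x x) of a hypothetical "square" lambda, built by hand.  */
static const struct obj sym_x = { K_SYMBOL, 0, "x", NULL, NULL };
static const struct obj sym_mul = { K_SYMBOL, 0, "*", NULL, NULL };
static const struct obj sq_a2 = { K_CONS, 0, NULL, &sym_x, NULL };
static const struct obj sq_a1 = { K_CONS, 0, NULL, &sym_x, &sq_a2 };
static const struct obj sq_body = { K_CONS, 0, NULL, &sym_mul, &sq_a1 };

static const struct fun functions[] = {
  { "+", FUN_SUBR, toy_add, NULL, NULL },
  { "*", FUN_SUBR, toy_mul, NULL, NULL },
  { "square", FUN_LAMBDA, NULL, "x", &sq_body },
};

static const struct fun *
toy_lookup (const char *name)
{
  for (size_t i = 0; i < sizeof functions / sizeof functions[0]; i++)
    if (strcmp (functions[i].name, name) == 0)
      return &functions[i];
  return NULL;
}

static long
toy_eval (const struct obj *form, const struct env *env)
{
  if (form->kind == K_INT)
    return form->i;                            /* self-evaluating */
  if (form->kind == K_SYMBOL)
    {
      for (; env; env = env->next)             /* variable lookup */
        if (strcmp (env->name, form->name) == 0)
          return env->value;
      assert (0 && "unbound variable");
      return 0;
    }
  /* Function call: resolve the head, then dispatch on its kind.  */
  const struct fun *f = toy_lookup (form->car->name);
  assert (f != NULL);
  long a = toy_eval (form->cdr->car, env);
  if (f->kind == FUN_SUBR)
    return f->subr (a, toy_eval (form->cdr->cdr->car, env));
  struct env frame = { f->param, a, NULL };    /* bind the parameter */
  return toy_eval (f->body, &frame);
}

/* Evaluate (+ 1 (square 3)).  */
static long
toy_demo (void)
{
  const struct obj one = { K_INT, 1, NULL, NULL, NULL };
  const struct obj three = { K_INT, 3, NULL, NULL, NULL };
  const struct obj square = { K_SYMBOL, 0, "square", NULL, NULL };
  const struct obj plus = { K_SYMBOL, 0, "+", NULL, NULL };
  const struct obj sq_args = { K_CONS, 0, NULL, &three, NULL };
  const struct obj sq_call = { K_CONS, 0, NULL, &square, &sq_args };
  const struct obj arg2 = { K_CONS, 0, NULL, &sq_call, NULL };
  const struct obj arg1 = { K_CONS, 0, NULL, &one, &arg2 };
  const struct obj form = { K_CONS, 0, NULL, &plus, &arg1 };
  return toy_eval (&form, NULL);
}
```

Here toy_demo() walks through both call paths: the outer + goes straight to a C function pointer, while square re-enters the evaluator on its body with a fresh binding for x.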
The Bootstrap
Emacs starts in two stages. temacs is the raw C binary: just the runtime, with no Lisp loaded. It can evaluate Lisp expressions but knows nothing about editing. Building temacs is the first step of make.
The second step runs temacs --batch -l loadup.el. The loadup.el file loads the core Lisp libraries in dependency order, from subr.el and keymap.el through simple.el and startup.el. Only this preloaded core ends up in the dump; most major modes and utilities stay on disk and are loaded on demand.
After loadup.el finishes, the process holds a complete Lisp heap: thousands of functions, hundreds of keymaps, all the default configuration. Emacs then uses its portable dumper, introduced in Emacs 27 as a replacement for the older unexec mechanism, to serialize the entire Lisp heap to a .pdmp file. When users launch Emacs, it memory-maps the .pdmp file and resumes from the serialized state, bypassing all the loading.
The older unexec approach wrote the entire process image directly into a new executable using format-specific tricks for ELF, Mach-O, and other binary formats. It worked but was fragile and difficult to maintain across platforms. The portable dumper achieves the same startup-time benefit with a structured, relocatable format that does not depend on binary format internals. The high-level result is the same: Emacs launches into a pre-initialized Lisp environment rather than building one from scratch each time.
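The core idea behind a relocatable dump can be shown with a deliberately tiny model. The sketch below is not the pdumper format; it only illustrates one assumption-level idea, namely that links between objects are stored as position-independent indices and turned back into real addresses at load time. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A toy "heap" object: a value and a pointer to another object.  */
struct cell {
  long value;
  struct cell *next;
};

/* The dumped form of a cell.  The link is an index into the image,
   so the image is valid no matter where it is loaded.  */
struct dumped_cell {
  long value;
  ptrdiff_t next_index;   /* -1 means no link */
};

/* Serialize N cells starting at HEAP into OUT.  Every link is assumed
   to point inside the same heap.  */
static void
toy_dump (const struct cell *heap, size_t n, struct dumped_cell *out)
{
  for (size_t i = 0; i < n; i++)
    {
      out[i].value = heap[i].value;
      out[i].next_index = heap[i].next ? heap[i].next - heap : -1;
    }
}

/* Rebuild live cells at NEW_HEAP, rebasing each index onto the new
   base address.  */
static void
toy_load (const struct dumped_cell *image, size_t n, struct cell *new_heap)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (image[i].next_index < (ptrdiff_t) n);
      new_heap[i].value = image[i].value;
      new_heap[i].next = (image[i].next_index >= 0
                          ? new_heap + image[i].next_index
                          : NULL);
    }
}
```

Dumping a three-cell heap and loading it into a different array reproduces the same linked structure at the new addresses. An unexec-style image, by contrast, bakes absolute addresses into the file, which is part of why it depended so heavily on the executable format.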
Comparison with Other Editors
The most instructive comparison is Neovim. Neovim embeds LuaJIT as its scripting engine, exposing an API via the vim.api.* namespace. LuaJIT is dramatically faster than Emacs Lisp for numeric computation, often by a factor of 50 or more for tight loops, because it compiles Lua to native machine code at runtime.
The architectural relationship differs. Neovim’s editing commands are implemented in C; Lua calls them through an API. Most of Emacs’s editing commands are implemented in Lisp; the C code provides what Lisp cannot. Neovim puts the runtime in service of the editor; Emacs puts the editor in service of the runtime.
VS Code is further removed: extensions run in a separate Node.js worker process and communicate with the editor through an IPC protocol. Zed runs extensions as WebAssembly modules for sandboxing. Helix deliberately has no runtime scripting.
Emacs’s choice has real costs. The garbage collector is stop-the-world mark-and-sweep. Byte-compiled Emacs Lisp is substantially slower than LuaJIT or V8 for compute-heavy workloads. The native compilation feature introduced in Emacs 28, using libgccjit to compile Lisp to native machine code via GCC, narrows the gap for pure computation, but the single-threaded runtime remains a structural constraint.
The tight integration provides flexibility the other architectures cannot match. Lisp code can advise any function, including C primitives registered via DEFUN. The function advice-add can intercept calls to find-file, self-insert-command, or any built-in, as long as the call is made from Lisp, without any special hook support from the C side. The one gap is a call made directly from C code to the F-prefixed function, which never consults the symbol and so never sees the advice. The function being wrapped looks like any other Lisp function from the perspective of the caller, because it is.
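Advice works because callers reach a function through its symbol rather than through a fixed address. The sketch below models only that indirection and rests on simplifying assumptions: the advice is stored in an extra field of a hypothetical toy_symbol struct, whereas the real advice-add installs a wrapper object as the symbol’s function definition. All names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*toy_fn) (long);
typedef long (*toy_around) (toy_fn original, long arg);

/* Hypothetical symbol with a function cell and an optional advice slot.  */
struct toy_symbol {
  const char *name;
  toy_fn function;
  toy_around around;   /* NULL when no advice is installed */
};

/* The "primitive", written in C.  */
static long
Ftoy_double (long n)
{
  return 2 * n;
}

static struct toy_symbol toy_double_sym = { "double", Ftoy_double, NULL };

static int toy_calls_seen;

/* Around-advice: count the call, run the original, adjust the result.  */
static long
toy_count_calls (toy_fn original, long arg)
{
  toy_calls_seen++;
  return original (arg) + 1;
}

static void
toy_advice_add (struct toy_symbol *sym, toy_around advice)
{
  assert (sym->around == NULL);   /* this sketch supports one advice */
  sym->around = advice;
}

static void
toy_advice_remove (struct toy_symbol *sym)
{
  sym->around = NULL;
}

/* What a Lisp-level call does: go through the symbol, so advice applies.  */
static long
toy_funcall (const struct toy_symbol *sym, long arg)
{
  return sym->around ? sym->around (sym->function, arg) : sym->function (arg);
}
```

With the advice installed, toy_funcall(&toy_double_sym, 5) returns 11 and bumps the counter, while a direct call to Ftoy_double(5) still returns 10 and leaves the counter alone. The second case is the toy version of C code calling a primitive without going through its symbol.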
The Historical Thread
The Emacs design connects to the MIT Lisp machines, the CADR and its successors, where the entire operating environment was implemented in Lisp running on hardware designed for it. Stallman’s explicit goal for GNU Emacs was to recreate that environment inside a conventional Unix process.
The predecessor, the original TECO-based EMACS from 1976, was a set of macros layered on the TECO editor. James Gosling’s 1981 Gosling Emacs added a Lisp-like interpreter called Mocklisp. GNU Emacs replaced Mocklisp with a genuine Lisp interpreter, drawing on the MacLisp dialect from MIT, and the language has been extended since: lexical scoping in Emacs 24, native compilation in Emacs 28, and ongoing experimental work on an incremental garbage collector.
The C runtime has been modernized too. The unexec dumper is gone, replaced by the portable dumper. The error-prone GCPRO macro system for manually registering C locals with the garbage collector was removed in Emacs 25, leaving conservative stack scanning as the only mechanism. The core architecture from 1985, though, remains: a thin C kernel running a Lisp environment in which the editor is written.
Reading the Emacs Lisp Reference Manual with this framing changes what it looks like. It is a language reference for the language in which the editor is written, with primitives for buffer manipulation included because the application happens to need them.