Most people who use Emacs think of it as an editor with a powerful scripting language. That framing gets the architecture backwards. The C code at the bottom of Emacs is not an editor with hooks for customization. It is a Lisp runtime, and the text editor, the file browser, the email client, and every other piece of behavior you associate with Emacs is a Lisp program running on that runtime. This recent writeup exploring Emacs internals makes the point explicit, and it is worth following that thread much further.
Everything Is a Lisp_Object
The foundational type in the Emacs C source is Lisp_Object. Every value that Lisp code can touch, every argument passed to a function, every return value, every element of a list, is represented as a Lisp_Object. It is a single machine word that encodes both a type tag and a value. Depending on build configuration the word is declared as an integer or as an opaque pointer of the same width; the integer form is the easiest to read:
typedef EMACS_INT Lisp_Object;
Emacs uses a tagged pointer scheme. The low bits of the word encode the type. For small integers (fixnums), the value itself is stored directly in the remaining bits, shifted up. For heap-allocated objects like cons cells, strings, vectors, and symbols, the low bits identify the type and the remaining bits form a pointer to the actual struct on the heap.
The type tags include things like Lisp_Int0, Lisp_Int1 (two integer tags to recover one extra bit of range), Lisp_Symbol, Lisp_String, Lisp_Vectorlike, Lisp_Cons, and Lisp_Float. The Lisp_Vectorlike tag covers a wide range of compound types, including vectors proper, hash tables, buffers, windows, frames, and processes. All of those are structs that start with a header word encoding their specific subtype.
This layout means that type dispatch in the evaluator is a bitmasking operation, not a virtual dispatch or a pointer dereference. The common case of checking whether something is a fixnum is a single AND instruction against the low-bit mask.
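To make the tagging idea concrete, here is a minimal sketch in C. It is a toy model, not the macros from lisp.h: the toy_ names, the 3-bit tag width, and the specific tag values are all assumptions chosen for illustration. It shows how two integer tags that share their low two bits leave one extra bit for the fixnum value, and how a heap pointer and its tag share one word.

```c
#include <assert.h>
#include <stdint.h>

/* Toy tagged word. All names and tag values here are hypothetical. */
typedef intptr_t toy_object;

enum toy_tag { TOY_SYMBOL = 0, TOY_INT0 = 2, TOY_CONS = 3, TOY_INT1 = 6 };

#define TOY_TAG_BITS 3
#define TOY_TAG_MASK ((1 << TOY_TAG_BITS) - 1)
/* TOY_INT0 (binary 010) and TOY_INT1 (binary 110) agree in their two low
   bits, so a fixnum test inspects two bits and the third carries data. */
#define TOY_INT_BITS 2
#define TOY_INT_MASK ((1 << TOY_INT_BITS) - 1)

/* Force 8-byte alignment so the three low pointer bits are always zero. */
struct toy_cons { _Alignas(8) toy_object car; toy_object cdr; };

/* Store the integer in the upper bits and the tag in the low two bits. */
static toy_object toy_make_fixnum(intptr_t n) {
    return n * (1 << TOY_INT_BITS) + TOY_INT0;
}

/* The type check is one mask and one compare, with no memory access. */
static int toy_fixnump(toy_object o) {
    return (o & TOY_INT_MASK) == TOY_INT0;
}

static intptr_t toy_fixnum_value(toy_object o) {
    assert(toy_fixnump(o));
    return (o - TOY_INT0) / (1 << TOY_INT_BITS);
}

/* Heap objects: the pointer occupies the upper bits, the tag the low three. */
static toy_object toy_make_cons_ref(struct toy_cons *p) {
    assert(((uintptr_t)p & TOY_TAG_MASK) == 0);
    return (toy_object)((uintptr_t)p | TOY_CONS);
}

static int toy_consp(toy_object o) {
    return (o & TOY_TAG_MASK) == TOY_CONS;
}

static struct toy_cons *toy_cons_ptr(toy_object o) {
    assert(toy_consp(o));
    return (struct toy_cons *)((uintptr_t)o - TOY_CONS);
}
```

In this model an odd fixnum ends up carrying the second integer tag and an even one the first, which is the whole trick behind spending two tags on integers.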
The DEFUN Macro and the Primitive Layer
The C layer exposes primitives to Lisp through a macro called DEFUN. Here is what a typical primitive definition looks like in the Emacs source:
DEFUN ("car", Fcar, Scar, 1, 1, 0,
       doc: /* Return the car of LIST.
...*/)
  (Lisp_Object list)
{
  return CAR (list);
}
The macro takes the Lisp name, the C function name (prefixed with F), the name of a companion subr structure (prefixed with S), minimum and maximum argument counts, an interactive spec, and a docstring. The macro expansion declares the C function and also defines a static structure named Scar that describes the primitive: its function pointer, its argument counts, its Lisp name, and its interactive spec. Scar is not itself a Lisp symbol; it is the record that later gets attached to one.
During startup, syms_of_fns, syms_of_eval, syms_of_alloc, and similar syms_of_* functions in each source file call defsubr on these structures, which interns each primitive's name in the global obarray and stores the subr in that symbol's function cell. This is the bridge between the C world and the Lisp world: a fixed set of C functions become the atoms from which all Lisp behavior is constructed.
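The registration pattern can be sketched in a few dozen lines of C. This is a simplified analogy, not Emacs code: TOY_DEFUN, toy_defsubr, toy_syms_of_math, and the flat lookup table are hypothetical stand-ins for DEFUN, defsubr, the syms_of_* functions, and the obarray. It only shows the shape of the idea: a macro that produces a function plus a descriptor, and a startup routine that publishes the descriptors by name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long toy_value;                       /* stand-in for Lisp_Object */
typedef toy_value (*toy_fn)(toy_value *args);

/* Descriptor for one primitive, loosely analogous to a subr structure. */
struct toy_subr {
    const char *lisp_name;
    toy_fn fn;
    int min_args, max_args;
};

/* Declares the C function, defines its descriptor, then opens the body. */
#define TOY_DEFUN(lname, fname, sname, minargs, maxargs)                \
    static toy_value fname(toy_value *args);                            \
    static struct toy_subr sname = { lname, fname, minargs, maxargs };  \
    static toy_value fname(toy_value *args)

TOY_DEFUN("1+", Fadd1, Sadd1, 1, 1) { return args[0] + 1; }
TOY_DEFUN("max", Fmax2, Smax2, 2, 2) { return args[0] > args[1] ? args[0] : args[1]; }

/* A flat table plays the role of the obarray in this sketch. */
#define TOY_TABLE_SIZE 16
static struct toy_subr *toy_table[TOY_TABLE_SIZE];
static size_t toy_table_len;

static void toy_defsubr(struct toy_subr *s) {
    assert(toy_table_len < TOY_TABLE_SIZE);
    toy_table[toy_table_len++] = s;
}

/* One registration function per "source file", called once at startup. */
static void toy_syms_of_math(void) {
    toy_defsubr(&Sadd1);
    toy_defsubr(&Smax2);
}

static struct toy_subr *toy_lookup(const char *name) {
    for (size_t i = 0; i < toy_table_len; i++)
        if (strcmp(toy_table[i]->lisp_name, name) == 0)
            return toy_table[i];
    return NULL;
}

/* Look up a primitive by name, check its arity, and call it.
   Returns 1 on success and 0 if the name is unknown or the arity is wrong. */
static int toy_call(const char *name, int nargs, toy_value *args, toy_value *out) {
    struct toy_subr *s = toy_lookup(name);
    if (!s || nargs < s->min_args || nargs > s->max_args)
        return 0;
    *out = s->fn(args);
    return 1;
}
```

After toy_syms_of_math() has run, a call such as toy_call("1+", 1, args, &out) finds the descriptor by name and dispatches through the stored function pointer.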
The Emacs source tree contains around 200,000 lines of C spread across files like eval.c, alloc.c, lread.c, data.c, fns.c, and buffer.c. The Lisp source under lisp/ is several times larger, approaching a million lines. The ratio is telling. The C layer is kept as thin as practical. Anything that can be written in Lisp is written in Lisp.
The Evaluator in eval.c
The heart of the runtime is Feval in eval.c. It takes a Lisp_Object form and an environment indicator, and returns a Lisp_Object result. The top-level dispatch is a type check:
- If the form is a self-evaluating object (number, string, vector), return it directly.
- If the form is a symbol, look it up in the current environment or global symbol table.
- If the form is a cons cell (a list), the car is treated as the operator and the cdr as the argument list.
For list forms, Feval checks whether the operator is a symbol bound to a special form (like if, let, cond, quote), a macro, or a function. Special forms and macros are handled with dedicated C code. Function calls go through Ffuncall, which handles both Lisp-defined functions (lambdas represented as lists) and compiled bytecode functions.
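The dispatch described above fits in a small C sketch. Everything here is hypothetical and heavily simplified: the toy_ object model, the fixed allocation pool, the global binding table, and the three built-in operators (quote, if, +) are assumptions made for illustration and do not mirror the structure of eval.c. The point is only the shape of the type switch followed by operator dispatch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy object model; none of these names come from the Emacs source. */
enum toy_type { T_NIL, T_INT, T_SYM, T_CONS };

struct toy_obj {
    enum toy_type type;
    long num;                    /* used by T_INT */
    const char *name;            /* used by T_SYM */
    struct toy_obj *car, *cdr;   /* used by T_CONS */
};

/* As in Lisp, the car and cdr of nil are nil. */
static struct toy_obj toy_nil = { T_NIL, 0, NULL, &toy_nil, &toy_nil };

/* Objects come from a fixed pool and are never freed in this sketch. */
static struct toy_obj toy_pool[128];
static size_t toy_pool_used;

static struct toy_obj *toy_new(enum toy_type type) {
    assert(toy_pool_used < sizeof toy_pool / sizeof toy_pool[0]);
    struct toy_obj *o = &toy_pool[toy_pool_used++];
    *o = (struct toy_obj){ type, 0, NULL, &toy_nil, &toy_nil };
    return o;
}
static struct toy_obj *toy_int(long n) {
    struct toy_obj *o = toy_new(T_INT); o->num = n; return o;
}
static struct toy_obj *toy_sym(const char *s) {
    struct toy_obj *o = toy_new(T_SYM); o->name = s; return o;
}
static struct toy_obj *toy_cons(struct toy_obj *a, struct toy_obj *d) {
    struct toy_obj *o = toy_new(T_CONS); o->car = a; o->cdr = d; return o;
}

/* Global variable bindings, standing in for symbol value cells. */
static struct { const char *name; struct toy_obj *value; } toy_globals[16];
static size_t toy_nglobals;

static void toy_set(const char *name, struct toy_obj *value) {
    for (size_t i = 0; i < toy_nglobals; i++)
        if (strcmp(toy_globals[i].name, name) == 0) {
            toy_globals[i].value = value;
            return;
        }
    assert(toy_nglobals < sizeof toy_globals / sizeof toy_globals[0]);
    toy_globals[toy_nglobals].name = name;
    toy_globals[toy_nglobals].value = value;
    toy_nglobals++;
}

static struct toy_obj *toy_lookup(const char *name) {
    for (size_t i = 0; i < toy_nglobals; i++)
        if (strcmp(toy_globals[i].name, name) == 0)
            return toy_globals[i].value;
    return &toy_nil;   /* simplification: real Emacs signals void-variable */
}

static struct toy_obj *toy_eval(struct toy_obj *form) {
    switch (form->type) {
    case T_NIL:
    case T_INT:
        return form;                      /* self-evaluating */
    case T_SYM:
        return toy_lookup(form->name);    /* variable reference */
    case T_CONS:
        break;                            /* operator dispatch below */
    }
    struct toy_obj *op = form->car, *args = form->cdr;
    assert(op->type == T_SYM);
    if (strcmp(op->name, "quote") == 0)   /* special form: no evaluation */
        return args->car;
    if (strcmp(op->name, "if") == 0) {    /* special form: evaluate one branch */
        struct toy_obj *test = toy_eval(args->car);
        struct toy_obj *rest = args->cdr;
        return toy_eval(test->type != T_NIL ? rest->car : rest->cdr->car);
    }
    if (strcmp(op->name, "+") == 0) {     /* function: evaluate every argument */
        long sum = 0;
        for (; args->type == T_CONS; args = args->cdr) {
            struct toy_obj *v = toy_eval(args->car);
            assert(v->type == T_INT);
            sum += v->num;
        }
        return toy_int(sum);
    }
    assert(0 && "unknown operator");
    return &toy_nil;
}
```

Note that toy_eval calls itself for every nested form, so nesting depth in the Lisp program translates directly into depth on the C stack.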
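The evaluator is recursive on the C stack: evaluating a nested form means a nested C call, there is no tail-call elimination, and the variable max-lisp-eval-depth exists to stop runaway recursion before the stack overflows. Non-local exits (catch/throw, error signals) are implemented with setjmp/longjmp over a chain of handler records, while a separate binding stack (the specpdl) records dynamic bindings and unwind-protect forms so they can be undone on the way out. This is one of the more intricate parts of the C code, and it explains why Emacs error handling has historically looked nothing like conventional C error handling.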
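For readers who have not used setjmp/longjmp for non-local exits, here is a minimal sketch of a catch/throw pair built on them. It is an illustration under stated assumptions, not the handler code in eval.c: toy_handler, toy_catch, toy_throw, integer tags, and integer payloads are all hypothetical simplifications, and the sketch does nothing about unwinding dynamic bindings.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum { TOY_TAG_DONE = 1 };

/* One record per active catch, linked innermost-first. */
struct toy_handler {
    int tag;
    jmp_buf jmp;
    struct toy_handler *next;
};

static struct toy_handler *toy_handlers;
static int toy_thrown_value;   /* kept in static storage so it survives longjmp */

/* Find the innermost handler with a matching tag and jump to it,
   abandoning every C frame in between. */
static void toy_throw(int tag, int value) {
    for (struct toy_handler *h = toy_handlers; h; h = h->next)
        if (h->tag == tag) {
            toy_handlers = h;          /* discard handlers nested inside h */
            toy_thrown_value = value;
            longjmp(h->jmp, 1);
        }
    assert(0 && "no catch for tag");
}

/* Run body(arg) with a handler for tag installed. Returns the body's
   result, or the thrown value if the body exits through toy_throw. */
static int toy_catch(int tag, int (*body)(int), int arg) {
    struct toy_handler h;
    int result;
    h.tag = tag;
    h.next = toy_handlers;
    if (setjmp(h.jmp) == 0) {
        toy_handlers = &h;
        result = body(arg);
    } else {
        result = toy_thrown_value;
    }
    toy_handlers = h.next;             /* pop this handler either way */
    return result;
}

/* Recurses n frames deep, then exits all of them at once. */
static int toy_deep(int n) {
    if (n == 0)
        toy_throw(TOY_TAG_DONE, 99);
    return toy_deep(n - 1) + 1;
}

static int toy_plain(int n) { return n * 2; }

/* An inner catch with a different tag is skipped by the throw. */
static int toy_inner(int n) { return toy_catch(2, toy_deep, n) + 1000; }
```

Calling toy_catch(TOY_TAG_DONE, toy_deep, 5) returns the thrown value without any of the pending additions in toy_deep ever running, which is the behavior catch and throw expose at the Lisp level.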
Bootstrap, temacs, and the Dump
Emacs has a fascinating cold-start problem. The Lisp runtime needs to be initialized before it can evaluate Lisp, but most of the runtime’s behavior is defined in Lisp. The solution is a two-phase build.
First, a stripped executable called temacs is built. This is the bare C runtime with no Lisp loaded, capable of reading and evaluating Lisp files but with only the primitive C functions available. Running temacs with the --batch flag and the bootstrap script loadup.el causes it to load the core Lisp files in order: first the very basics like subr.el, then progressively more of the standard library.
Once all the Lisp is loaded, the process is “dumped” to a file. The traditional mechanism, unexec, literally copied the running process’s heap and data segments into an executable, so starting Emacs was essentially restoring a previously initialized process. This approach was notoriously fragile across operating systems and toolchain changes.
Modern Emacs (since version 27) uses pdump, the portable dump format. Rather than copying process memory, pdump serializes the live Lisp heap into a structured binary file. On startup, this file is mmap’d and the heap is reconstructed. The serialization has to handle all the pointer fixups involved in relocating a heap to a different address, but it avoids the deep OS-specific hacks that unexec required.
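The pointer-fixup problem is easier to see on a tiny example. The sketch below is not the pdump format: toy_dump, toy_load, and the index-based image are hypothetical, and the data structure is a plain linked list rather than a Lisp heap. It shows the core move, which is to write position-independent references into the image and turn them back into real pointers relative to wherever the data lands on load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory form: nodes linked by raw pointers. */
struct toy_node {
    long value;
    struct toy_node *next;
};

/* Dumped form: the pointer is replaced by an index into the image,
   which stays meaningful no matter where the image is later loaded. */
struct toy_dumped {
    long value;
    uint32_t next_index;
};

#define TOY_NO_NEXT UINT32_MAX

/* Serialize the list into out[], returning the number of records written.
   Nodes are laid out in traversal order, so each successor sits at n + 1.
   A general object graph would need an address-to-index map instead. */
static size_t toy_dump(const struct toy_node *head, struct toy_dumped *out, size_t cap) {
    size_t n = 0;
    for (const struct toy_node *p = head; p; p = p->next) {
        assert(n < cap);
        out[n].value = p->value;
        out[n].next_index = p->next ? (uint32_t)(n + 1) : TOY_NO_NEXT;
        n++;
    }
    return n;
}

/* Rebuild the list inside heap[], which may live at any address.
   The fixup step converts each stored index into a pointer into heap[]. */
static struct toy_node *toy_load(const struct toy_dumped *img, size_t n, struct toy_node *heap) {
    for (size_t i = 0; i < n; i++) {
        heap[i].value = img[i].value;
        heap[i].next = img[i].next_index == TOY_NO_NEXT
                           ? NULL
                           : &heap[img[i].next_index];
    }
    return n ? &heap[0] : NULL;
}
```

The loaded list is structurally identical to the original even though none of its pointers match the old addresses, which is the property a relocatable dump needs.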
The dump contains everything that was alive at the end of the temacs bootstrap run: all the loaded Lisp, all the interned symbols, all the compiled bytecode for the standard library. Starting a fresh Emacs process is fast precisely because this work is done once at build time.
Garbage Collection in alloc.c
Emacs uses a mark-and-sweep garbage collector, implemented in alloc.c. The GC root set includes the C stack, all global Lisp_Object variables registered with staticpro, and the live buffer list. Older versions of Emacs required C code to register stack-resident Lisp_Object variables explicitly with GCPRO macros; modern Emacs dropped those macros and scans the C stack conservatively instead, treating any word that looks like a pointer to a Lisp object as a reference.
Marking traverses the object graph from the roots and sets a mark bit for each reachable object. Sweeping then walks the allocator's blocks, returns unmarked objects to the free lists, and clears the marks on survivors. Emacs allocates cons cells, strings, and vectors from separate pools to reduce fragmentation and improve cache locality during sweeps.
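The two phases are short enough to sketch directly. This is a toy collector over a fixed pool of cons-like cells, not the code in alloc.c: the toy_ names, the in-cell mark flag, the explicit root array, and the eight-cell pool are assumptions for illustration, and there is no stack scanning at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_POOL_SIZE 8

struct toy_cell {
    struct toy_cell *car, *cdr;   /* NULL plays the role of nil */
    bool marked, in_use;
};

static struct toy_cell toy_pool[TOY_POOL_SIZE];
static struct toy_cell *toy_free_list;   /* free cells are chained through cdr */

static void toy_init(void) {
    toy_free_list = NULL;
    for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
        toy_pool[i] = (struct toy_cell){ NULL, toy_free_list, false, false };
        toy_free_list = &toy_pool[i];
    }
}

static struct toy_cell *toy_alloc(struct toy_cell *car, struct toy_cell *cdr) {
    struct toy_cell *c = toy_free_list;
    if (!c)
        return NULL;               /* a real allocator would collect or grow here */
    toy_free_list = c->cdr;
    *c = (struct toy_cell){ car, cdr, false, true };
    return c;
}

/* Mark phase: flag everything reachable from c. The marked check also
   stops the traversal from looping forever on cyclic structure. */
static void toy_mark(struct toy_cell *c) {
    while (c && !c->marked) {
        c->marked = true;
        toy_mark(c->car);          /* recurse on car, iterate on cdr */
        c = c->cdr;
    }
}

/* Sweep phase: visit every cell in the pool, recycle the unmarked ones,
   and clear the mark on survivors. Returns the number of cells freed. */
static size_t toy_sweep(void) {
    size_t freed = 0;
    for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
        struct toy_cell *c = &toy_pool[i];
        if (c->marked) {
            c->marked = false;
        } else if (c->in_use) {
            *c = (struct toy_cell){ NULL, toy_free_list, false, false };
            toy_free_list = c;
            freed++;
        }
    }
    return freed;
}

static size_t toy_gc(struct toy_cell **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++)
        toy_mark(roots[i]);
    return toy_sweep();
}
```

Because liveness is decided by reachability rather than by counting references, an unreachable cell whose cdr points at itself is reclaimed like any other garbage.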
The GC is stop-the-world, which can cause noticeable pauses in large Emacs sessions. This has been a known limitation for decades. Proposals for incremental and generational collection have circulated in the Emacs development community for years. The elisp-manual section on garbage collection exposes gc-cons-threshold and related variables that give users some control over when collection fires.
Native Compilation: The Runtime Grows Up
A significant architectural addition landed in Emacs 28: native compilation of Emacs Lisp via libgccjit. The native-comp feature, developed primarily by Andrea Corallo, compiles Emacs Lisp to native machine code, reusing the byte compiler's front end rather than translating finished .elc files. The compiled output, .eln files, is loaded as shared libraries.
The compilation pipeline goes through an intermediate representation called LIMPLE (the name is a nod to GCC's GIMPLE), which models Emacs Lisp's semantics closely enough to handle dynamic binding, closures, and the full Lisp_Object type system. The generated native code still operates on Lisp_Object values and calls into C runtime functions for operations that cannot be fully inlined, but it eliminates bytecode dispatch overhead for tight loops.
Native compilation does not change the fundamental character of the runtime. The GC still manages all heap objects. The C primitive layer is still the substrate. What changes is the cost of executing Lisp: benchmarks on compute-heavy Emacs Lisp code show speedups in the 2x to 40x range depending on the operation, with the largest gains in numeric-heavy and list-processing code.
The Neovim Comparison
Neovim took a different path when it embedded Lua via LuaJIT as its primary extension language. Lua is a purpose-built extension language with a clean C API, a fast JIT compiler, and a small implementation surface. The tradeoff is that Lua is not Neovim’s substrate the way Lisp is Emacs’s substrate. Neovim’s core editor logic lives in C; Lua plugs into event callbacks and APIs.
This means Neovim plugins operate through a defined interface layer, while Emacs Lisp programs can reach into and redefine essentially any behavior by replacing function bindings in the global obarray. You can replace car in Emacs Lisp if you want to. The openness cuts both ways: it makes comprehensive extension possible and makes it easy to shoot yourself in the foot at runtime.
VS Code's extension model, a JavaScript Extension API running on V8, is further still from the Emacs model. Extensions run in a separate extension host process and communicate with the editor over a message protocol. The runtime boundary is explicit and enforced. The tradeoff is safety and performance isolation at the cost of the seamless mutability that makes Emacs programming feel like live surgery on a running system.
What the Architecture Means in Practice
Understanding Emacs as a Lisp runtime changes how you read its behavior. When you call M-x and pick a command, you are invoking a Lisp function that was likely defined in one of the .el files loaded during startup, possibly with overrides contributed by one of your installed packages, possibly byte-compiled or natively compiled. A few performance-critical subsystems, notably the redisplay engine and the low-level command loop, are written in C, but they are driven by Lisp data (keymaps, hooks, text properties) and call back into Lisp constantly. The commands bound to your keys, most of the minibuffer and completion machinery, and the high-level undo commands are all Lisp programs that the C runtime is executing.
The C-h f (describe-function) command demonstrates this concretely. For almost any interactive command, it will show you the Lisp source, often with a link to the exact .el file. For C primitives, it shows the C source location. The line between the two is navigable and inspectable at runtime.
This transparency is what Stallman was designing for when he chose Lisp in the first place. The goal was not to bolt customization onto an editor. The goal was to ship a Lisp machine that happened to come with a text editor already written for it. Forty years later, that bet is still paying out.