Emacs Is a Lisp Machine That Happens to Edit Text

Most editors that support extensibility draw a clear line between the editor and the extension language. VS Code runs TypeScript extensions in a separate process, communicating with the host over RPC through a defined vscode.* API surface. Neovim embeds LuaJIT deeply, but the vim.* API still represents a deliberate boundary between the Lua guest and the C host. The extension language is a guest. In Emacs, there is no such boundary, because there is no guest. The extension language is the runtime.

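The C layer in Emacs provides a small core: a garbage collector, I/O and display, a Lisp reader and evaluator, the bootstrapping mechanism, and the primitive object types (buffers, windows, frames, processes) together with the low-level functions that operate on them. Nearly everything a user interacts with above that, including the major and minor modes, most editing commands, completion, and the package manager, is Emacs Lisp running on top of that C substrate. Even the objects implemented in C are ordinary Lisp values. This is not a design metaphor. It is literally true at the data structure level.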

How Lisp Objects Are Represented in C

The foundational type in the Emacs source is Lisp_Object, defined in src/lisp.h:

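typedef EMACS_INT Lisp_Object;

enum Lisp_Type {
  Lisp_Symbol       = 0,
  Lisp_Type_Unused0 = 1,
  Lisp_Int0         = 2,
  Lisp_Int1         = 6,  /* USE_LSB_TAG ? 6 : 3 in the source */
  Lisp_String       = 4,
  Lisp_Vectorlike   = 5,
  Lisp_Cons         = 3,  /* USE_LSB_TAG ? 3 : 6 in the source */
  Lisp_Float        = 7,
};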

On a 64-bit system, Lisp_Object is a 64-bit integer. The low 3 bits encode the type tag, which is possible because heap objects are aligned to 8 bytes, so the low 3 bits of their addresses are always zero. The remaining 61 bits are the payload: for pointer types, the address with the tag bits masked off; for integers, the value itself. Fixnums use two of the eight tag values (Lisp_Int0 and Lisp_Int1), so the fixnum tag is effectively only two bits wide and the value gets 62 bits on 64-bit platforms. Emacs 27 added bignum support via GMP for integers that overflow this range.
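The tagging scheme can be sketched in a few lines of C. This is a toy model, not the Emacs source: every toy_ name is hypothetical, the tag values are chosen for illustration, and it assumes 8-byte-aligned pointers and an arithmetic right shift for signed integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of low-bit tagging. Every toy_ name and every tag value here
   is hypothetical, chosen for illustration; none is an Emacs identifier. */
typedef intptr_t toy_object;

enum {
  TOY_TAG_BITS = 3,
  TOY_TAG_MASK = (1 << TOY_TAG_BITS) - 1,        /* binary 111 */
  TOY_FIXNUM_BITS = 2,                           /* fixnums spend two tag bits */
  TOY_FIXNUM_MASK = (1 << TOY_FIXNUM_BITS) - 1,  /* binary 11 */
  TOY_FIXNUM_TAG = 2,                            /* low two bits are 10 */
  TOY_CONS_TAG = 3
};

/* Full three-bit tag of any object. */
static int toy_tag_of(toy_object o)
{
  return (int)((uintptr_t)o & TOY_TAG_MASK);
}

/* Tag an 8-byte-aligned pointer by storing the tag in its zero low bits. */
static toy_object toy_tag_pointer(void *p, int tag)
{
  assert(((uintptr_t)p & TOY_TAG_MASK) == 0);
  return (toy_object)((uintptr_t)p | (uintptr_t)tag);
}

/* Recover the original address by masking the tag bits off again. */
static void *toy_untag_pointer(toy_object o)
{
  return (void *)((uintptr_t)o & ~(uintptr_t)TOY_TAG_MASK);
}

/* Store the integer in the upper bits, the two-bit fixnum tag below it. */
static toy_object toy_make_fixnum(intptr_t n)
{
  return (toy_object)(((uintptr_t)n << TOY_FIXNUM_BITS) | TOY_FIXNUM_TAG);
}

static bool toy_fixnump(toy_object o)
{
  return ((uintptr_t)o & TOY_FIXNUM_MASK) == TOY_FIXNUM_TAG;
}

/* Relies on arithmetic right shift of negative values, which mainstream
   compilers provide but ISO C leaves implementation-defined. */
static intptr_t toy_fixnum_value(toy_object o)
{
  return o >> TOY_FIXNUM_BITS;
}
```

In this sketch toy_make_fixnum(1) carries the three-bit tag 6 and toy_make_fixnum(2) carries the tag 2. The third tag bit is really the lowest bit of the integer, which is how two tag values buy one extra bit of range.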

The Lisp_Vectorlike tag covers a large family of types: buffers, windows, frames, processes, hash tables, compiled bytecode objects, and more. They share the tag and are distinguished by a secondary pvec_type field in their header. This means a buffer object in your Emacs session is a Lisp_Object carrying the Lisp_Vectorlike tag and pointing to a C struct that the Lisp layer can inspect and modify through primitives. There is no serialization, no IPC, no API translation layer between the Lisp that calls (buffer-name) and the C struct that holds the buffer’s name.
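A rough sketch of that two-level dispatch follows. The toy_ types and functions are hypothetical. Real Emacs packs the subtype into bits of the header's size word, while this toy uses a plain enum field to keep the idea visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subtype codes standing in for Emacs's pvec_type. */
enum toy_pvec_type { TOY_PVEC_BUFFER, TOY_PVEC_WINDOW, TOY_PVEC_HASH_TABLE };

/* Common header placed first in every vectorlike struct. */
struct toy_vectorlike_header { enum toy_pvec_type type; };

struct toy_buffer {
  struct toy_vectorlike_header header;
  const char *name;
};

struct toy_window {
  struct toy_vectorlike_header header;
  int total_cols;
};

/* Any vectorlike object can be inspected through its header, because the
   header is the first member of each struct. */
static bool toy_bufferp(const void *obj)
{
  assert(obj != NULL);
  const struct toy_vectorlike_header *h = obj;
  return h->type == TOY_PVEC_BUFFER;
}

/* A primitive in the style of buffer-name: check the subtype, then read
   the field straight out of the C struct. Returns NULL for non-buffers. */
static const char *toy_buffer_name(const void *obj)
{
  if (!toy_bufferp(obj))
    return NULL;
  return ((const struct toy_buffer *)obj)->name;
}
```

The point of the sketch is that the name lookup is a type check followed by a field read. Nothing is copied or marshalled on the way.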

The cons cell, the basic unit of Lisp list structure, is:

struct Lisp_Cons {
  union {
    struct {
      Lisp_Object car;
      union {
        Lisp_Object cdr;           /* cdr of a live cell */
        struct Lisp_Cons *chain;   /* next cell on the free list */
      } u;
    } s;
    GCALIGNED_UNION_MEMBER
  } u;
};

The chain member shares storage with cdr. A live cell uses cdr; a reclaimed cell uses chain to link itself onto the allocator's free list, so the free list costs no extra memory. The garbage collector in src/alloc.c is a stop-the-world mark-and-sweep implementation. Conses are allocated in aligned blocks of about 1 KiB, each holding a few dozen cells together with their mark bits. The mark_object() function traverses the live heap; in recent versions it uses an explicit mark stack rather than recursion to avoid C stack overflow on deep structures. String data lives in separate sdata blocks and is compacted during collection.
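To make the mark-and-sweep idea concrete, here is a minimal sketch over a fixed pool of cons cells. It is a toy under stated assumptions: the toy_ names are hypothetical, the heap is a static array of eight cells, NULL stands in for nil, and mark bits live in the cell rather than in a separate bitmap. It does show the two ingredients the text describes: an explicit mark stack and a list of cells threaded through storage the cell already has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_HEAP_CELLS 8

/* Toy cons cell; car and cdr point at other cells or are NULL (nil). */
struct toy_cons {
  struct toy_cons *car;
  union {
    struct toy_cons *cdr;    /* valid while the cell is live */
    struct toy_cons *chain;  /* valid while the cell is on the free list */
  } u;
  bool marked;
  bool in_use;
};

static struct toy_cons toy_heap[TOY_HEAP_CELLS];
static struct toy_cons *toy_free_list;

/* Put every cell on the free list. */
static void toy_heap_init(void)
{
  toy_free_list = NULL;
  for (int i = TOY_HEAP_CELLS - 1; i >= 0; i--) {
    toy_heap[i].car = NULL;
    toy_heap[i].marked = false;
    toy_heap[i].in_use = false;
    toy_heap[i].u.chain = toy_free_list;
    toy_free_list = &toy_heap[i];
  }
}

/* Allocate a cell from the free list; returns NULL when the pool is
   exhausted (a real allocator would collect or grow instead). */
static struct toy_cons *toy_cons(struct toy_cons *car, struct toy_cons *cdr)
{
  struct toy_cons *c = toy_free_list;
  if (c == NULL)
    return NULL;
  toy_free_list = c->u.chain;
  c->car = car;
  c->u.cdr = cdr;
  c->marked = false;
  c->in_use = true;
  return c;
}

/* Mark everything reachable from root using an explicit stack. A cell is
   marked when pushed, so each cell is pushed at most once and the stack
   never needs more than TOY_HEAP_CELLS slots. */
static void toy_mark(struct toy_cons *root)
{
  struct toy_cons *stack[TOY_HEAP_CELLS];
  size_t sp = 0;
  if (root != NULL && !root->marked) {
    root->marked = true;
    stack[sp++] = root;
  }
  while (sp > 0) {
    struct toy_cons *c = stack[--sp];
    struct toy_cons *kids[2] = { c->car, c->u.cdr };
    for (int i = 0; i < 2; i++) {
      if (kids[i] != NULL && !kids[i]->marked) {
        assert(sp < TOY_HEAP_CELLS);
        kids[i]->marked = true;
        stack[sp++] = kids[i];
      }
    }
  }
}

/* Return unmarked live cells to the free list, clear all marks, and
   report how many cells were reclaimed. */
static int toy_sweep(void)
{
  int freed = 0;
  for (int i = 0; i < TOY_HEAP_CELLS; i++) {
    struct toy_cons *c = &toy_heap[i];
    if (c->in_use && !c->marked) {
      c->in_use = false;
      c->u.chain = toy_free_list;
      toy_free_list = c;
      freed++;
    }
    c->marked = false;
  }
  return freed;
}
```

Because cells are marked before they are pushed, cyclic structure terminates naturally, and the depth of a list affects only the size of the explicit stack, never the C call stack.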

The DEFUN Bridge

Every primitive function accessible from Emacs Lisp is registered through the DEFUN macro:

DEFUN ("car", Fcar, Scar, 1, 1, 0,
       doc: /* Return the car of LIST. */)
  (Lisp_Object list)
{
  return CAR(list);
}

This generates two things: a C function named Fcar and a Lisp_Subr metadata struct named Scar that records the function’s name, argument counts, docstring, and pointer to the C implementation. The Lisp name car, the C function Fcar, and the subr struct Scar are all different things that the macro coordinates. Every primitive you can call from Elisp, whether cons, point, insert, or buffer-name, is a C function registered exactly this way. Functions defined in Lisp itself sit on top of these primitives.
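The shape of that macro can be imitated in a small self-contained program. The sketch below is not the real DEFUN: TOY_DEFUN, struct toy_subr, and the two example primitives are hypothetical, values are plain long integers, and every function takes exactly one argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long toy_value;
typedef toy_value (*toy_fn1)(toy_value);

/* Metadata record in the spirit of Lisp_Subr. */
struct toy_subr {
  const char *lisp_name;
  int min_args;
  int max_args;
  const char *doc;
  toy_fn1 fn;
};

/* Declare the C function, define its metadata record, then leave the
   function header open so the argument list and body can follow. */
#define TOY_DEFUN(lname, fnname, sname, minargs, maxargs, docstring) \
  static toy_value fnname(toy_value);                                 \
  static const struct toy_subr sname =                                \
    { lname, minargs, maxargs, docstring, fnname };                   \
  static toy_value fnname

TOY_DEFUN("1+", Ftoy_add1, Stoy_add1, 1, 1, "Return NUMBER plus one.")
  (toy_value number)
{
  return number + 1;
}

TOY_DEFUN("negate", Ftoy_negate, Stoy_negate, 1, 1, "Return NUMBER negated.")
  (toy_value number)
{
  return -number;
}

/* Stand-in for the symbol table: the set of registered primitives. */
static const struct toy_subr *const toy_subrs[] = { &Stoy_add1, &Stoy_negate };

/* Find a primitive by its Lisp-side name; NULL if it is not registered. */
static const struct toy_subr *toy_lookup(const char *lisp_name)
{
  for (size_t i = 0; i < sizeof toy_subrs / sizeof toy_subrs[0]; i++)
    if (strcmp(toy_subrs[i]->lisp_name, lisp_name) == 0)
      return toy_subrs[i];
  return NULL;
}

/* Call a registered one-argument primitive by name. */
static toy_value toy_funcall1(const char *lisp_name, toy_value arg)
{
  const struct toy_subr *s = toy_lookup(lisp_name);
  assert(s != NULL && s->min_args <= 1 && 1 <= s->max_args);
  return s->fn(arg);
}
```

As with the real macro, one invocation yields three coordinated names: the Lisp-visible string, the C function, and the metadata record that ties them together.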

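Symbols have a redirect field that enables SYMBOL_FORWARDED redirection, allowing a Lisp variable to be backed directly by a C global variable. C code reads and writes that global as an ordinary variable, while Lisp reaches the same storage through the symbol. Variables declared with DEFVAR_LISP, DEFVAR_INT, or DEFVAR_BOOL work this way; most-positive-fixnum, for example, is backed by the C variable Vmost_positive_fixnum.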
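A stripped-down version of forwarding might look like the following. The toy_ names and the example global are hypothetical, values are plain long integers, and only two of the redirect kinds are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Where a symbol's value lives: in the symbol itself, or in a C variable. */
enum toy_redirect { TOY_PLAINVAL, TOY_FORWARDED };

struct toy_symbol {
  const char *name;
  enum toy_redirect redirect;
  union {
    long value;  /* TOY_PLAINVAL: the value is stored here */
    long *fwd;   /* TOY_FORWARDED: address of the backing C variable */
  } val;
};

/* Hypothetical C global that the editor core reads directly. */
static long toy_scroll_margin = 0;

/* Read a variable the way the Lisp side would. */
static long toy_symbol_value(const struct toy_symbol *s)
{
  assert(s != NULL);
  return s->redirect == TOY_FORWARDED ? *s->val.fwd : s->val.value;
}

/* Write a variable the way setq would. */
static void toy_set(struct toy_symbol *s, long v)
{
  assert(s != NULL);
  if (s->redirect == TOY_FORWARDED)
    *s->val.fwd = v;
  else
    s->val.value = v;
}
```

With a symbol whose redirect is TOY_FORWARDED and whose fwd points at toy_scroll_margin, a toy_set call changes the C global, and an assignment to the global in C is immediately visible through toy_symbol_value. The two sides share one storage location rather than synchronizing copies.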

Bootstrapping: Three Stages to a Running Editor

The build process for Emacs illustrates how seriously the architecture takes the Lisp-as-substrate premise. It proceeds in three stages.

First, the build compiles temacs, a raw C binary containing the Lisp interpreter and the C primitives but with no Lisp code loaded. This binary is nearly useless on its own.

Second, temacs is run in batch mode to execute loadup.el, a carefully ordered boot script; the Makefile invocation is along the lines of temacs -batch -l loadup --temacs=pdump. It loads byte-run.el and subr.el near the start to define defun, defmacro, when, and unless, because none of these exist yet. Then it loads the rest of the preloaded standard library in dependency order.

Third, after loading all the Lisp, the running process serializes its heap to disk. Emacs 27 introduced the portable dumper (pdumper, in src/pdumper.c), which writes the Lisp heap as a relocatable .pdmp file. When a user invokes emacs, the C binary loads, mmaps the .pdmp file into the heap, and begins execution at normal-top-level. The Lisp world that took several seconds to construct during the build is restored in milliseconds at startup.
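The key property of a relocatable dump is that it stores offsets rather than addresses, so the image can be mapped anywhere and fixed up on load. The sketch below illustrates only that idea, using a singly linked list as the "heap". It is not how pdumper is implemented, and all toy_ names and the record layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NIL_OFFSET UINT32_MAX

/* Live form: nodes refer to each other by address. */
struct toy_node {
  int value;
  struct toy_node *next;
};

/* Dumped form: the link is an index into the image, not an address. */
struct toy_dumped {
  int32_t value;
  uint32_t next_offset;
};

/* Write the list into out[] in list order, replacing each pointer with
   the index of the record it refers to. Returns the record count. */
static size_t toy_dump(const struct toy_node *head,
                       struct toy_dumped *out, size_t cap)
{
  size_t n = 0;
  for (const struct toy_node *p = head; p != NULL; p = p->next) {
    assert(n < cap);
    out[n].value = p->value;
    out[n].next_offset = p->next ? (uint32_t)(n + 1) : TOY_NIL_OFFSET;
    n++;
  }
  return n;
}

/* Rebuild the list inside heap[], wherever it happens to live, by turning
   each stored index back into an address relative to the new base. */
static struct toy_node *toy_load(const struct toy_dumped *image, size_t n,
                                 struct toy_node *heap)
{
  for (size_t i = 0; i < n; i++) {
    heap[i].value = image[i].value;
    heap[i].next = image[i].next_offset == TOY_NIL_OFFSET
                     ? NULL
                     : &heap[image[i].next_offset];
  }
  return n > 0 ? &heap[0] : NULL;
}
```

Loading is a single linear pass with no evaluation of Lisp code, which is the intuition behind a startup cost measured in milliseconds instead of seconds.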

Before Emacs 27, this dump used unexec, which wrote a raw copy of the process’s memory image using platform-specific hacks. The pdumper is both more portable and more principled about what it serializes.

This bootstrapping sequence matters because it shows the design boundary clearly. The C code is inert without the Lisp, and the Lisp cannot exist without the C substrate. Neither half is optional or ornamental.

Native Compilation and What It Reveals

The native compilation pipeline added in Emacs 28 is the most significant performance development in decades, and it is architecturally revealing.

Before Emacs 28, Emacs Lisp ran in one of two modes: interpreted by the recursive evaluator in src/eval.c, or byte-compiled to a stack-based bytecode executed by src/bytecode.c. Byte compilation provided a meaningful speedup for compute-heavy Lisp, but bytecode.c is still an interpreter.
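The difference between "compiled to bytecode" and "compiled to machine code" is easy to see in miniature. The following toy stack machine uses hypothetical opcodes that have nothing to do with the real Emacs bytecode set; it evaluates (2 + 3) * 4 by dispatching on one instruction at a time, next to the straight-line function a native compiler could emit for the same expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum toy_op { TOY_PUSH, TOY_ADD, TOY_MUL, TOY_RETURN };
enum { TOY_STACK_MAX = 16 };

/* Interpret a bytecode sequence: fetch an opcode, dispatch on it, and
   manipulate an operand stack until TOY_RETURN. */
static long toy_exec(const long *code)
{
  long stack[TOY_STACK_MAX];
  size_t sp = 0;
  size_t pc = 0;
  for (;;) {
    switch (code[pc++]) {
    case TOY_PUSH:               /* operand follows the opcode */
      assert(sp < TOY_STACK_MAX);
      stack[sp++] = code[pc++];
      break;
    case TOY_ADD:                /* pop two, push their sum */
      assert(sp >= 2);
      sp--;
      stack[sp - 1] += stack[sp];
      break;
    case TOY_MUL:                /* pop two, push their product */
      assert(sp >= 2);
      sp--;
      stack[sp - 1] *= stack[sp];
      break;
    case TOY_RETURN:             /* result is the top of the stack */
      assert(sp >= 1);
      return stack[sp - 1];
    default:
      assert(!"unknown opcode");
      return 0;
    }
  }
}

/* What a compiler to native code could emit for the same expression:
   no dispatch loop and no operand stack. */
static long toy_native(void)
{
  return (2 + 3) * 4;
}
```

Running toy_exec on the sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, RETURN gives the same answer as toy_native, but pays for a fetch and a branch per instruction. That per-instruction overhead is what remains after byte compilation.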

Native compilation adds a third path. The pipeline is: Lisp source through the byte compiler's front end, then into comp.el’s LIMPLE IR (an SSA-form intermediate representation whose name echoes GCC's GIMPLE), then to libgccjit, GCC’s code generator packaged as a C library, which runs the GCC optimization pipeline and emits native code. The result is written out as a .eln shared library and loaded via dlopen().

Native-compiled code typically runs roughly 2 to 10 times faster than byte-compiled code for compute-heavy functions, and 10 to 50 times faster than interpreted code, though the gain depends heavily on how much of the time is spent inside C primitives that were already native. The relevant source is split between lisp/emacs-lisp/comp.el for the Lisp frontend and src/comp.c for the libgccjit interface.

What this reveals about the architecture: the optimization boundary in Emacs is between the C primitives and the Lisp layer above them, not between different categories of Lisp code. Native compilation makes the Lisp layer faster, but it does not change the fact that car bottoms out in Fcar, a C function, when called from native-compiled code. The design does not try to blur this boundary or compile away the Lisp runtime. It accelerates Lisp execution while keeping the substrate architecture intact.

Experimental work on an incremental garbage collector (igc), built on the MPS library, is under way on a development branch and would replace the stop-the-world collector. As of Emacs 30, it is not part of a release. The GC is the one part of the C substrate that has remained structurally unchanged for decades, and incremental collection is architecturally difficult because the entire Lisp heap, including all the objects that represent editor state, must remain consistent while the collector and the program interleave.

Why the Architecture Matters

The comparison with VS Code and Neovim is worth being precise about. VS Code’s extension isolation is a deliberate security and stability choice. Extensions that crash do not crash the editor. Extensions cannot corrupt internal state. This is a real benefit, purchased at the cost of making extensions citizens of a different world from the editor they extend.

Emacs made the opposite trade. Because there is no extension API, there is no capability that extensions cannot reach. You can redefine car. You can replace the byte compiler. You can implement a Lisp dialect inside Emacs Lisp and run code in it. Org mode, Magit, and TRAMP are not plugins that talk to Emacs over an interface; they are Lisp programs running in the same runtime as the editor, with full access to every buffer, window, frame, and process object.

The cost is equally real. There is no isolation. Badly written Lisp can corrupt editor state in ways that are difficult to debug. Dynamic binding was the only option until Emacs 24 introduced lexical-binding: t, and it created entire categories of subtle variable capture bugs. The GC stop-the-world pauses are visible to users with large heaps.

Stallman’s original 1981 design document (MIT AI Memo 519a) articulates the core principle: editor commands should be implemented in the extension language, and the extension language should be real Lisp with a garbage collector. Gosling Emacs, also from 1981, used Mocklisp, which had no GC and was not real Lisp, and Stallman considered this the fundamental mistake. The GNU Emacs implementation, starting in 1984, was built from scratch specifically to correct it.

Forty years of development has not changed that core premise. The source article by thecloudlet that prompted this exploration frames the same point: understanding Emacs requires understanding its architecture, not just its keybindings. The C layer in Emacs 30 provides garbage collection, I/O, display, and a Lisp evaluator. Everything above that is still Lisp. Native compilation, pdumper, tree-sitter integration, built-in SQLite, and the Android port are all additions that work within this architecture rather than departing from it. The GNU Emacs Internals manual documents the current state of the C substrate in detail for readers who want to continue further into the implementation.

The text editor is the standard library of a Lisp machine. The keybindings are surface.