
Emacs Is a Lisp Runtime That Ships an Editor, Not the Other Way Around

Source: lobsters

Most people who reach for Emacs encounter it as an editor. An unusual one, certainly, with a reputation for a steep learning curve and infinite configurability, but an editor. That framing does not survive contact with the source code. A recent Emacs Internals post states the correct framing directly: Emacs is a Lisp runtime written in C, and the thing we call the editor is a program written in that Lisp. The distinction matters because it changes how you think about extending Emacs, debugging it, and comparing it to its contemporaries.

The Two-Layer Architecture

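The src/ directory of the Emacs source tree is where the C layer lives. It implements a Lisp interpreter: a reader (src/lread.c) that parses .el files into Lisp objects, an evaluator (src/eval.c) that executes them, a mark-and-sweep garbage collector (src/alloc.c), and a set of primitive functions that Lisp code calls into. The C layer also provides the lowest-level editing machinery, such as buffers, windows, keymap lookup, the command loop, and the redisplay engine, but it exposes all of it as primitives. It does not implement major modes, font-lock, or what M-x does; those are defined in Lisp.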

The lisp/ directory is where the editor lives. Nearly all of the editor's behavior, including every major and minor mode, most of the default key bindings, the completion frameworks, the package manager, org-mode, dired, eglot, and use-package, is Lisp code running on top of the C runtime. The ratio reflects this: the Emacs distribution contains roughly 250,000 lines of C and over 1.5 million lines of Lisp.

The practical consequence is that there is no meaningful boundary between “the editor” and “your configuration.” When you write Emacs Lisp, you are writing in the same language the editor itself was written in, with the same access to primitives. Few other widely used editors are built this way.

How C Primitives Become Lisp Functions

The mechanism connecting the two layers is the DEFUN macro, defined in src/lisp.h. Every built-in function visible to Lisp is registered via DEFUN. Here is a simplified version of how car is defined in src/data.c:

DEFUN ("car", Fcar, Scar, 1, 1, 0,
       doc: /* Return the car of LIST. If LIST is nil, return nil. */)
  (Lisp_Object list)
{
  return CAR (list);
}

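DEFUN expands into two things: a C function named Fcar that implements the operation, and a static Lisp_Subr descriptor named Scar that records the function’s Lisp name, its minimum and maximum argument counts (the 1, 1 in the DEFUN line), its interactive spec (the trailing 0, meaning car is not a command), and a pointer to Fcar. The docstring comment is not compiled into the struct; at build time the make-docfile program extracts it into etc/DOC, and the subr records where to find it. Each C source file that defines primitives also defines a syms_of_FILENAME function (syms_of_data for data.c), and main in src/emacs.c calls all of them during initialization. Each one calls defsubr(&Scar) and its siblings to register the subrs with the Lisp environment.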
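To make the expansion concrete, here is a toy imitation of the pattern: a macro that declares a C function, builds a static descriptor pointing at it, and leaves the body to follow, plus a registry that a syms_of-style function fills. Every name here (TOY_DEFUN, ToySubr, toy_defsubr, toy_funcall) is hypothetical, and values are plain longs rather than tagged Lisp_Object words; it is a sketch of the shape, not Emacs's actual macro.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: real Emacs passes tagged Lisp_Object words and
   its struct Lisp_Subr carries more fields (interactive spec, doc offset). */
typedef long Obj;
enum { TOY_MANY = -2 };  /* hypothetical "any number of arguments" marker */

typedef struct
{
  const char *symbol_name;      /* name visible to Lisp */
  short min_args, max_args;     /* arity from the DEFUN line */
  Obj (*fn) (const Obj *args, int nargs);
} ToySubr;

/* Toy DEFUN: declares Fxxx, defines the static descriptor Sxxx that
   points at it, then opens Fxxx's definition so the body follows.  */
#define TOY_DEFUN(lname, fname, sname, minargs, maxargs)        \
  static Obj fname (const Obj *args, int nargs);                \
  static ToySubr sname = { lname, minargs, maxargs, fname };    \
  static Obj fname (const Obj *args, int nargs)

TOY_DEFUN ("1+", Ftoy_add1, Stoy_add1, 1, 1)
{
  (void) nargs;
  return args[0] + 1;
}

TOY_DEFUN ("+", Ftoy_plus, Stoy_plus, 0, TOY_MANY)
{
  Obj sum = 0;
  for (int i = 0; i < nargs; i++)
    sum += args[i];
  return sum;
}

/* Toy symbol table that toy_defsubr registers into.  */
static ToySubr *registry[16];
static int nregistered;

static void
toy_defsubr (ToySubr *s)
{
  assert (nregistered < 16);
  registry[nregistered++] = s;
}

/* Analogue of a syms_of_FILENAME function: register this file's subrs.  */
void
syms_of_toy (void)
{
  toy_defsubr (&Stoy_add1);
  toy_defsubr (&Stoy_plus);
}

/* Find a primitive by name and call it, enforcing its recorded arity.
   Returns 1 and stores the result in *out on success, 0 otherwise.  */
int
toy_funcall (const char *name, const Obj *args, int nargs, Obj *out)
{
  for (int i = 0; i < nregistered; i++)
    {
      const ToySubr *s = registry[i];
      if (strcmp (s->symbol_name, name) != 0)
        continue;
      if (nargs < s->min_args
          || (s->max_args != TOY_MANY && nargs > s->max_args))
        return 0;               /* wrong number of arguments */
      *out = s->fn (args, nargs);
      return 1;
    }
  return 0;                     /* no such function */
}
```

After syms_of_toy () runs, toy_funcall ("1+", ...) with the single argument 41 yields 42, while calling "1+" with three arguments fails the arity check, which is the same job the recorded min/max counts do for real subrs.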

By the time any Lisp code runs, roughly 2,000 primitive functions have been registered this way. eval, cons, car, cdr, read, prin1, buffer-substring, and set-window-configuration are all C functions wrapped in this mechanism.

Special forms, the ones that do not evaluate all their arguments, use a special arity constant UNEVALLED. The evaluator in src/eval.c checks for this and passes the argument list unevaluated, letting the C implementation decide what to evaluate and when. This is how if, let, quote, cond, progn, and unwind-protect are implemented.
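The dispatch decision can be sketched with a toy evaluator: ordinary primitives receive already-evaluated values, while anything marked with an UNEVALLED-style arity receives the raw argument forms and evaluates only what it chooses. The AST, the TOY_UNEVALLED marker, and the side-effecting "boom" function are all hypothetical, and in this toy 0 stands in for nil.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A hypothetical toy AST: a form is a number or a call (fn arg...).  */
typedef struct Form
{
  enum { NUM, CALL } kind;
  long num;                      /* used when kind == NUM */
  const char *fn;                /* used when kind == CALL */
  const struct Form **args;
  int nargs;
} Form;

enum { TOY_UNEVALLED = -1 };     /* marker in the spirit of UNEVALLED */

typedef struct
{
  const char *name;
  int max_args;                  /* TOY_UNEVALLED for special forms */
  long (*ordinary) (const long *vals, int n);   /* receives values */
  long (*special) (const Form **forms, int n);  /* receives raw forms */
} Prim;

long toy_eval (const Form *f);

static int boom_calls;           /* observable side effect */

static long
prim_add (const long *vals, int n)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += vals[i];
  return sum;
}

static long
prim_boom (const long *vals, int n)
{
  (void) vals;
  (void) n;
  boom_calls++;
  return -1;
}

/* Special form: evaluate the test, then only the chosen branch.  */
static long
prim_if (const Form **forms, int n)
{
  assert (n == 3);
  return toy_eval (forms[0]) != 0 ? toy_eval (forms[1]) : toy_eval (forms[2]);
}

static const Prim prims[] = {
  { "+", 8, prim_add, NULL },
  { "boom", 0, prim_boom, NULL },
  { "if", TOY_UNEVALLED, NULL, prim_if },
};

long
toy_eval (const Form *f)
{
  if (f->kind == NUM)
    return f->num;
  for (size_t i = 0; i < sizeof prims / sizeof prims[0]; i++)
    {
      const Prim *p = &prims[i];
      if (strcmp (p->name, f->fn) != 0)
        continue;
      if (p->max_args == TOY_UNEVALLED)
        return p->special (f->args, f->nargs);  /* hand over raw forms */
      assert (f->nargs <= p->max_args);
      long vals[8];
      for (int k = 0; k < f->nargs; k++)
        vals[k] = toy_eval (f->args[k]);        /* evaluate every argument */
      return p->ordinary (vals, f->nargs);
    }
  assert (!"void-function");
  return 0;
}

int
toy_boom_calls (void)
{
  return boom_calls;
}
```

Evaluating the toy equivalent of (if 0 (boom) 7) returns 7 without ever calling boom, whereas passing (boom) as an argument to + always runs it, because ordinary primitives get their arguments evaluated first.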

The Object Model

The Lisp object representation in C is worth understanding because it shapes every operation in the runtime. Lisp_Object is defined in src/lisp.h as a single machine word (an integer or pointer type as wide as EMACS_INT, depending on the build configuration) and uses pointer tagging to encode both type information and value in that word.

On a 64-bit system, small integers (fixnums) are stored directly in the word, with the value shifted left past the tag bits. Because heap-allocated objects are aligned to at least 8 bytes, the three low-order bits of a pointer are always zero, leaving room for a type tag. Fixnums use only two tag bits, so on a 64-bit build they carry 62 bits of value (most-positive-fixnum is 2305843009213693951). This means that integer arithmetic on fixnums requires no heap allocation at all, just shifts and masks.
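A minimal sketch of low-bit tagging, assuming 8-byte-aligned heap objects and an arithmetic right shift for negative values (as GCC and Clang provide). The tag values and every function name here are illustrative, not the ones src/lisp.h uses; for simplicity this toy gives fixnums a full three-bit tag.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical low-bit tagging in the spirit of src/lisp.h.  */
typedef uintptr_t Word;

enum { TAG_BITS = 3, TAG_MASK = (1 << TAG_BITS) - 1 };
enum { TAG_FIXNUM = 0, TAG_CONS = 3 };   /* illustrative tag values */

typedef struct Cons { Word car, cdr; } Cons;

static inline Word
make_fixnum (intptr_t n)
{
  return ((Word) n << TAG_BITS) | TAG_FIXNUM;   /* value lives above the tag */
}

static inline intptr_t
fixnum_value (Word w)
{
  return (intptr_t) w >> TAG_BITS;              /* shift the tag back out */
}

static inline int
tag_of (Word w)
{
  return (int) (w & TAG_MASK);
}

/* Adding fixnums allocates nothing: untag, add, retag.
   (Overflow out of the fixnum range is ignored in this toy.)  */
static inline Word
fixnum_add (Word a, Word b)
{
  return make_fixnum (fixnum_value (a) + fixnum_value (b));
}

/* Pointers carry their tag in the low bits that alignment keeps zero.  */
static inline Word
make_cons (Word car, Word cdr)
{
  Cons *c = malloc (sizeof *c);
  assert (c != NULL && ((Word) c & TAG_MASK) == 0);
  c->car = car;
  c->cdr = cdr;
  return (Word) c | TAG_CONS;
}

static inline Cons *
cons_ptr (Word w)
{
  assert (tag_of (w) == TAG_CONS);
  return (Cons *) (w - TAG_CONS);               /* strip the tag */
}
```

Type checks become a mask and compare on the word itself, and only conses (in this toy) touch the heap; fixnum_add on make_fixnum (-5) and make_fixnum (12) produces a word whose value is 7 without any allocation.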

The core struct types live on a garbage-collected heap (the definitions below are simplified from src/lisp.h):

struct Lisp_Cons {
  union {
    struct { Lisp_Object car, cdr; } s;
    struct Lisp_Cons *chain;  /* free list link */
  } u;
};

struct Lisp_Symbol {
  enum symbol_redirect redirect; /* which member of val is live */
  Lisp_Object name;       /* the symbol's name string */
  union {
    Lisp_Object value;    /* plain variable */
    struct Lisp_Symbol *alias;           /* variable alias */
    struct Lisp_Buffer_Local_Value *blv; /* buffer-local variable */
    union Lisp_Fwd *fwd;  /* C-backed variable */
  } val;
  Lisp_Object function;  /* function cell */
  Lisp_Object plist;     /* property list */
  struct Lisp_Symbol *next; /* hash chain in obarray */
};
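The chain member in the cons union exists so that dead cells can be threaded onto a free list with no extra memory: a freed cell's own storage becomes the link. The toy allocator below assumes a single fixed static block (BLOCK_CELLS and both function names are hypothetical); a real allocator grabs new blocks on demand and rebuilds the list during the sweep phase.

```c
#include <assert.h>
#include <stddef.h>

typedef long Obj;  /* hypothetical stand-in for Lisp_Object */

/* Same shape as the listing above: a live cell uses car/cdr,
   a dead cell reuses that storage as a free-list link.  */
struct Cons
{
  union
  {
    struct { Obj car, cdr; } s;
    struct Cons *chain;
  } u;
};

enum { BLOCK_CELLS = 4 };            /* hypothetical block size */
static struct Cons block[BLOCK_CELLS];
static struct Cons *free_list;
static int initialized;

static void
init_free_list (void)
{
  for (int i = 0; i < BLOCK_CELLS; i++)
    {
      block[i].u.chain = free_list;  /* push every cell */
      free_list = &block[i];
    }
  initialized = 1;
}

/* Pop the head of the free list; NULL when exhausted, where a real
   runtime would collect garbage or allocate another block.  */
struct Cons *
toy_cons (Obj car, Obj cdr)
{
  if (!initialized)
    init_free_list ();
  struct Cons *c = free_list;
  if (c == NULL)
    return NULL;
  free_list = c->u.chain;
  c->u.s.car = car;
  c->u.s.cdr = cdr;
  return c;
}

/* What a sweep does with an unmarked cell: push it back on the list.  */
void
toy_free_cons (struct Cons *c)
{
  assert (c >= block && c < block + BLOCK_CELLS);
  c->u.chain = free_list;
  free_list = c;
}
```

Because the list is LIFO, the most recently freed cell is the next one handed out, which keeps allocation a constant-time pointer pop.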

The fwd case in the symbol union is how C variables become Lisp-visible. When you evaluate (setq max-lisp-eval-depth 2000) in Emacs Lisp, you are writing to a C variable declared with DEFVAR_INT in src/eval.c. The variable is backed by the C integer max_lisp_eval_depth; instead of holding a Lisp value, the symbol’s value slot holds a forwarding pointer to it.

Emacs 27 added arbitrary-precision integers via GMP. Values that do not fit in a fixnum become heap-allocated Lisp_Bignum structs wrapping mpz_t. The INTEGERP predicate covers both cases transparently.
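Since Emacs 27, (1+ most-positive-fixnum) returns a bignum instead of wrapping around. The sketch below shows the shape of that range check under simplified assumptions: 62-bit fixnums as on a 64-bit build, and a hypothetical heap-boxed int64_t standing in for the mpz_t a real runtime would allocate. The ToyInt type and its functions are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_FIXNUM_BITS 62
#define TOY_MOST_POSITIVE_FIXNUM ((INT64_C (1) << (TOY_FIXNUM_BITS - 1)) - 1)
#define TOY_MOST_NEGATIVE_FIXNUM (-TOY_MOST_POSITIVE_FIXNUM - 1)

typedef struct
{
  int boxed;          /* 0: immediate fixnum, 1: heap-allocated "bignum" */
  int64_t fixnum;     /* valid when !boxed */
  int64_t *big;       /* valid when boxed; stands in for an mpz_t */
} ToyInt;

static int
fits_fixnum (int64_t n)
{
  return TOY_MOST_NEGATIVE_FIXNUM <= n && n <= TOY_MOST_POSITIVE_FIXNUM;
}

/* Keep small values immediate; box anything outside the fixnum range.  */
ToyInt
toy_make_integer (int64_t n)
{
  ToyInt r = { 0, 0, NULL };
  if (fits_fixnum (n))
    r.fixnum = n;
  else
    {
      r.boxed = 1;
      r.big = malloc (sizeof *r.big);
      assert (r.big != NULL);
      *r.big = n;
    }
  return r;
}

/* Both operands are below 2^61 in magnitude, so the int64_t sum cannot
   overflow, but it can leave the fixnum range.  */
ToyInt
toy_fixnum_add (int64_t a, int64_t b)
{
  assert (fits_fixnum (a) && fits_fixnum (b));
  return toy_make_integer (a + b);
}

int64_t
toy_integer_value (ToyInt i)
{
  return i.boxed ? *i.big : i.fixnum;
}
```

Callers that only ask for the integer value do not care which representation came back, which is the same transparency INTEGERP provides across fixnums and bignums.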

Bootstrapping: From temacs to a Runnable Editor

The build process produces two executables, and understanding both clarifies how the runtime and the editor layer relate.

temacs is “bare Emacs”: the C runtime compiled and linked, with all primitives registered but none of the Lisp library preloaded. It is a Lisp machine with about 2,000 built-in functions and very little else: no defun macro yet, no major modes, no M-x.

The usable emacs is produced by running temacs in batch mode on lisp/loadup.el. That bootstrap script loads a carefully ordered core of the lisp/ tree. Early on it loads emacs-lisp/byte-run.el, which defines defmacro and defun in Lisp on top of the C primitive defalias, and subr.el, which defines fundamental macros such as when and unless. It then works through files.el, simple.el, window.el, and the rest of the preloaded libraries, along with the autoload definitions that let everything else load on demand. Once all that Lisp is loaded, the now-populated Lisp heap is serialized to disk.

This serialization is the dump. Before Emacs 27, it was handled by unexec, a deeply platform-specific mechanism that wrote the running process’s memory image, Lisp heap included, into a new ELF or Mach-O executable. unexec broke repeatedly with glibc updates, macOS releases, and changes to virtual memory layout, and it was incompatible with ASLR.

Emacs 27 (released August 2020) replaced unexec as the default with pdump, the portable dumper, implemented in src/pdumper.c. pdump serializes the Lisp heap to a .pdmp file installed alongside the binary, recording internal pointers as offsets from the start of the dump together with the locations that need fixing up. At startup, the .pdmp file is mapped into memory and a relocation pass adjusts those pointers to the actual load address. The result works with ASLR, does not depend on executable-format internals, and can let the OS page in parts of the image lazily.
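The offset-plus-relocation idea can be sketched on a linked list: the dumper writes each pointer as an offset from the image start and remembers where it wrote it, and the loader adds its own base address to each recorded slot in one pass. This is a toy under loud assumptions: Node, Image, toy_dump, and toy_load are hypothetical, and malloc plus memcpy stand in for mapping a file. pdumper’s real format handles many object kinds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node { long value; struct Node *next; } Node;

typedef struct
{
  unsigned char *bytes;   /* serialized image */
  size_t size;
  size_t *relocs;         /* byte offsets of pointer fields to patch */
  size_t nrelocs;
} Image;

/* Dump a non-empty linked list into a position-independent image.  */
Image
toy_dump (const Node *head)
{
  assert (head != NULL);
  size_t n = 0;
  for (const Node *p = head; p; p = p->next)
    n++;
  Image img = { calloc (n, sizeof (Node)), n * sizeof (Node),
                calloc (n, sizeof (size_t)), 0 };
  assert (img.bytes != NULL && img.relocs != NULL);
  Node *out = (Node *) img.bytes;
  size_t i = 0;
  for (const Node *p = head; p; p = p->next, i++)
    {
      out[i].value = p->value;
      out[i].next = NULL;
      if (p->next)
        {
          /* Store the target's offset where the pointer would go.  */
          out[i].next = (Node *) (uintptr_t) ((i + 1) * sizeof (Node));
          img.relocs[img.nrelocs++] = i * sizeof (Node) + offsetof (Node, next);
        }
    }
  return img;
}

/* Load the image at whatever address malloc returns, then apply each
   relocation once: base + offset = real pointer.  */
Node *
toy_load (const Image *img)
{
  unsigned char *base = malloc (img->size);
  assert (base != NULL);
  memcpy (base, img->bytes, img->size);
  for (size_t r = 0; r < img->nrelocs; r++)
    {
      Node **slot = (Node **) (base + img->relocs[r]);
      *slot = (Node *) (base + (uintptr_t) *slot);
    }
  return (Node *) base;
}
```

Because nothing in the image assumes a fixed address, the loader is free to place it wherever ASLR or the allocator puts it; the cost is one pass over the relocation list at startup.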

The observable effect is that starting Emacs does not re-parse and re-evaluate the preloaded Lisp. Everything loadup.el loaded is already materialized in the .pdmp file, ready to be mapped; the rest of lisp/ is loaded on demand from compiled files through autoloads.

Native Compilation

For decades, Emacs Lisp code ran either interpreted (raw AST evaluation) or byte-compiled to a stack-based bytecode format. Neither option was fast by modern standards. Byte compilation helped with startup time and reduced allocation pressure but did not produce competitive runtime performance for compute-intensive Lisp.

Emacs 28.1 (April 2022) shipped native compilation, the work of Andrea Corallo, originally developed as the “gccemacs” branch. The compiler pipeline looks like this:

foo.el  →  byte-compiler output (LAP, produced in memory)
        →  LIMPLE IR (SSA form, computed in comp.el)
        →  GCC GIMPLE (via libgccjit)
        →  foo.eln (native shared library, ELF/Mach-O)

LIMPLE is a Lisp-level intermediate representation in static single assignment form, defined in lisp/emacs-lisp/comp.el. The compiler performs type inference and several optimization passes on LIMPLE before handing it off to libgccjit, which is GCC’s JIT compilation library exposed as a C API. GCC’s optimization pipeline then runs on the resulting GIMPLE at the level set by native-comp-speed (2 by default), and the backend emits native machine code for x86-64, AArch64, or whatever the target platform is.

The .eln files are loaded as shared libraries via dlopen. They interoperate with the Lisp runtime through an ABI defined in src/comp.h. Reported speedups for compute-bound Lisp are significant: roughly 2x to 4x over byte-compiled code in typical workloads, with figures up to 40x cited for tight arithmetic loops. The gains are smallest where runtime dispatch cannot be eliminated: access to dynamically bound variables and calls through function values unknown at compile time still go through the generic runtime paths.

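The dependency on libgccjit is worth noting for deployment. On Debian and Ubuntu, the runtime package is libgccjit0; building also requires the matching libgccjit development package. Native compilation is a build-time option (--with-native-compilation): an Emacs built without it simply runs byte-compiled code, and (native-comp-available-p) reports whether the running Emacs can compile natively.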

By default in Emacs 28 and 29, native compilation happens asynchronously in a background subprocess as packages are loaded for the first time. The compiled .eln files go into a cache directory, typically ~/.emacs.d/eln-cache/ or the appropriate XDG path.

The Contrast with Neovim and VS Code

Neovim embeds LuaJIT (falling back to PUC Lua 5.1 on platforms LuaJIT does not support) as its scripting engine. The architecture is similar in spirit to Emacs: a C core with a scripting language on top. The meaningful differences are in the interface between them.

In Neovim, Lua calls into the editor through the nvim_* API (vim.api.nvim_buf_get_lines and friends), which embedded Lua invokes in-process. The same functions are exposed over msgpack-RPC to external clients through a socket, a pipe, or stdio, which means remote plugins (Python, Ruby, any language with an RPC client) use the same API surface as embedded Lua. The interface is deliberate and typed. LuaJIT compiles hot Lua code to machine code, which has generally given Lua plugins better raw throughput than Emacs’s byte-compiled Lisp, though Emacs 28’s native compilation has narrowed that gap considerably.
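On the wire, a msgpack-RPC request is a four-element array [type, msgid, method, params] with type 0. The sketch below hand-encodes one for nvim_get_current_line, a real Neovim API function that takes no parameters. The encoder itself is a hypothetical minimal one that only covers the small cases this needs (msgid below 128 as a positive fixint, method names under 32 bytes as a fixstr, an empty params array); a real client would use a msgpack library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode [0, msgid, method, []] into buf. Returns the number of bytes
   written, or 0 if the input falls outside the cases handled here.  */
size_t
encode_request (unsigned char *buf, size_t cap,
                unsigned msgid, const char *method)
{
  size_t len = strlen (method);
  if (msgid > 0x7f || len > 31 || cap < 5 + len)
    return 0;
  size_t i = 0;
  buf[i++] = 0x94;                          /* fixarray, 4 elements */
  buf[i++] = 0x00;                          /* type 0 = request */
  buf[i++] = (unsigned char) msgid;         /* positive fixint */
  buf[i++] = (unsigned char) (0xa0 | len);  /* fixstr header */
  memcpy (buf + i, method, len);
  i += len;
  buf[i++] = 0x90;                          /* params: empty fixarray */
  return i;
}
```

The 26 bytes this produces for nvim_get_current_line with msgid 1 are what an external client would write to the socket Neovim opens with --listen; embedded Lua reaches the same function as vim.api.nvim_get_current_line () with no serialization at all.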

VS Code takes a different approach entirely. Extensions run in a separate extension host process (Node.js, on the V8 engine) that communicates with the editor over VS Code’s internal RPC protocol. Extensions are isolated from the editor: a crash in the extension host does not take down the editor window, although extensions sharing one host can still affect each other. The extension API is intentionally narrow, exposing only what the VS Code team chose to surface. You cannot reach into editor internals; you can only call the provided API.

The spectrum these three points define is a real design trade-off. VS Code’s process isolation and narrow API make extensions safer and easier to reason about, at the cost of limiting what extensions can express. Neovim sits in the middle: Lua has direct access to editor state through a deliberate API, but the API is still a defined boundary. Emacs has no such boundary at all. Lisp code can call any primitive directly, inspect any data structure, redefine any function, and replace almost any part of the editor’s behavior; the main limit is that C code calling a primitive directly does not see a Lisp-level redefinition of it. The same property that makes Emacs infinitely configurable is the one that makes a badly written package capable of breaking unrelated parts of the editor.

This is not an accident or a failure of design. It is the explicit philosophy: the editor is a Lisp program, and you are a Lisp programmer working in the same environment it was built in. The GNU Emacs Internals appendix of the Emacs Lisp Reference Manual describes this plainly. The C layer exists to provide performance for operations that need it and to interface with the OS. Everything else is Lisp, including the parts you would call the editor.

What This Means in Practice

The architectural insight has practical consequences for anyone spending serious time with Emacs.

Debugging misbehaving packages means reading Lisp, not navigating a plugin API. When M-x describe-function tells you that find-file is defined in files.el, you can open that file, read the implementation, and understand exactly what it does, because the function is written in the same language you use. A VS Code built-in command offers no comparable path from the running command to an implementation you can read and redefine from inside the editor.

Extending Emacs at the level the built-in modes work at requires no special privilege. If you want to write a major mode that manipulates buffers the same way c-mode does, you call the same C primitives c-mode calls. There is no distinction between “core API” and “extension API.”

The native compilation work in Emacs 28 makes the performance story materially better for compute-intensive packages. Tree-sitter integration in Emacs 29 (a C binding in src/treesit.c, with its Lisp interface in treesit.el) brought C-level parsing performance to syntax-aware operations without requiring that behavior to be implemented in Lisp. The runtime continues to evolve without abandoning the architectural bet Stallman made when he built the C core and wrote the editor in Lisp on top of it.

The Emacs Internals series is worth reading as an entry point into the C source. The payoff from understanding the two-layer architecture is that the source stops looking like a black box and starts looking like what it is: a Lisp program with a C substrate.
