
Emacs Is a Lisp Runtime That Ships With an Editor

Source: lobsters

Most people encounter Emacs as a text editor with an unusually programmable configuration system. The more accurate framing, and the one that this recent deep-dive into Emacs internals opens with, is that Emacs is a Lisp runtime written in C, and the editor is the application that runtime ships with. The distinction is not pedantic. It changes how you reason about performance, about extensibility, and about what it costs to add features.

The C core of Emacs lives in the src/ directory of the repository. As of Emacs 29, that directory contains roughly 200 C source files covering the evaluator, the garbage collector, the type system, buffer management, display, I/O, and the OS interface. The Lisp half of the codebase lives in lisp/, and it dwarfs the C side by line count. The editor you interact with, including the mode line, the minibuffer, dired, org-mode, the package manager, and most keybinding logic, is Elisp running on that C engine.

The Fundamental Type

Everything in the Emacs Lisp runtime is represented as a Lisp_Object. On a 64-bit system, this is a 64-bit integer where the low bits encode the type tag and the remaining bits encode the value or a pointer. The definition in lisp.h has evolved over decades, but the principle is consistent: types are not stored separately from values, they are packed into the value itself.

typedef EMACS_INT Lisp_Object;

The tag bits distinguish integers, conses, symbols, strings, vectors, floats, and a handful of other types. Macros like XCONS, XSYMBOL, and XSTRING strip the tag and reinterpret the remaining bits as a typed pointer:

#define XCONS(a) (eassert (CONSP (a)), XUNTAG (a, Lisp_Cons, struct Lisp_Cons))

This is a classic tagged-pointer representation. SBCL uses something similar, as does virtually every production Lisp implementation. The advantage is that type dispatch during evaluation does not require an extra memory dereference to fetch a type field from a heap object. The disadvantage is that you lose pointer bits for value representation, which is why Emacs fixnums on 64-bit systems only use 62 bits.
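To make the bit-packing concrete, here is a minimal sketch of a tagged word in C. It is not the code from lisp.h: the names (word, make_fixnum, tag_pointer, and so on) and the specific tag values are assumptions chosen for illustration. It assumes a 64-bit platform, three low tag bits for pointers, and a two-bit tag for fixnums, which is what leaves 62 bits of payload.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Lisp_Object: one machine word. */
typedef intptr_t word;

enum { TAG_MASK = 7 };                  /* three low tag bits */
enum { TAG_CONS = 3, TAG_STRING = 4 };  /* illustrative tag values */

/* Fixnums: the low two bits are binary 10, the rest is the integer. */
static word make_fixnum(intptr_t n)
{
  return (word)(((uintptr_t)n << 2) | 2u);
}

static int fixnum_p(word w) { return (w & 3) == 2; }

/* w - 2 is an exact multiple of 4, so the division recovers n. */
static intptr_t fixnum_value(word w) { return (w - 2) / 4; }

/* Pointers: objects are 8-byte aligned, so the low three bits are free. */
static word tag_pointer(void *p, int tag)
{
  assert(((uintptr_t)p & TAG_MASK) == 0);
  return (word)((uintptr_t)p | (uintptr_t)tag);
}

static int tag_of(word w) { return (int)(w & TAG_MASK); }

static void *untag(word w)
{
  return (void *)((uintptr_t)w & ~(uintptr_t)TAG_MASK);
}
```

Checking the type is a mask and a compare on the word itself. No memory is touched until the pointer is actually followed.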

Cons Cells and Symbols

A cons cell is two Lisp_Object values packed into a struct:

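struct Lisp_Cons {
  union {
    struct {
      /* Car of this cons cell.  */
      Lisp_Object car;
      union {
        /* Cdr of this cons cell.  */
        Lisp_Object cdr;
        /* Used to chain conses on a free list.  */
        struct Lisp_Cons *chain;
      } u;
    } s;
    GCALIGNED_UNION_MEMBER
  } u;
};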

Symbols are more complex. A struct Lisp_Symbol carries a name, a value cell, a function cell, a property list, and a pointer to the next symbol in the same hash bucket of the obarray. The obarray is the global symbol table: when the reader encounters my-var in (setq my-var 42) or my-fn in (defun my-fn ...), it interns the symbol into the obarray, where it stays for the rest of the session unless it is explicitly uninterned. This is why mapatoms can enumerate every interned symbol, and why reading a global or dynamically bound variable is a fetch from the symbol’s value cell rather than an environment chain walk. The hash probe that maps a name to its symbol happens once, at read time.
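A minimal sketch of the idea, assuming a fixed-size table with chained buckets. The struct layout, the hash function, and the names intern and count_symbols are hypothetical stand-ins, not the definitions from the Emacs source; the value cell is a plain long to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a symbol; the real struct Lisp_Symbol differs. */
struct symbol {
  const char *name;
  long value;           /* value cell */
  struct symbol *next;  /* next symbol in the same hash bucket */
};

enum { OBARRAY_SIZE = 64 };
static struct symbol *obarray[OBARRAY_SIZE];

static unsigned hash_name(const char *s)
{
  unsigned h = 5381;
  while (*s)
    h = h * 33u + (unsigned char)*s++;
  return h % OBARRAY_SIZE;
}

/* Return the unique symbol named NAME, creating it on first use. */
static struct symbol *intern(const char *name)
{
  unsigned b = hash_name(name);
  for (struct symbol *s = obarray[b]; s; s = s->next)
    if (strcmp(s->name, name) == 0)
      return s;

  struct symbol *s = malloc(sizeof *s);
  char *copy = malloc(strlen(name) + 1);
  if (!s || !copy)
    abort();
  strcpy(copy, name);
  s->name = copy;
  s->value = 0;
  s->next = obarray[b];  /* push onto the front of the bucket chain */
  obarray[b] = s;
  return s;
}

/* Visit every bucket chain, the way a mapatoms-style walk would. */
static int count_symbols(void)
{
  int n = 0;
  for (int b = 0; b < OBARRAY_SIZE; b++)
    for (struct symbol *s = obarray[b]; s; s = s->next)
      n++;
  return n;
}
```

Interning the same name twice returns the same pointer, so symbol identity can be tested with pointer equality. Once code holds that pointer, reading the value is a field access.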

Defining Primitives: the DEFUN Macro

The mechanism by which C functions become callable Lisp functions is the DEFUN macro. Every primitive implemented in C (car, cons, eval, read, message, and the rest) is defined with this macro. A condensed version of the definition of cons, with the allocation logic elided:

DEFUN ("cons", Fcons, Scons, 2, 2, 0,
       doc: /* Create a new cons, give it CAR and CDR as components, and return it.
... */)
  (Lisp_Object car, Lisp_Object cdr)
{
  register Lisp_Object val;
  /* Allocation elided: the real body in alloc.c takes a cell from the
     cons free list, or carves one out of a fresh cons block, and
     stores the tagged result in val.  */
  XSETCAR (val, car);
  XSETCDR (val, cdr);
  return val;
}

DEFUN expands to both a C function definition (Fcons) and a Lisp_Subr struct (Scons) that records the function’s name, minimum and maximum argument count, docstring offset, and a pointer to the C function. During startup, syms_of_alloc and similar registration functions call defsubr (&Scons) to install each primitive into the Lisp environment. This two-part design means the C function is directly callable from other C code as Fcons(car, cdr), and also callable as a regular Lisp function.
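The two-part expansion is easier to see in a stripped-down form. The sketch below is a hypothetical miniature, not the real macro: MINI_DEFUN, struct subr, mini_defsubr, and call_by_name are invented names, every primitive takes exactly two long arguments, and the docstring machinery is left out.

```c
#include <assert.h>
#include <string.h>

typedef long obj;  /* stand-in for Lisp_Object */

/* Hypothetical miniature of Lisp_Subr: metadata plus the C entry point. */
struct subr {
  const char *name;
  int min_args, max_args;
  obj (*fn)(obj, obj);
};

/* Emits a prototype, a metadata record, and the function header. */
#define MINI_DEFUN(lname, fnname, sname, minargs, maxargs)        \
  static obj fnname(obj, obj);                                    \
  static struct subr sname = { lname, minargs, maxargs, fnname }; \
  static obj fnname

MINI_DEFUN("add", Fadd, Sadd, 2, 2)
  (obj a, obj b)
{
  return a + b;
}

enum { MAX_SUBRS = 16 };
static struct subr *subr_table[MAX_SUBRS];
static int subr_count;

/* Counterpart of defsubr: make the primitive findable by name. */
static void mini_defsubr(struct subr *s)
{
  assert(subr_count < MAX_SUBRS);
  subr_table[subr_count++] = s;
}

/* Look a primitive up by name and call it, checking the argument count.
   Returns 1 on success, 0 if the name is unknown or the arity is wrong. */
static int call_by_name(const char *name, int nargs, obj a, obj b, obj *result)
{
  for (int i = 0; i < subr_count; i++) {
    struct subr *s = subr_table[i];
    if (strcmp(s->name, name) == 0) {
      if (nargs < s->min_args || nargs > s->max_args)
        return 0;
      *result = s->fn(a, b);
      return 1;
    }
  }
  return 0;
}
```

After mini_defsubr (&Sadd), the same function is reachable two ways: directly as Fadd (2, 3) from C, or by name through the table, which is the path a Lisp-level call would take.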

The Evaluator

eval.c contains the core evaluation loop. eval_sub is the workhorse: it dispatches on the type of the form being evaluated. Symbols trigger a variable lookup. Self-evaluating forms (integers, strings, vectors) return themselves. A cons cell whose car is a symbol is a special form, a macro call, or a function call: special forms receive their arguments unevaluated, macros are expanded and the expansion is evaluated, and function calls evaluate their arguments first. The callee is then one of a few callable types: a byte-compiled function, an interpreted lambda, or a C primitive. Ffuncall performs the same dispatch for calls that arrive through funcall and apply rather than through a form.

For interpreted lambdas, the evaluator binds the argument symbols in the current environment, evaluates the body forms in sequence, and returns the last value. For byte-compiled functions, it dispatches to the byte-code interpreter in bytecode.c, which maintains its own stack and executes a compact bytecode format. Neither path generates machine code; that only became possible with native compilation in Emacs 28.
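The shape of a byte-code interpreter of this kind is a loop around a switch, operating on a private operand stack. A minimal sketch follows, with a hypothetical four-instruction set that has nothing to do with the real opcodes in bytecode.c; the constants vector plays the role of the constant pool a compiled function carries with it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; Emacs's real instruction set differs. */
enum op { OP_CONST, OP_ADD, OP_MUL, OP_RETURN };

enum { STACK_SIZE = 32 };

/* Execute CODE against CONSTANTS on a private operand stack. */
static long run_bytecode(const unsigned char *code, const long *constants)
{
  long stack[STACK_SIZE];
  size_t sp = 0;  /* number of values currently on the stack */
  size_t pc = 0;  /* index of the next instruction byte */

  for (;;) {
    switch (code[pc++]) {
    case OP_CONST:  /* one operand byte: index into CONSTANTS */
      assert(sp < STACK_SIZE);
      stack[sp++] = constants[code[pc++]];
      break;
    case OP_ADD:    /* pop two values, push their sum */
      assert(sp >= 2);
      sp--;
      stack[sp - 1] += stack[sp];
      break;
    case OP_MUL:    /* pop two values, push their product */
      assert(sp >= 2);
      sp--;
      stack[sp - 1] *= stack[sp];
      break;
    case OP_RETURN: /* result is the value on top of the stack */
      assert(sp >= 1);
      return stack[sp - 1];
    default:
      assert(!"unknown opcode");
      return 0;
    }
  }
}
```

Feeding it the sequence CONST 0, CONST 1, ADD, CONST 2, MUL, RETURN with constants 2, 3, 4 computes (2 + 3) * 4. Every instruction pays for a fetch and a dispatch, which is the overhead native compilation removes.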

Bootstrapping: temacs and the Portable Dumper

One of the less-discussed aspects of Emacs startup is that the binary you launch is not a bare C executable that loads Lisp from disk on each run. The build process starts with temacs, which is the C runtime with no Lisp loaded. loadup.el is then evaluated inside temacs, loading the core Lisp files, populating the obarray, and constructing the initial state of the editor. The result is serialized to disk.

For most of Emacs’s history, this serialization used unexec, a trick that wrote the running process’s memory image, heap included, back out to disk as a new executable. The approach was deeply system-specific, tied to each platform’s executable format and to malloc internals, and it broke repeatedly as operating systems and allocators changed.

Emacs 27 replaced unexec with the portable dumper (pdumper), written by Daniel Colascione. The portable dumper walks the live Lisp heap, serializes all reachable objects to a .pdmp file alongside the binary, and on startup loads that file into memory (mapping it where the platform allows) and relocates pointers. This is cleaner and far more portable than unexec, because it no longer depends on the executable format or on allocator internals. What it preserves is the startup benefit of dumping: without a dump, Emacs would have to re-evaluate the core Lisp on each launch, whereas a dumped Emacs loads the already-evaluated state directly.
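The dump-then-relocate idea can be sketched on a much smaller object graph. The example below is an assumption-laden toy, not the pdumper file format: it handles only a singly linked list, replaces each pointer with an index into the image, and rebuilds the pointers by copying into fresh storage rather than patching a mapped file in place. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A live heap object: contains a real pointer. */
struct node {
  long value;
  struct node *next;
};

/* Its dumped form: the pointer becomes an index into the image,
   with -1 standing for NULL, so the image is position-independent. */
struct dumped_node {
  long value;
  long next_index;
};

/* Walk the live list and write position-independent records into IMAGE.
   Returns the number of records written. */
static size_t dump_list(const struct node *head,
                        struct dumped_node *image, size_t cap)
{
  size_t n = 0;
  for (const struct node *p = head; p; p = p->next) {
    assert(n < cap);
    image[n].value = p->value;
    image[n].next_index = p->next ? (long)(n + 1) : -1;
    n++;
  }
  return n;
}

/* Load the image: turn indices back into pointers valid for this run. */
static struct node *load_list(const struct dumped_node *image, size_t n,
                              struct node *storage)
{
  for (size_t i = 0; i < n; i++) {
    storage[i].value = image[i].value;
    storage[i].next = image[i].next_index < 0
                        ? NULL
                        : &storage[image[i].next_index];
  }
  return n ? &storage[0] : NULL;
}
```

The image can be written to disk and read back at a different address without change, because nothing in it is an absolute pointer. The relocation step is the only work needed at load time, and no Lisp has to be evaluated to rebuild the state.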

Native Compilation

Emacs 28 added native compilation, a feature built on libgccjit and originally developed on a branch by Andrea Corallo. The native-comp feature compiles Elisp to native machine code by translating the byte compiler’s intermediate representation (LAP) into GCC’s intermediate representation and running GCC’s optimization pipeline against it. The resulting .eln files are cached on disk in a dedicated eln-cache directory, separate from the .elc byte-compiled versions.

The performance gains are function-dependent. Tight numerical loops see the largest speedups, sometimes 5x to 10x over the byte-code interpreter. Functions that spend most of their time in C primitives see less benefit because that time was already being spent in C. Corallo’s original benchmarks reported roughly 2x to 5x improvement across a range of real Emacs Lisp workloads.

Native compilation does not change the semantics of Elisp. It is still the same language with the same dynamic binding rules, the same garbage collector, and the same obarray. The compiled code calls back into the C runtime for any primitive it cannot inline, which is most of them.

Comparison to Guile

GNU Guile is the other major Lisp runtime in the GNU ecosystem, and it was originally intended to replace Elisp as Emacs’s extension language. That effort stalled for reasons that are partly technical and partly social, but comparing the two runtimes is instructive.

Guile is a Scheme implementation that compiles to bytecode for its own VM. Since Guile 2.2 the compiler works through a continuation-passing style IR, and Guile 3.0 added a JIT that generates native code. Its type system is also tagged-pointer based, and its C interop story uses a similar macro, SCM_DEFINE, for exposing C functions to Scheme. The difference is that Guile was designed from the start as an embeddable library, with a clean C API, proper tail-call optimization, and first-class continuations. Elisp has none of those properties because the runtime was not designed for general embedding; it was built to run one application.

The Emacs C runtime reflects this. It has global mutable state throughout, tight coupling between the evaluator and the display engine, and a garbage collector that stops the world and is sensitive to the presence of the GUI event loop. These are not design failures. They are the accumulated weight of an application runtime that was never meant to be extracted and reused.

What This Changes About Reading Elisp

Once you understand that Elisp runs on a C interpreter that was purpose-built for one application, a lot of Emacs Lisp idioms make more sense. The reason rebinding a special variable with let is cheap, and reading it cheaper still, is that dynamic binding is shallow: let saves the old value on a binding stack in C (the specpdl) and writes the new value into the symbol’s value cell, so a lookup is a cell read rather than an environment chain walk. The reason large buffers of text are manipulated through buffer primitives rather than Lisp strings is that the gap buffer data structure lives in C and the Lisp interface to it is a thin wrapper. The reason (require 'some-feature) can be slow even for byte-compiled code is that only the preloaded files are part of the dump; everything else has to be found on load-path, read, and evaluated at runtime, which allocates and interns fresh symbols.
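A minimal sketch of a dynamic binding stack, assuming shallow binding: the value cell always holds the current value, and the stack holds only what has to be restored. The array is named after Emacs’s specpdl, but struct var, dyn_bind, unbind_to, and lookup are simplified, hypothetical versions written for this example.

```c
#include <assert.h>

/* A symbol's value cell, reduced to a single long. */
struct var { long value; };

/* One saved binding: which cell was overwritten and what it held. */
struct saved {
  struct var *var;
  long old_value;
};

enum { SPECPDL_SIZE = 64 };
static struct saved specpdl[SPECPDL_SIZE];
static int specpdl_top;

/* Dynamically bind V: save the old value, then overwrite the cell. */
static void dyn_bind(struct var *v, long new_value)
{
  assert(specpdl_top < SPECPDL_SIZE);
  specpdl[specpdl_top].var = v;
  specpdl[specpdl_top].old_value = v->value;
  specpdl_top++;
  v->value = new_value;
}

/* Undo bindings down to depth COUNT, restoring values in reverse order. */
static void unbind_to(int count)
{
  while (specpdl_top > count) {
    specpdl_top--;
    specpdl[specpdl_top].var->value = specpdl[specpdl_top].old_value;
  }
}

/* Reading a variable is a plain cell read; the stack is never searched. */
static long lookup(const struct var *v) { return v->value; }
```

The cost lands on entering and leaving a let, where one stack entry is pushed or popped per binding. Reads in between cost the same no matter how deeply the bindings are nested.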

This is what the original article is getting at when it frames Emacs as a Lisp runtime first. The editor is downstream of the runtime architecture. Understanding the C layer is not optional if you want to write Elisp that performs well or extend Emacs in ways that go beyond configuration.
