· 6 min read ·

Two Papers Are All You Need to Write a Compiler

Source: hackernews

A 2008 post by James Hague on his “Programming in the 21st Century” blog resurfaced this week with over 500 points on Hacker News. The thesis is simple: if you want to write a compiler, skip the thousand-page textbooks and read two papers. The post is short, confident, and correct.

The two resources Hague points to are Niklaus Wirth’s slim Compiler Construction book from ETH Zurich and Jack Crenshaw’s Let’s Build a Compiler tutorial series, originally published in Embedded Systems Programming magazine between 1988 and 1995. Neither is a paper in the strict academic sense, but both are short enough to read in a weekend and concrete enough that you finish them with working code rather than a reading list.

The reason these two resources keep getting recommended, 15 and 35 years after they were written, comes down to a single technique: recursive descent parsing.

Recursive Descent: Grammar as Code

Most compiler textbooks lead with formal language theory, pushdown automata, and FIRST/FOLLOW sets. This is the Dragon Book approach, and it is not wrong, it is just the wrong starting point for someone who wants to build something.

Recursive descent sidesteps all of that. The insight is that a context-free grammar maps almost directly onto a set of mutually recursive functions. Each non-terminal in the grammar becomes a procedure. The procedure reads tokens and calls other procedures based on what it sees. No parser generator needed. No table construction. The grammar structure is visible in the code structure.

Here is what this looks like for a simple expression grammar. Given the grammar:

expr   = term { ("+" | "-") term }
term   = factor { ("*" | "/") factor }
factor = number | "(" expr ")"

The parser in any procedural language looks like this:

int expr() {
    int val = term();
    while (tok == '+' || tok == '-') {
        char op = tok;
        advance();
        int right = term();
        val = (op == '+') ? val + right : val - right;
    }
    return val;
}

int term() {
    int val = factor();
    while (tok == '*' || tok == '/') {
        char op = tok;
        advance();
        int right = factor();
        val = (op == '*') ? val * right : val / right;
    }
    return val;
}

int factor() {
    if (tok == '(') {
        advance();
        int val = expr();
        expect(')');
        return val;
    }
    int val = number_value;
    advance();
    return val;
}

This is a complete expression parser with correct operator precedence. The call stack implements the parse tree. Errors are found by checking whether the current token matches what is expected. The whole thing fits in your head.

Wirth’s PL/0 compiler, first presented in his 1975 book Algorithms + Data Structures = Programs, uses exactly this technique to parse, type-check, and generate code for a small Pascal subset in roughly 300 lines of Pascal. The compiler is a direct translation of the PL/0 grammar into recursive procedures, each of which emits stack machine instructions as it descends. Parse, analyze, codegen: all in one pass, all in one function per grammar rule.

One Pass, One Read

The one-pass design is not just an artifact of 1970s memory constraints. It is a teaching tool. When a compiler must generate code in a single left-to-right scan, the design forces clear rules about what information must be available when. Variables must be declared before use. Types must be resolvable at the point of an expression. There is no room for a separate analysis phase that patches things up later.

Wirth’s 1996 Compiler Construction book, the main reference in Hague’s post, extends PL/0 to Oberon-0, a language with procedures, arrays, records, and nested scopes. The compiler targets a hypothetical RISC stack machine called RISC-0. The complete compiler is around 2,500 lines of Oberon and is included in full in the book. Students can read it, modify it, and run it.

The RISC-0 target is a deliberate choice. Targeting a real ISA like x86 or ARM buries the interesting parts of compiler construction under register allocation, calling conventions, and instruction encoding. RISC-0 has simple load/store semantics and a tiny instruction set, so the code generator stays readable. Once you understand what code to emit and when, translating to a real target is a separate, learnable problem.

Crenshaw’s approach in Let’s Build a Compiler is even more direct: he generates 68000 assembly from the start, but keeps the language so simple that the code generator never gets in the way. The tutorial style, with frequent “try this yourself” exercises, makes it the more accessible of the two starting points. Wirth is more precise; Crenshaw is more immediate.

What the Dragon Book Is Actually For

Alfed Aho, Monica Lam, Ravi Sethi, and Jeffrey Ullman’s Compilers: Principles, Techniques, and Tools, universally called the Dragon Book, is comprehensive in a way that Wirth and Crenshaw are not. It covers LL and LR parsing theory, data flow analysis, register allocation, loop transformations, and a dozen kinds of optimization. It is the right reference for writing a production compiler backend, or for understanding what LLVM’s optimization passes are doing.

But the Dragon Book is 1,000 pages because it covers everything, and it covers everything at a level of generality designed for researchers building compiler infrastructure. Someone who wants to understand how a compiler works, who wants to write one and have it run, does not need LR(1) item sets on day one. They need to see parsing, symbol tables, type checking, and code generation in a form small enough to hold in working memory.

This is the gap that Wirth and Crenshaw fill. The Dragon Book answers questions you have after you have written a compiler; the two-paper path answers questions you have before.

The Technique Persisted

Recursive descent is not just a teaching technique. The original Go compiler, gc, was a handwritten recursive descent parser. Clang, the C and C++ frontend for LLVM, uses recursive descent for both the preprocessor and the main parser. Rust’s compiler, rustc, uses recursive descent. GCC, the older codebase, uses it too.

The reason is maintainability. A parser generator produces a state machine from a grammar specification. When the grammar changes, the generated tables change in ways that are hard to inspect or debug. A recursive descent parser changes exactly where you expect it to change: the function for the grammar rule you modified. Error recovery is easier to implement and easier to customize. The code is ordinary code, subject to the same refactoring, testing, and reviewing practices as the rest of the compiler.

Ghuloum’s 2006 paper An Incremental Approach to Compiler Construction applied the same minimal philosophy to a different target: compiling a Scheme subset to x86, one small feature at a time. The paper spawned a small industry of “write a Scheme compiler” tutorials and remains one of the best introductions to targeting real hardware without drowning in it.

Why This Post Still Circulates

The Hacker News thread surfacing Hague’s post in 2026 has the usual mixture of people who read Wirth in school and remember it fondly, people who have been meaning to read it for years, and people arguing about whether LLVM has changed the calculus entirely.

LLVM has changed things, but not in the way that undermines the argument. If you are building a new language today, you probably target LLVM IR rather than a stack machine. But you still need a lexer, a parser, a symbol table, and a type checker. Those components look almost exactly like what Wirth describes. The backend is handed off to LLVM; the frontend is still the problem you have to solve, and recursive descent is still the cleanest way to solve it.

Wirth died in January 2024 at 89. He spent fifty years arguing that software should be small, readable, and fit in a single person’s head. His PL/0 compiler and his Compiler Construction book are the most direct expression of that argument. That a 2008 blog post pointing at his work is still generating 500-point HN threads says something about how well that argument has aged.

Was this interesting?