
When 18 Years of YACC Gets Replaced by Recursive Descent, With a Little LLM Help

Source: eli-bendersky

Eli Bendersky just shipped something worth paying attention to: a complete parser rewrite of pycparser, a Python library that sees around 20 million daily PyPI downloads. The old implementation leaned on PLY (Python Lex-Yacc). The new one is a hand-written recursive-descent parser, no PLY dependency at all. And he did it with an LLM coding agent (Codex) doing a significant chunk of the mechanical work.

This is not a small yak shave. pycparser is a full C language parser, meaning it handles declarations, pointer types, function signatures, preprocessor artifacts — all the gnarly parts of C that make parser writers regret their choices. Eli started it in 2008, and the YACC-based approach was completely reasonable then. Grammar-first tooling like YACC was exactly what you reached for when parsing a real language.

Why Recursive Descent Makes Sense Now

The tradeoffs have shifted. YACC-style parsers (and PLY in particular) come with real costs:

  • Dependency weight. PLY is a whole separate library you’re dragging in. For something as foundational as a C parser that underpins tools like cffi and the downstream projects built on them, that matters.
  • Error messages. Generated parsers produce notoriously bad diagnostics. Hand-written recursive descent gives you full control over what you say when something goes wrong.
  • Debuggability. A recursive-descent parser is just functions calling functions. You can step through it, read it, and understand it without knowing anything about shift-reduce automata.
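To make the “functions calling functions” point concrete, here is a minimal recursive-descent parser for a toy expression grammar. This is purely illustrative — a sketch of the technique, nothing like pycparser’s actual code — but it shows both the structure (one function per grammar rule) and the full control you get over error messages:

```python
# Toy recursive-descent parser for the grammar:
#   expr := term (('+'|'-') term)*
#   term := NUMBER | '(' expr ')'
# Illustrative sketch only -- not pycparser's actual implementation.

import re

class Parser:
    def __init__(self, text):
        # Trivial tokenizer: integers, parentheses, and +/- operators.
        self.tokens = re.findall(r"\d+|[()+\-]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self):
        # expr := term (('+'|'-') term)*
        value = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.next()
            rhs = self.parse_term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_term(self):
        # term := NUMBER | '(' expr ')'
        tok = self.next()
        if tok == "(":
            value = self.parse_expr()
            if self.next() != ")":
                # Hand-written diagnostics: we decide exactly what to say.
                raise SyntaxError(f"expected ')' at token {self.pos}")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)

print(Parser("1 + (2 - 3) + 4").parse_expr())  # 4
```

Stepping through this in a debugger is just stepping through ordinary function calls — no parse tables, no automaton states.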

The grammar for C is well-specified (the K&R2 appendix is the canonical reference), so translating it to recursive descent is tedious but not mysterious. That’s exactly the kind of task where an LLM is actually useful.
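As a rough sketch of what that translation looks like, here is one heavily simplified, hypothetical production turned into a parser function — nothing like pycparser’s real declaration handling, which also has to cope with typedefs, qualifiers, and full declarator syntax:

```python
# One simplified C-like production, hand-translated into a function:
#
#   declaration := type-specifier '*'* identifier ';'
#
# Hypothetical sketch only -- pycparser's real declaration parsing is
# far more involved (typedefs, qualifiers, function declarators, ...).

TYPE_SPECIFIERS = {"int", "char", "float", "double", "void"}

def parse_declaration(tokens, pos=0):
    """Parse one declaration; return ((type, ptr_depth, name), next_pos)."""
    if pos >= len(tokens) or tokens[pos] not in TYPE_SPECIFIERS:
        raise SyntaxError(f"expected a type specifier at token {pos}")
    type_spec = tokens[pos]
    pos += 1
    # Each '*' adds one level of pointer to the declared name.
    ptr_depth = 0
    while pos < len(tokens) and tokens[pos] == "*":
        ptr_depth += 1
        pos += 1
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        raise SyntaxError(f"expected an identifier at token {pos}")
    name = tokens[pos]
    pos += 1
    if pos >= len(tokens) or tokens[pos] != ";":
        raise SyntaxError(f"expected ';' after declaration of {name!r}")
    return (type_spec, ptr_depth, name), pos + 1

print(parse_declaration(["int", "*", "*", "p", ";"])[0])  # ('int', 2, 'p')
```

Each grammar production becomes one function with this shape, which is exactly why the work is tedious, structured, and well suited to delegation.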

The Honest Take on LLM Assistance Here

What I find more interesting than the parser choice is the specific role Codex played. Eli isn’t claiming the LLM designed the architecture or made the hard decisions — he drove the technical direction. What the LLM did was handle the mechanical translation work: converting grammar productions into parser functions, generating the boilerplate-heavy portions, keeping the tedium manageable across a large codebase.

That framing matches what I’ve found most useful about LLM coding agents. They’re not great at inventing solutions to hard problems. They’re solid at grinding through work that has a clear structure but would take a human hours of error-prone repetition. Rewriting 18 years of PLY grammar into recursive-descent functions fits that pattern almost perfectly.

There’s also something quietly significant about the fact that a project with 20M daily downloads just shed a core dependency without breaking the world. That’s a testament to pycparser’s test coverage as much as anything else — the LLM could only do its job because there was a solid test suite to validate against.

What This Signals

I’d expect to see more of this pattern: LLM-assisted rewrites of established libraries where the goal is dependency reduction or modernization, not feature addition. The LLM handles the translation layer, the human handles the judgment calls, and robust tests keep everything honest.

If you work with C parsing in Python at any level, the new pycparser is worth pulling in. And if you’re skeptical about LLMs in serious open source work, Eli’s writeup is a grounded account of what that collaboration actually looks like — no hype, just engineering.
