The Training Data Tax That Lisp Pays Every Time You Ask an AI for Help

Dan Haskin’s recent post “Writing Lisp is AI Resistant and I’m Sad” puts a name to something Lisp programmers have been quietly experiencing for a couple of years now. Every other developer community is building workflows around Copilot, Claude, and ChatGPT. Lispers open the same tools and get confidently wrong answers, hallucinated APIs, and parentheses that don’t balance. The tools that are supposed to erase the friction of writing code mostly don’t work for them.

This is worth examining carefully, because it’s not quite the problem it appears to be on the surface.

The Raw Numbers

GitHub Copilot was trained on public code from GitHub and other sources. The Stack Overflow Developer Survey 2024 found that Clojure, the most popular Lisp dialect in production use, is used by roughly 1.6% of respondents. Common Lisp, Scheme, and Racket are collectively used by fewer than 1% of developers surveyed. Compare that to Python at 51%, JavaScript at 62%, and TypeScript at 39%.

That ratio cascades directly into training data. The CodeSearchNet dataset, widely used in code model research, includes Python, JavaScript, Ruby, Go, Java, and PHP. Lisp is absent. When OpenAI trained Codex, the model behind early Copilot, they described the training corpus as filtered public GitHub code. Lisp’s share of that corpus is vanishingly small.

The consequence is straightforward: models trained predominantly on Python and JavaScript will generate Python and JavaScript well. For everything else, quality degrades as the language becomes less represented in the training data. Multilingual benchmarks such as MultiPL-E, which translates standard Python code-generation problems into many other languages, including Racket, show this pattern, with models generally scoring lower on the less common languages. Low-resource programming languages look a lot like low-resource spoken languages in this respect: the model has seen fewer examples, learned fewer patterns, and has less to draw on when producing output.

Why Lisp Gets Hit Especially Hard

Data scarcity alone doesn’t fully explain the failure mode. Lisp’s particular characteristics make it a worse fit for how transformer models generate code, even setting aside the training corpus size.

Consider what a language model is doing when it completes code: predicting the next token given the prior context. For a typical Python function, the syntactic landmarks are varied, dense with semantic signal, and visually distinct. Function definitions, class bodies, conditionals, loops, and return statements all have different surface forms. The model has strong, distinct patterns to anchor on.

In Lisp, nearly every form is a list. A function definition, a macro definition, a special form, and a quoted data literal all look the same at the outermost level:

(defun square (x) (* x x))
(defmacro when (condition &body body) ...)
(let ((x 10)) (+ x 5))
'(1 2 3)

A model generating Lisp code must track parenthesis nesting depth continuously, distinguish between forms that look syntactically identical but behave differently, and know whether it is in a macro definition context or a runtime context. Transformers handle long-range dependencies through attention, but correctly closing nested parentheses across a 50-token span is a different kind of challenge than matching a Python def with its indented body.
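To make that bookkeeping concrete, here is a minimal Python sketch of the counter an editor or reader keeps while scanning Lisp text. The function name check_balance and its simplifications are illustrative assumptions. It only understands parentheses, double-quoted strings with backslash escapes, and ; line comments. It does not handle reader syntax such as #\( character literals or #| |# block comments.

```python
def check_balance(source: str) -> tuple[bool, int]:
    """Scan Lisp-like source and return (balanced?, maximum nesting depth).

    Hypothetical helper with simplifying assumptions: only ( and ) are
    delimiters, ';' starts a comment that runs to end of line, and
    double-quoted strings may contain backslash escapes. Character
    literals and block comments are NOT handled.
    """
    depth = 0
    max_depth = 0
    in_string = False
    in_comment = False
    escaped = False
    for ch in source:
        if in_comment:
            # Everything up to the newline is ignored.
            if ch == "\n":
                in_comment = False
            continue
        if in_string:
            # Parentheses inside strings do not count toward depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == ";":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A close paren with no matching open: fail immediately.
                return False, max_depth
    return depth == 0 and not in_string, max_depth
```

For example, check_balance("(let ((x 10)) (+ x 5))") reports a balanced form three levels deep, while dropping the final paren makes it unbalanced. A conventional tool keeps this counter explicitly. A language model has no such variable and must recover the current depth implicitly from the tokens it has already produced, which plausibly helps explain why unbalanced output tends to show up in long, deeply nested forms.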

There is also the macro problem. Lisp macros can be defined in ordinary library or user code, and each one extends the language’s syntax. Even Clojure’s standard threading macro works this way. When you write:

(-> value
    (assoc :key val)
    (update :count inc))

The threading macro -> rewrites the code before evaluation. It inserts value as the first argument of the next form, then inserts that result into the form after it, so the expression above expands to (update (assoc value :key val) :count inc). A model that has seen limited Clojure code may not have internalized what -> does, where it’s appropriate, or how to compose it with other macros correctly. Python decorators and TypeScript generics have fixed syntax and documented semantics that a model sees in abundant examples. Lisp macros, by contrast, are endlessly variable, because any project can define its own new syntactic forms. The model cannot generalize well from a small sample.
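The rewrite that -> performs is mechanical once Lisp forms are treated as data. The following Python sketch models forms as nested lists of strings, a representation chosen purely for illustration and not how Clojure stores code. It applies the thread-first rule: each list form receives the accumulated expression as its first argument, and a bare symbol is wrapped into a one-argument call. The helper names thread_first and to_sexp are hypothetical, and the sketch ignores details of the real macro such as metadata handling.

```python
def thread_first(value, *forms):
    """Sketch of Clojure-style (-> value form1 form2 ...) expansion.

    Forms are modeled as Python lists (a call) or strings (a bare symbol).
    Each step inserts the expression built so far as the first argument.
    """
    expanded = value
    for form in forms:
        if isinstance(form, list):
            head, *args = form
            # (f a b) becomes (f <expanded> a b)
            expanded = [head, expanded, *args]
        else:
            # A bare symbol f becomes (f <expanded>)
            expanded = [form, expanded]
    return expanded


def to_sexp(form) -> str:
    """Render the nested-list model back into s-expression text."""
    if isinstance(form, list):
        return "(" + " ".join(to_sexp(sub) for sub in form) + ")"
    return str(form)


if __name__ == "__main__":
    result = thread_first("value", ["assoc", ":key", "val"], ["update", ":count", "inc"])
    print(to_sexp(result))  # (update (assoc value :key val) :count inc)
```

Nothing in the expansion depends on what assoc or update mean. A model has to learn the rewrite rule itself from however many examples of it appear in its training data.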

Homoiconicity compounds this. Because Lisp code is data, idiomatic Lisp is full of code that generates, transforms, or introspects other code, whether at macroexpansion time or at runtime. The patterns involved are genuinely unusual compared to what models have been trained to expect. An AI assistant asked to write a simple domain-specific language embedded in Common Lisp is navigating territory that appears rarely in its training data and involves conventions that are difficult to infer from first principles.
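Here is a toy illustration of code as data, written in Python. The program text is read into ordinary nested lists. A macro-style rewrite then transforms those lists into a different program, and only after that is the result evaluated. Everything here is an illustrative assumption rather than how any real Lisp is implemented. That includes the function names, the integer-only reader, the three-operator toy language, and the use of 0 as a stand-in for nil.

```python
import operator

# Hypothetical toy language: integers, three operators, and an `if` form.
OPS = {"+": operator.add, "*": operator.mul, "<": operator.lt}


def tokenize(text: str) -> list[str]:
    """Split source text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens: list[str]):
    """Turn tokens into nested Python lists: the code becomes plain data."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return form
    try:
        return int(token)
    except ValueError:
        return token  # any non-integer atom is treated as a symbol


def expand_unless(form):
    """Macro-style rewrite: (unless test body) becomes (if test 0 body).

    Works bottom-up so nested `unless` forms are expanded too. Assumes a
    single body form and uses 0 as a stand-in for nil.
    """
    if not isinstance(form, list):
        return form
    form = [expand_unless(sub) for sub in form]
    if form and form[0] == "unless":
        _, test, body = form
        return ["if", test, 0, body]
    return form


def evaluate(form):
    """Evaluate an expanded form; `unless` no longer exists at this stage."""
    if isinstance(form, int):
        return form
    head, *args = form
    if head == "if":
        test, then_branch, else_branch = args
        return evaluate(then_branch) if evaluate(test) else evaluate(else_branch)
    return OPS[head](*(evaluate(arg) for arg in args))


if __name__ == "__main__":
    program = read(tokenize("(unless (< 3 2) (* 6 7))"))
    print(expand_unless(program))            # ['if', ['<', 3, 2], 0, ['*', 6, 7]]
    print(evaluate(expand_unless(program)))  # 42
```

The interesting step is the middle one. expand_unless is ordinary list-processing code, yet both its input and its output are programs. Lisp macros make this the normal way to extend a language, and that is precisely the kind of pattern that is thin in a model’s training data.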

The Dialect Fragmentation Problem

Python is Python. There are style differences and version differences, but a model trained on Python 3 code produces Python 3 code. Lisp is not Lisp: Common Lisp, Scheme, Racket, Clojure, ClojureScript, Fennel, and Janet are all in active use, and they are substantially different from each other in idiom, standard library, and philosophy.

Ask an AI for help with Racket and it may generate Clojure. Ask for help with Common Lisp and it may give you Scheme. The model has seen so little of any individual dialect that it conflates them. The training signal for each is too weak to produce reliable dialect awareness. A Python developer asking Copilot for help with asyncio usually gets code that works. A Racket developer asking for help with continuations gets something that looks like Lisp and is wrong in three different ways.

What This Means for Language Choice Under AI-Assisted Development

There is a subtler point here than just “AI is bad at Lisp.” The productivity multiplier that AI tools provide is not evenly distributed. A developer working in TypeScript is getting something close to a pair programmer. A developer working in Common Lisp is getting an autocomplete that sometimes hallucinates standard library functions.

Over time, this creates a feedback loop. Teams choosing languages for new projects will weight AI tool quality as a practical consideration. Companies building on developer experience will push toward languages where AI assistance is reliable. The already dominant languages become more dominant, not because they are better, but because they are better supported by AI tools, which is itself a consequence of them being dominant.

This is not unprecedented. Languages with smaller communities have always struggled to attract tooling investment. IDE support, linter quality, package ecosystems: all of these correlate with community size. AI coding assistance is just the latest and currently most conspicuous example of the same dynamic.

What makes it sting for Lisp specifically is the irony. Lisp is the language that pioneered interactive, exploratory, REPL-driven development. It’s the language where code generation is a native concept, where the compiler and the running program share the same representation, where writing code that writes code is idiomatic rather than esoteric. Lisp was doing code-as-data before most of the languages AI tools excel at existed. That its practitioners are now at the back of the line for AI assistance has a particular kind of unfairness to it.

What Would Have to Change

For AI tools to become genuinely useful for Lisp, the training data problem needs a structural solution. More Lisp code on GitHub helps at the margin. Dedicated fine-tuning on Clojure or Common Lisp corpora helps more. There are already efforts in this direction. Research such as MultiPL-T, which generated fine-tuning data for Racket, OCaml, and other low-resource languages, has shown that targeted training can substantially narrow the quality gap even when the total data volume remains modest.

But the more realistic near-term path is probably what the Lisp community has always relied on: documentation, examples, and collective knowledge in forms the model can actually use. Projects like the Common Lisp Cookbook and ClojureDocs exist precisely because Lisp knowledge is dense and doesn’t transfer from other languages. Getting that knowledge into model training data, and getting more Lisp code written and published publicly, is the lever the community actually has access to.

In the meantime, Dan’s frustration is legitimate. The gap is real, it matters for day-to-day productivity, and it’s not going away on its own. The most honest thing to say about AI-assisted development right now is that it is not a universal productivity multiplier. It is a productivity multiplier for the languages that already had the most resources, documentation, and community investment. For everyone else, the wait continues.