Lisp's AI Problem Is a Corpus Problem, and It Compounds

Dan Haskin’s article touches on something real. If you have tried writing Common Lisp or Clojure with GitHub Copilot or Claude at your side, you have probably noticed that the quality drops compared with what the same tools produce for Python or TypeScript. The completions get vaguer. The suggestions drift toward Python-shaped patterns dressed up in parentheses. The model knows you are writing a function, but it does not know how a Lisp programmer would write that function.

This is not a mystery. It is a corpus problem, and it compounds in ways that make Lisp’s situation worse than most niche languages.

How Much Lisp Exists in Training Data

The models powering most AI coding assistants are trained on large scrapes of public code. BigCode’s The Stack, one of the most widely used datasets for code model training, contains roughly 6.4 terabytes of permissively licensed source code across 358 programming languages. The distribution is wildly uneven.

Python, JavaScript, and Java dominate datasets like this, together accounting for a large share of all tokens. Common Lisp is typically listed in the dataset but represents a fraction of a percent. Clojure fares better thanks to its JVM roots and the Leiningen and deps.edn ecosystem generating activity on GitHub, but it still sits far below the volume at which models seem to develop reliable intuition.

The issue is not just quantity. It is quality. A lot of Lisp code that exists online is either tutorial-level material that does not reflect real production patterns, or highly specialized code in domains like mathematical computation or academic research that does not generalize well. Models trained on this data develop surface-level familiarity without the deep pattern recognition they have for Python or TypeScript.

The Macro Problem Is Uniquely Hard

Every Lisp dialect gives you macros. This is one of the great things about Lisp: you can extend the language itself, create domain-specific constructs, and write code that writes code. But from a training perspective, macros make Lisp unusually opaque.

In Python, when the model sees a with statement, it knows what is happening because it has seen thousands of examples and the semantics are fixed by the language specification. In Common Lisp, it might instead encounter something like this:

(with-connection (conn db-spec)
  (query conn "SELECT * FROM users"))

That with-connection is almost certainly a macro, since a function would try to evaluate (conn db-spec) as a call, but it might come from any of several database libraries, or it might be something the current codebase defined locally five files away. Whatever its origin, the macro expands at compile time into arbitrary Lisp forms. The model cannot know, from syntax alone, what behavior to expect.
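To see why the call site alone is not enough, here is one hypothetical definition of with-connection. The helpers open-connection and close-connection are stand-ins invented for this sketch and are not the API of any real database library. A different library could expand the very same call into something entirely different.

```lisp
;; Hypothetical stand-ins for a database driver. A real library would
;; open a socket or handle here; these just build and ignore a list.
(defun open-connection (spec)
  (list :connected-to spec))

(defun close-connection (conn)
  (declare (ignore conn))
  nil)

;; One plausible shape for WITH-CONNECTION: bind VAR to a fresh
;; connection, run BODY, and close the connection even on a non-local exit.
(defmacro with-connection ((var spec) &body body)
  `(let ((,var (open-connection ,spec)))
     (unwind-protect (progn ,@body)
       (close-connection ,var))))
```

Under this definition, (macroexpand-1 '(with-connection (conn db-spec) (query conn "SELECT * FROM users"))) returns a LET form wrapping UNWIND-PROTECT, not a function call. That expansion is exactly the information a model never sees when it reads only the call site.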

Macros also mean that Lisp codebases tend to develop their own internal languages. A mature Common Lisp project might have a dozen macros that shape how every other piece of code in the project is written. Because training data is typically collected file by file, the model often sees code that uses those macros without the definitions needed to understand them. The code looks like Lisp but behaves like a custom language.

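This is fundamentally different from the challenge AI has with a niche Rust crate. Rust has macros too, but every macro call is visibly marked with a ! or an attribute, and the core semantics stay fixed. The unfamiliar parts are mostly library APIs, not unmarked syntax extensions that rewrite the language beneath you.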

Dialect Fragmentation Makes Everything Worse

Lisp is not one language. Common Lisp, Clojure, Scheme, Racket, Hy, Fennel, Janet, and several others all share the parenthetical tradition and basic functional orientation, but differ substantially in their standard libraries, concurrency models, and idioms.

Suppose an AI model has learned about Clojure’s persistent data structures and its threading macros from code like this:

(require '[clojure.string :as str])

(-> user            ; user is a map like {:profile {:email "..."}}
    :profile
    :email
    str/lower-case)

That knowledge does not transfer cleanly to Common Lisp, where you would typically write something like:

(string-downcase (profile-email (user-profile user)))

Or use a threading macro from a library like cl-arrows, which may or may not be in scope. A model trained on mixed Lisp data has to guess which dialect it is in and which libraries are available. It often gets this wrong in subtle ways, producing code that looks plausible but draws undefined-function warnings at compile time and fails when it runs.
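The Clojure snippet relies on -> being part of the core language. In Common Lisp, the equivalent only exists if someone defines or imports it. Here is a minimal sketch of such a macro, written for illustration rather than copied from cl-arrows or any other library. It threads only through symbols and lists, because Clojure’s trick of calling a keyword as a lookup function has no Common Lisp counterpart.

```lisp
;; A minimal thread-first macro, for illustration only. Libraries such as
;; cl-arrows ship their own, more complete versions.
(defmacro -> (x &rest forms)
  "Thread X through FORMS, inserting it as the first argument of each."
  (reduce (lambda (acc form)
            (if (consp form)
                ;; (f a b) becomes (f ACC a b)
                `(,(first form) ,acc ,@(rest form))
                ;; a bare symbol F becomes (F ACC)
                `(,form ,acc)))
          forms
          :initial-value x))
```

With this macro in scope, (-> user user-profile profile-email string-downcase) expands to the nested call shown above. Without it, the same code simply errors, which is the failure you get when a model assumes the macro is available.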

This fragmentation problem does not affect mainstream languages the same way. JavaScript has its own fractures across module systems and framework idioms, but the underlying semantics are consistent and the model has seen enough examples to develop heuristics for each context. With Lisp dialects, the fractures run deeper, all the way down to how the language evaluates forms. Common Lisp keeps functions and variables in separate namespaces while Clojure and Scheme use one, and in Common Lisp the empty list is false while in Clojure and Scheme it is true.
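Two small Common Lisp cases show how deep those differences go. Both rely only on standard behavior, and the function names are made up for the example.

```lisp
;; Common Lisp keeps separate namespaces for functions and variables
;; (a "Lisp-2"), so a parameter named LIST does not hide the function LIST.
(defun wrap-twice (list)
  (list list list))

;; In Common Lisp the empty list and NIL are the same object, and NIL is false,
;; so this returns T.
(defun empty-list-false-p ()
  (if '() nil t))
```

In Clojure, which has a single namespace, (defn wrap-twice [list] (list list list)) would try to call the argument as a function. There, (if '() ...) takes the true branch, because only nil and false are falsey. A model that blends the two dialects can get either case wrong.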

The Feedback Loop That Keeps the Gap Growing

Training data volume is a function of developer activity, which is itself a function of tooling quality. This creates a feedback loop that Lisp has been losing for several years.

Developers choose languages partly based on tooling. As AI coding assistants became dramatically better for Python and TypeScript, the marginal productivity advantage of those languages increased. New projects that might have been written in Clojure get written in TypeScript instead, partly because the AI assistant is so much more useful in that context. Less Clojure gets written, less of it shows up in public repositories, and Clojure’s share of the next generation’s training data shrinks relative to the languages that are growing.

Lisp did not have a growth spurt during the window that mattered. Rust grew rapidly from roughly 2020 onward, and that growth created a substantial corpus of production Rust code on GitHub precisely during the period when major AI labs were scraping training data. The Rust Book, the proliferation of blog posts about ownership and lifetimes, the Rustonomicon, the crate documentation ecosystem: all of it was public, heavily indexed, and very likely ended up in training sets, giving models genuine Rust fluency. Common Lisp’s code still exists, but many of its active projects are decades old, and the community was not producing a comparable wave of new code, tutorials, and blog posts while training corpora were being assembled.

What Good AI Assistance for Lisp Would Actually Require

Better Lisp assistance from AI models is not impossible, but it requires either more data or different approaches to model training.

One avenue that has shown some promise is tool-augmented generation, where the model has access to a language server or REPL to verify its outputs. Projects like Conjure for Neovim already integrate Clojure and Common Lisp REPLs directly into the editor. An AI system that could submit candidate code to a running REPL and observe the result would catch many of the errors that come from macro confusion and wrong-library guesses. The major coding assistants do not have this kind of tight feedback loop with Lisp runtimes today.
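Here is a minimal sketch of that feedback step, written in Common Lisp itself. It reads a candidate form from text, evaluates it, and reports either the value or the error message back to the caller. try-candidate is a hypothetical name. A real assistant would run this in a separate, sandboxed Lisp image rather than in the developer’s own session.

```lisp
(defun try-candidate (source)
  "Read one form from the string SOURCE and evaluate it.
Returns T and the first value on success, or NIL and the error text."
  (handler-case
      (let* ((*read-eval* nil)          ; refuse #. read-time evaluation
             (form (read-from-string source)))
        (values t (eval form)))
    ;; Reader errors (unbalanced parens), undefined functions, unbound
    ;; variables and type errors all land here as ERROR conditions.
    (error (e)
      (values nil (princ-to-string e)))))
```

Feeding the error text back to the model gives it the same signal a human gets at the REPL. For example, calling a macro from a library that is not loaded usually surfaces as an undefined-function or unbound-variable error.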

Retrieval-augmented generation with library documentation is another path. If the model can look up the actual macros defined in a project’s source and understand their expansion behavior before generating code, quality should improve substantially. Some newer agentic coding systems can read arbitrary files, which gets partway there, but they lack the macro-expansion step needed to understand what a Lisp macro call actually produces at compile time.
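The missing expansion step is small when the tool can run inside a Lisp image that has the project loaded. Here is a minimal sketch, with expansion-context as a hypothetical helper name. It reads a call site and returns the text of its expansion, which a retrieval system could place in the model’s context next to the original call. It uses the standard macroexpand, which expands only the outermost form. Expanding macros nested inside the result would need an implementation-specific or library-provided code walker.

```lisp
(defun expansion-context (source)
  "Return the macroexpansion of the form in SOURCE as a string,
or NIL if the outermost form is not a macro call."
  (let* ((*read-eval* nil)              ; do not run #. forms while reading
         (form (read-from-string source)))
    ;; MACROEXPAND repeats until the outermost form is no longer a macro call.
    (multiple-value-bind (expansion expanded-p) (macroexpand form)
      (when expanded-p
        (let ((*print-pretty* t))
          (prin1-to-string expansion))))))
```

Run against a project’s own macros, the same helper would show the model what a local macro call really does before it writes more code in that style.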

Fine-tuning on high-quality Lisp code is the most direct fix, but the economics are unlikely to support it. The commercial incentive to improve AI assistance for languages with small developer populations is weak compared to the returns from making JavaScript or Python completions marginally more accurate.

The Sadness Is Legitimate

Haskin’s framing resonates because Lisp is genuinely a productive environment for certain kinds of thinking. The REPL-driven development cycle, where you build programs incrementally and interactively, suits exploratory programming in ways that statically typed batch-compiled languages do not. The macro system lets you eliminate entire categories of boilerplate. Conditions and restarts in Common Lisp let a handler choose a recovery strategy before the stack unwinds. Most other languages still try to approximate those semantics through increasingly complex exception hierarchies.
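Here is a minimal sketch of what sets conditions and restarts apart from exceptions. The low-level function offers recovery options, and a handler higher up picks one without unwinding out of the caller’s loop. The condition and restart names malformed-entry and skip-entry are invented for this example. use-value is a standard restart name.

```lisp
;; A condition type for one bad input entry (names are hypothetical).
(define-condition malformed-entry (error)
  ((text :initarg :text :reader malformed-entry-text))
  (:report (lambda (c stream)
             (format stream "Malformed entry: ~S" (malformed-entry-text c)))))

(defun parse-entry (text)
  "Parse TEXT as an integer. On bad input, signal MALFORMED-ENTRY
and offer two ways to recover without deciding which one to take."
  (restart-case
      (multiple-value-bind (n end) (parse-integer text :junk-allowed t)
        (if (and n (= end (length text)))
            n
            (error 'malformed-entry :text text)))
    (use-value (value)
      :report "Use a replacement value."
      value)
    (skip-entry ()
      :report "Skip this entry."
      nil)))

(defun parse-all (entries)
  "Parse ENTRIES, choosing the SKIP-ENTRY restart for every bad one.
The handler runs before the stack unwinds; invoking SKIP-ENTRY unwinds
only as far as PARSE-ENTRY, so MAPCAR continues with the next entry."
  (handler-bind ((malformed-entry
                   (lambda (c)
                     (declare (ignore c))
                     (invoke-restart 'skip-entry))))
    (remove nil (mapcar #'parse-entry entries))))
```

Swapping the handler changes the recovery policy without touching parse-entry. A handler that calls (invoke-restart 'use-value 0) instead would yield 0 for each bad entry.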

These features compound. A programmer who has internalized them can move fast. AI assistance could dramatically shorten that learning curve. For Python, it does. For Lisp, the model confidently suggests things that do not work, which is worse than no suggestion at all because it erodes trust in the tool and forces you to verify everything the model outputs anyway.

The situation probably will not reverse without deliberate effort. The corpus gap between Lisp and mainstream languages will keep growing, and the incentives do not favor targeted improvement from major AI labs. What might help is for the Lisp community, particularly the Clojure community, to invest in high-quality annotated datasets suitable for fine-tuning. The Rust community offers a model: it wrote extensively about idioms and patterns, in forms that ended up widely distributed and indexed.

Until then, Lisp remains one of the few contexts where the pre-AI approach is not just useful but necessary: reading documentation, studying existing codebases, and understanding what you are writing before you type it. That is not entirely bad. Working with a language that demands comprehension over autocomplete has its own discipline. But it is, as Haskin says, a little sad.