The Language That Spawned AI Research Gets Almost Nothing Back From LLMs

The observation in Dan Haskin’s recent post is both specific and uncomfortable: writing Lisp, whether that means Common Lisp, Scheme, or Clojure, with AI coding assistants is mostly a waste of time. The completions are wrong, the hallucinated function names don’t exist, and the suggestions drift toward idioms from whichever mainstream language the model decided your parentheses remind it of. The frustration is real, and the reason it exists is more interesting than “there’s not much Lisp on GitHub.”

The Training Data Explanation Is Incomplete

The obvious explanation is scarcity. GitHub’s Octoverse reports consistently show Lisp-family languages representing a fraction of a percent of repository code. Python, JavaScript, TypeScript, Java, and C++ dominate, and LLM training corpora mirror that distribution. A model that has ingested billions of Python tokens and tens of millions of Common Lisp tokens will naturally produce better Python. This is true and it matters.

But training data volume is not the complete explanation. LLM assistance for Rust has improved significantly even though the language is relatively young and has a far smaller historical corpus than Perl. Go support in Copilot and similar tools is quite good despite the language being less than twenty years old. What works for those languages is that they have clear, syntactically distinctive idioms that translate well to pattern completion. A closing brace in Go means something specific and constraining about what follows. The same is largely true in Rust.

The deeper issue with Lisp is structural rather than purely statistical.

Homoiconicity Removes the Scaffolding LLMs Rely On

Lisp’s central property is homoiconicity: code and data share the same representation. A list is a list whether it’s data you’re manipulating or a function call you’re evaluating. This makes the language remarkably uniform and extensible, which is why Lisp programmers value it. But for a model predicting the next token, that uniformity removes a great deal of signal.
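
A rough way to see what "code and data share the same representation" means is to use Python nested lists as a stand-in for Lisp lists. The evaluate helper and its two-operator table below are hypothetical, written only for this illustration; a real Lisp evaluator handles far more than this.

```python
import operator

# Hypothetical two-operator table; a real Lisp has a full environment.
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(form):
    """Evaluate a nested-list expression such as ["+", 1, ["*", 2, 3]]."""
    if not isinstance(form, list):
        return form  # numbers evaluate to themselves
    head, *args = form
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[head](result, value)
    return result


expr = ["+", 1, ["*", 2, 3]]

# Treated as data: an ordinary list that can be inspected or rewritten.
as_data_length = len(expr)             # 3 elements
rewritten = ["*"] + expr[1:]           # swap the operator, keep the arguments

# Treated as code: the very same list, handed to the evaluator.
as_code_value = evaluate(expr)         # 1 + (2 * 3) = 7
rewritten_value = evaluate(rewritten)  # 1 * (2 * 3) = 6
```

Nothing in the list itself says whether it is meant to be inspected or evaluated; that depends entirely on what the surrounding program does with it.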

In Python, syntactic context carries strong constraints. An indented block after a def line, with its name and parenthesized parameters, is a function body. That structural scaffolding narrows the space of valid completions substantially. In Common Lisp, almost everything is (something ...). The difference between a macro call, a special form, and a function application is often invisible at the surface level of the text. A model predicting tokens sequentially has less syntactic grip on what the right continuation looks like.

Compare these two fragments:

def process_items(items):
    results = []
    for item in items:

versus

(defun process-items (items)
  (loop for item in items

The Python version, even incomplete, carries strong structural cues. The Lisp version is already inside the loop macro, whose sub-language is an embedded DSL with its own clause syntax entirely separate from the rest of Common Lisp. The model needs to know the loop macro’s clause vocabulary to produce a valid continuation. Every macro a programmer uses is a mini-language the model may or may not have encountered.
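
The contrast can be made a little more concrete with a small sketch. It parses a completed version of the Python fragment with the standard ast module and reads a completed version of the Lisp fragment with a minimal s-expression reader. The reader, the completed function bodies, and the function name f are assumptions added for illustration; the reader handles only parentheses and bare atoms.

```python
import ast

# A completed version of the Python fragment, so that ast.parse accepts it.
PY_SOURCE = """
def process_items(items):
    results = []
    for item in items:
        results.append(item)
    return results
"""


def python_shapes(source):
    """Return the node type names of the statements in the first function."""
    func = ast.parse(source).body[0]
    return [type(node).__name__ for node in func.body]


def read_sexp(text):
    """Minimal reader (hypothetical): atoms stay strings, forms become lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1
        return tokens[pos], pos + 1

    form, _ = parse(0)
    return form


# A completed version of the Lisp fragment; f is a made-up function name.
LISP_SOURCE = "(defun process-items (items) (loop for item in items collect (f item)))"

py_shapes = python_shapes(PY_SOURCE)  # ['Assign', 'For', 'Return']
form = read_sexp(LISP_SOURCE)
lisp_shapes = [type(part).__name__ for part in form]  # ['str', 'str', 'list', 'list']
```

The Python parser labels each statement with a distinct node type. The s-expression reader returns the parameter list (items), the loop macro call, and the function call (f item) as the same kind of object: a list. Telling them apart requires knowing what defun, loop, and f mean, which is not in the syntax.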

The Macro Problem Is Where Assistance Actually Breaks Down

Macros are where LLM assistance for Lisp fails outright rather than merely degrading. In most mainstream languages, the syntax is fixed. New libraries add functions and classes, but they do not add new syntactic constructs. A well-trained model can reason about an unfamiliar library’s API from context clues, argument names, and type signatures.

In Common Lisp or Clojure, a library can introduce macros that define entirely new syntax. When you write:

(defroutes app-routes
  (GET "/" [] handler/home)
  (POST "/submit" req handler/submit))

The defroutes, GET, and POST forms are Compojure macros. The model needs to know not just that the library exists, but the specific expansion rules of each macro to suggest valid completions inside them. Hallucinated macro forms fail in ways that surface only at macroexpansion time, often with errors pointing deep into the macro’s generated code with no useful reference back to what was typed.

This is qualitatively different from hallucinating a method name in Python. A wrong method name produces a clear AttributeError. A subtly wrong macro form can produce code that appears to parse and even load, yet behaves incorrectly at runtime or fails with an error that gives no useful hint about the original mistake. The feedback loop that lets programmers quickly recognize and correct AI errors in Python is much weaker in Lisp.
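
The two failure shapes can be sketched side by side in Python. The first part shows the immediate AttributeError from a hallucinated method name. The second part is a toy stand-in for a routing macro: expand_routes, dispatch, and home are hypothetical names, and this is not how Compojure is implemented or how it validates its forms. It only illustrates a form that is accepted when the routes are built and goes wrong later.

```python
# Part 1: a hallucinated method name in Python fails at the call site.
try:
    "hello".uppercase()  # no such method; the real one is str.upper
except AttributeError as exc:
    python_error = str(exc)


# Part 2: a toy stand-in for a routing macro (hypothetical, not Compojure).
def expand_routes(*forms):
    """Turn (method, path, handler) forms into a dispatch table, unvalidated."""
    table = {}
    for method, path, handler in forms:
        table[(method, path)] = handler
    return table


def dispatch(table, method, path):
    """Call the handler registered for (method, path), or return "404"."""
    handler = table.get((method, path))
    return handler() if handler else "404"


def home():
    return "home page"


good = expand_routes(("GET", "/", home))
# A plausible-looking but wrong form: method and path are swapped.
# Building the table raises nothing; the mistake only shows up on dispatch.
bad = expand_routes(("/", "GET", home))

good_response = dispatch(good, "GET", "/")  # "home page"
bad_response = dispatch(bad, "GET", "/")    # "404"
```

The first error names the missing attribute at the line that caused it. The second mistake produces no error at all, only a route that silently never matches.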

Benchmarks Quantify the Gap

The MultiPL-E benchmark, published in a 2022 paper by Cassano et al., evaluated neural code generation models by translating HumanEval problems from Python into 18 other programming languages. The results showed substantial performance degradation in lower-resource languages, with Racket, the benchmark’s Lisp-family language, scoring well below Python and JavaScript baselines. This held even for models fine-tuned on multilingual corpora.

The degradation was not uniform. Languages with clear syntactic structure and large corpora, including TypeScript, Java, and Go, remained close to Python baselines. Languages with sparse training data and high syntactic flexibility, including Racket and Lua, fell significantly. The benchmark makes concrete what most Lisp programmers already felt anecdotally: the problem is real and measurable, not just a matter of personal frustration.

A Feedback Loop Running in the Wrong Direction

The part of this story that matters most going forward is the compounding dynamic. LLM tooling quality is now a genuine factor in developer productivity and language adoption. Developers exploring new languages are making choices partly based on how well their AI assistant handles the language. Copilot, Cursor, and similar tools are most useful precisely when someone is learning and uncertain, which is also when they are most likely to be influenced by the quality of assistance.

If Lisp already has marginal adoption, and AI tooling is noticeably worse for Lisp than for Python or TypeScript, the gap in developer experience widens. Fewer new Lisp projects mean less new Lisp code in public repositories, which means less training data for future model updates and worse future completions. The reinforcing loop runs in the wrong direction for any language outside the mainstream.

This is not hypothetical. Language ecosystems have died from less pressure than this. The languages that are already popular attract better tooling, attract more users, generate more training data, and get better tooling again. Languages that were already niche enter a slow spiral that nobody is explicitly trying to create but that emerges from individually rational decisions.

The Historical Dimension

John McCarthy developed LISP between 1958 and 1960, partly as a formal notation for recursive functions and partly for symbolic AI research. The original 1960 paper describes a system for symbolic computation, and for decades the AI research communities at MIT, Stanford, and CMU wrote their programs in Lisp. The connection between Lisp and artificial intelligence is concrete and well-documented, not incidental.

The fact that modern neural AI tools are particularly unhelpful for the language that hosted early AI research is not lost on the Lisp community. It also points to how fundamentally different the meaning of “AI” is now compared with McCarthy’s era. Contemporary LLMs are statistical pattern-completion engines trained on distributional data; McCarthy’s work was about symbolic reasoning and formal manipulation of expressions. Lisp was designed for the latter. It is poorly served by the former.

What Helps, Practically

There is not much a single developer can do about training data distribution. But a few approaches are worth considering.

Fine-tuning models on specific Lisp dialects is technically feasible and has been attempted within the Clojure community. Results are mixed because the problem extends beyond data volume to the fundamental macro expansion issue described above. A model that has seen more Clojure still hallucinates macro forms from libraries it was not trained on.

The more tractable improvement is tooling that gives the model better context before asking for completions. Feeding a language server’s output, macro expansions, and REPL state into the prompt can help significantly. This requires more setup than just opening a file, but it at least shifts the problem from “the model knows nothing about this” toward “the model has the relevant context for this specific task.”
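
One possible shape for that context assembly is sketched below. It assumes the editor integration has already collected diagnostics, macro expansions, and REPL bindings as plain strings. The build_prompt function, its parameter names, and the section labels are all hypothetical, and the sample values are written by hand, not captured from a real language server or REPL.

```python
def build_prompt(source, macroexpansions=None, repl_state=None, diagnostics=None):
    """Assemble a completion prompt from editor-side context.

    Every parameter name and section label here is hypothetical.
    """
    sections = []
    if diagnostics:
        body = "\n".join(f";; {message}" for message in diagnostics)
        sections.append(";; Language server diagnostics\n" + body)
    if macroexpansions:
        body = "\n".join(
            f";; {form}\n;;   => {expansion}"
            for form, expansion in macroexpansions.items()
        )
        sections.append(";; Macro expansions\n" + body)
    if repl_state:
        body = "\n".join(f";; {name} = {value}" for name, value in repl_state.items())
        sections.append(";; REPL state\n" + body)
    # The code to complete goes last so the model continues from it.
    sections.append(";; Code to complete\n" + source)
    return "\n\n".join(sections)


# Hand-written sample values, for illustration only.
prompt = build_prompt(
    source="(defun process-items (items)\n  (loop for item in items",
    macroexpansions={"(when ready (start))": "(if ready (progn (start)))"},
    repl_state={"*package*": "COMMON-LISP-USER"},
    diagnostics=["undefined function: START"],
)
```

The context is written as Lisp comments so the prompt still reads as one source file, and the unfinished code comes last so the completion continues from it. Whether this ordering works best for a given model is an open question, not something the sketch settles.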

The frustration in Dan’s post reflects a real cost. Writing Lisp has genuine pleasures that have nothing to do with AI tooling, and the gap between what AI assistance feels like in Python versus Common Lisp is now large enough to register as a meaningful productivity difference. Whether that gap shifts developer behavior in ways that further marginalize Lisp is the question that nobody building these tools is particularly motivated to answer.