
What the Rust Project's AI Survey Reveals About Language Design and LLMs

Source: lobsters

The Rust project recently published a summary of member perspectives on AI tools, compiled by Niko Matsakis. What makes this document worth reading is not just its findings but its method: a structured attempt by a language design community to understand how its members actually relate to AI tooling, rather than taking a top-down stance.

The short version is that the community is divided, and the reasons for that division map almost perfectly onto Rust’s fundamental design choices. That alignment is worth unpacking.

Why Rust and AI Have a Different Relationship Than Other Languages

Most discussions about AI and programming treat language choice as incidental. An LLM generates Python and it works or it doesn’t. But Rust exposes a specific set of challenges that reveal something fundamental about how LLMs actually reason about code.

The borrow checker is the obvious starting point. Rust’s ownership and borrowing rules require reasoning about the temporal validity of references: who owns a value, when borrows begin and end, and whether two borrows can safely coexist. This is not pattern matching. It is constraint satisfaction over a program’s control flow graph.
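The constraint being satisfied is concrete. A minimal sketch of the rule that shared and mutable borrows of the same value may not overlap (the commented-out line shows the rejected program):

```rust
// The borrow checker tracks where each borrow begins and ends.
fn demo() -> usize {
    let mut v = vec![1, 2, 3];
    let first = &v[0];  // shared borrow of `v` begins
    // v.push(4);       // error[E0502]: cannot borrow `v` as mutable
    //                  // because it is also borrowed as immutable
    let _ = *first;     // last use: the shared borrow ends here (NLL)
    v.push(4);          // a mutable borrow is now fine
    v.len()
}

fn main() {
    assert_eq!(demo(), 4);
    println!("ok");
}
```

Note that the legality of `v.push(4)` depends on where the borrow *ends*, not where it is declared, which is exactly the kind of flow-sensitive reasoning that resists pattern matching.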

LLMs are, at their core, sophisticated next-token predictors trained on vast corpora of text. They are exceptionally good at pattern matching. They have seen enough Rust code to produce syntactically correct structures, reasonable use of iterators, and idiomatic match arms. What they struggle with is precisely the thing the borrow checker enforces: maintaining a consistent mental model of ownership across a non-trivial scope.

The result is a recognizable failure mode. An LLM produces Rust that looks right at every local level but fails to compile because a reference outlives its owner, or because a mutable and immutable borrow overlap, or because a value is moved into a closure that is then called multiple times. The errors are not random. They cluster around the semantic layer that Rust explicitly makes mandatory.

// A common AI failure: a closure that consumes its capture
// is FnOnce and can only be called once
let s = String::from("hello");
let consume = move || drop(s); // `s` is moved into, then out of, the closure
consume();
consume(); // error[E0382]: use of moved value: `consume`

This does not mean AI tools are useless for Rust. It means they require a different kind of supervision than you would apply to AI-generated Python or JavaScript.

The Training Data Gap

There is a compounding problem: Rust has significantly less training data than Python, JavaScript, or even C++. The Rust 2023 Annual Survey showed continued growth in adoption, but Rust’s total corpus of public code remains a fraction of what exists for more established languages. Models trained on GitHub data have seen orders of magnitude more Python than Rust.

This matters because Rust idioms are not always derivable from first principles. The patterns for handling errors with ? and custom Error implementations, the conventions around Deref coercions, the subtleties of Pin and async lifetimes: these are things a model learns from exposure, not from logical deduction. Less exposure means more gaps, particularly in the less common corners of the language.
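The `?`-with-custom-error idiom mentioned above is a good example of a convention learned by exposure: it only works once a `From` impl bridges the underlying error into your type. A minimal sketch (`ParseError` and `parse_port` are illustrative names, not from the survey):

```rust
use std::fmt;

// Illustrative custom error type
#[derive(Debug)]
struct ParseError(String);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error: {}", self.0)
    }
}

impl std::error::Error for ParseError {}

// This From impl is what lets `?` convert the stdlib error automatically
impl From<std::num::ParseIntError> for ParseError {
    fn from(e: std::num::ParseIntError) -> Self {
        ParseError(e.to_string())
    }
}

fn parse_port(s: &str) -> Result<u16, ParseError> {
    let n: u16 = s.trim().parse()?; // ParseIntError -> ParseError via From
    Ok(n)
}

fn main() {
    assert_eq!(parse_port(" 8080 ").unwrap(), 8080);
    assert!(parse_port("not a port").is_err());
    println!("ok");
}
```

Nothing about the `From` bridge is derivable from the syntax of `?` alone; a model that has seen few examples of it will guess wrong.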

The counterpoint, which experienced Rust developers often raise, is that Rust’s compiler is itself an extraordinarily good feedback signal. When an LLM generates incorrect Rust, rustc tells it exactly what went wrong, with detailed, structured error messages that often include suggestions and links to documentation. This creates a tighter self-correction loop than you get with interpreted languages, where a program can run and produce wrong output without any compile-time indication of the error.

Workflows where an LLM iterates on Rust code against compiler output do work in practice, though they require multiple rounds and careful prompting. The compiler’s errors are specific enough that a loop of generate-compile-correct can converge on valid code for many common tasks.
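The verify half of that loop is mechanically simple. A sketch, assuming `rustc` is on the PATH (the LLM call that would produce the candidate is elided; here a snippet with a classic move error is hard-coded):

```rust
use std::fs;
use std::process::Command;

// Compile a candidate source string and return rustc's diagnostics,
// which would be fed back to the model for the next round.
fn compile_feedback(candidate: &str) -> String {
    let path = std::env::temp_dir().join("candidate.rs");
    fs::write(&path, candidate).expect("write failed");
    let out = Command::new("rustc")
        .arg("--error-format=short")
        .arg("--out-dir")
        .arg(std::env::temp_dir())
        .arg(&path)
        .output()
        .expect("rustc not found");
    String::from_utf8_lossy(&out.stderr).into_owned()
}

fn main() {
    let candidate = r#"
fn main() {
    let s = String::from("hi");
    let t = s;
    println!("{} {}", s, t); // `s` was moved into `t`
}
"#;
    let feedback = compile_feedback(candidate);
    // The stable diagnostic code is what makes the loop convergent
    assert!(feedback.contains("E0382"));
    println!("feedback for next round:\n{feedback}");
}
```

The stable error codes (`E0382`, `E0502`, and so on) and the attached suggestions are what make this feedback usable as a prompt, compared to the free-form stack traces of most interpreted languages.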

The unsafe Problem

Where community concern gets sharpest is around unsafe. Rust’s safety guarantees depend entirely on unsafe blocks being used correctly. The language cannot check invariants inside unsafe; that responsibility falls entirely on the programmer.

An LLM generating unsafe Rust that looks plausible but violates aliasing rules, misuses transmute, or constructs an invalid reference produces code that compiles and may appear to work until it causes undefined behavior. This is a category of failure that is much harder to detect than a compile error and far more dangerous in a systems programming context.

// Dangerous: LLMs sometimes generate transmute as a shortcut
// without understanding the layout requirements
let x: u32 = 42;
let y: f32 = unsafe { std::mem::transmute(x) }; // technically fine here

// But the same pattern applied incorrectly:
let v: Vec<u8> = vec![0x80, 0xFF]; // not valid UTF-8
let s: &str = unsafe { std::mem::transmute::<&[u8], &str>(v.as_slice()) }; // UB

The argument for treating AI-generated unsafe Rust with skepticism is strong. The argument for using AI tools for safe Rust, where the compiler acts as a verifier, is considerably more defensible. The Rust project’s survey almost certainly reflects some version of this distinction in the spread of member opinions.
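The "compiler as verifier" point is concrete here: the safe API for the same conversion checks the UTF-8 invariant at runtime and surfaces a `Result`, so invalid bytes become a recoverable error rather than undefined behavior:

```rust
// Safe alternative to transmuting &[u8] to &str:
// std::str::from_utf8 validates the invariant the unsafe version skips.
fn checked_str(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

fn main() {
    assert!(checked_str(&[0x80, 0xFF]).is_err()); // rejected, not UB
    assert_eq!(checked_str(b"hello").unwrap(), "hello");
    println!("ok");
}
```

An LLM steered toward the safe API produces code whose failure mode is an `Err` the caller must handle; the transmute version's failure mode is silent memory unsafety.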

The Polonius Factor

One thing that often gets missed in these conversations is that the borrow checker itself is still evolving. The next-generation borrow checker, Polonius, models borrows using Datalog facts rather than the current region-based system. It handles a broader class of valid programs and eliminates some cases where the current checker rejects code that is actually safe.

This matters for the AI discussion because it means the target is moving. Code that an LLM generates today and that fails the current borrow checker might actually be semantically valid under Polonius. Conversely, as the language grows more expressive, the space of valid programs an LLM needs to understand also grows. The surface area is not static.

The non-lexical lifetimes improvement from Rust 2018 already moved the boundary once, and Polonius will move it again. Whether LLM training keeps pace with these shifts depends on how quickly new idioms propagate through public codebases.
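The canonical motivating case for Polonius is a get-or-insert function on a map, where the current checker treats an early-returned borrow as lasting for the whole function body. A sketch (the rejected single-lookup version appears in comments; the `entry` workaround below it compiles today):

```rust
use std::collections::HashMap;

// Rejected by today's (NLL) borrow checker even though it is safe;
// Polonius is expected to accept it:
//
//     if let Some(v) = map.get(&key) { return v; }
//     map.insert(key, String::from("default"));
//     map.get(&key).unwrap()
//
// The workaround accepted today uses the entry API:
fn get_or_insert(map: &mut HashMap<u32, String>, key: u32) -> &String {
    map.entry(key).or_insert_with(|| String::from("default"))
}

fn main() {
    let mut map = HashMap::new();
    assert_eq!(get_or_insert(&mut map, 1), "default");
    map.insert(2, String::from("two"));
    assert_eq!(get_or_insert(&mut map, 2), "two");
    println!("ok");
}
```

An LLM that has only seen the workaround idiom will keep producing it even after the rejected form becomes legal, which is one concrete way training lag shows up.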

What “Helpful” Actually Looks Like

The practical middle ground that experienced Rust developers have landed on is using AI tools for what they are genuinely good at: generating boilerplate impl blocks, suggesting standard library method chains, explaining unfamiliar crate APIs, drafting documentation, and producing skeleton structures for common patterns like builder types or state machines.
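The builder case is representative of why this division of labor works: the pattern is pure structure, with no lifetime or ownership subtlety for the model to get wrong. A minimal skeleton of the kind LLMs produce reliably (names are illustrative):

```rust
// Boilerplate-heavy, semantically shallow: ideal LLM territory.
#[derive(Debug, Default, PartialEq)]
struct Config {
    host: String,
    port: u16,
    verbose: bool,
}

#[derive(Default)]
struct ConfigBuilder {
    config: Config,
}

impl ConfigBuilder {
    fn host(mut self, host: &str) -> Self {
        self.config.host = host.to_string();
        self
    }
    fn port(mut self, port: u16) -> Self {
        self.config.port = port;
        self
    }
    fn verbose(mut self, verbose: bool) -> Self {
        self.config.verbose = verbose;
        self
    }
    fn build(self) -> Config {
        self.config
    }
}

fn main() {
    let cfg = ConfigBuilder::default().host("localhost").port(8080).build();
    assert_eq!(cfg.host, "localhost");
    assert_eq!(cfg.port, 8080);
    assert!(!cfg.verbose);
    println!("ok");
}
```

Each method takes and returns `self` by value, so the chain is just repeated moves; there is no borrow to mismanage, and the compiler catches any typo immediately.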

For the semantically demanding work (async runtimes, FFI boundaries, custom allocators, Pin combinators, complex lifetime annotations), the consensus seems to be that AI tools are a starting point at best. They can get you to something that resembles the right structure, but closing the gap requires understanding that the tool does not reliably have.

This is not a failure of the tools so much as a reflection of what they are. A system that generates statistically likely next tokens has a different relationship to semantic correctness than a formal type checker. Using them together (the LLM for velocity, the compiler for verification) is a more honest framing than treating either as sufficient on its own.

Why the Survey Matters Beyond Rust

What Matsakis and the Rust project are doing with this survey is modeling a kind of epistemic practice that is rare in tech conversations about AI. Rather than treating AI tooling as either an existential crisis or an unconditional accelerant, they are collecting data from the people closest to the language about how it actually intersects with their work.

The findings are useful precisely because Rust is a community that cares about correctness. A language whose entire value proposition rests on the compiler catching what humans miss has a particularly clear-eyed framework for evaluating what it means for a tool to get something right or wrong. That framework extends naturally to AI tools.

The broader conversation about AI and programming would benefit from more of this kind of language-specific honesty. The experience of using an AI assistant to write Python is genuinely different from using it to write Rust, and collapsing that difference into a general verdict about whether AI is good or bad for programming loses the most interesting information.

Rust’s design choices make it a useful lens for understanding what probabilistic code generation can and cannot do at a mechanical level. The borrow checker is not just a hurdle for LLMs; it is a probe that reveals where pattern-matching breaks down and what kinds of correctness it cannot approximate without something more like formal reasoning. That observation has implications well beyond Rust, and the project is right to take it seriously.
