
The Rust AI Training Problem Is More Than a Volume Problem

Source: lobsters

The conversation about AI and Rust typically begins with volume. Rust has less training data than Python or JavaScript, so models are less capable at it. The volume framing is correct but incomplete. The more precise problem is what fraction of the training corpus teaches patterns that remain valid today.

Rust has gone through several distinct idiom epochs, and code written in each one looks like Rust but encodes different knowledge. A model trained on a corpus spanning multiple epochs, without careful attention to temporal validity, will learn a mixture of idioms: some outdated, some current, and some that will only be correct after the next major change ships.

Non-Lexical Lifetimes Changed the Rules

Before Rust 2018, the borrow checker used a lexical scoping model: borrows lasted until the end of their enclosing scope, even if the borrow was no longer actively used. Programmers had to structure code around lexical scope boundaries, often introducing extra blocks or intermediate bindings to satisfy the checker for programs that were, in fact, safe.

// Pre-NLL pattern: an extra block ends the borrow before the mutation
let val = {
    let borrowed = &data; // immutable borrow begins
    borrowed.value
}; // block ends, borrow ends, mutation is allowed
data.mutate();

Non-lexical lifetimes (NLL), which shipped with the 2018 edition in Rust 1.31 (December 2018), let the checker reason about actual use spans rather than lexical scopes. Many of these structural workarounds became unnecessary. Code that required artificial reshaping could be written more directly.
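A minimal sketch of the shift (the function and values are illustrative): under the lexical model, the immutable borrow below would have lived to the end of the function and blocked the `push`; NLL ends it at its last use, so no extra block is needed.

```rust
// Compiles under NLL (rustc 1.31+): the borrow held by `first` ends at
// its last use, so the mutation that follows is accepted directly.
// The pre-NLL checker would have rejected the `push`.
fn demo() -> Vec<i32> {
    let mut data = vec![1, 2, 3];
    let first = data.first();        // immutable borrow of `data` begins
    println!("first = {:?}", first); // ...and ends here, at its last use
    data.push(4);                    // legal under NLL; an error pre-NLL
    data
}

fn main() {
    assert_eq!(demo(), vec![1, 2, 3, 4]);
}
```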

The training data problem is that pre-NLL Rust compiles fine under modern rustc. The workaround patterns still work; they just no longer reflect how the language is written. A model trained on a corpus that includes substantial pre-NLL code learns both the workaround patterns and the post-NLL patterns without a clear signal about which applies when. From the model’s perspective, both are valid Rust. One is just older.

The Async Transition Produced Multiple Incompatible Eras

Rust’s async system has a sharper version of this problem. The async/await syntax stabilized in Rust 1.39 (November 2019), but the ecosystem took years to converge on stable idioms.

Tokio, the dominant async runtime, released 1.0 in December 2020. The 0.x series had a substantially different API. The changes touched fundamental patterns: how tasks are spawned, how I/O is structured, how runtime handles are accessed.

// Tokio 0.2 pattern (does not compile against Tokio 1.x:
// `tokio::time::delay_for` was renamed to `sleep`)
let mut rt = tokio::runtime::Runtime::new().unwrap();
rt.block_on(async {
    tokio::time::delay_for(std::time::Duration::from_secs(1)).await;
});

// Tokio 1.x
#[tokio::main]
async fn main() {
    tokio::time::sleep(std::time::Duration::from_secs(1)).await;
}

Code written for Tokio 0.3 may compile correctly against its target version, look like valid async Rust to a reviewer, and generate plausible-looking outputs when used as training data, while teaching patterns that fail against Tokio 1.x. A model that learned from a mixed corpus will produce code from either era depending on which patterns the surrounding context most closely resembles.

async-std, which for a time competed with Tokio and made different executor-semantics choices, adds another layer. Its patterns are not wrong in isolation, but they produce incorrect behavior when combined with Tokio's task model, and the failure is often a runtime behavioral difference rather than a compile error. The model has no mechanism for detecting which runtime is in scope unless that information is explicit in context.

Error Handling Went Through Several Phases

Error handling idioms in Rust have shifted substantially over the language’s lifetime.

Before the ? operator stabilized in Rust 1.13 (November 2016), error propagation relied on try!, a macro with the same semantics but different syntax. After ?, the ecosystem went through a period of competing error type libraries. The failure crate had significant adoption between 2017 and 2019 before it was deprecated in favor of anyhow and thiserror. These libraries are not equivalent: error types from failure do not implement the same traits as those from thiserror, and the APIs for attaching context and converting between error types differ in ways that matter at call sites.
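For illustration, here is the same propagation in both syntaxes (the pre-1.13 form is shown in a comment; the `try!` macro still exists but is deprecated, so only the `?` version compiles cleanly today):

```rust
use std::num::ParseIntError;

// Post-1.13 idiom: `?` propagates the error to the caller.
// The pre-1.13 equivalent of the `?` line was:
//     let n: i32 = try!(s.parse());
fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_and_double("21"), Ok(42));
    assert!(parse_and_double("abc").is_err());
}
```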

A model trained on Rust code from 2018 learns failure. A model trained on code from 2023 learns anyhow and thiserror. A model trained on an unfiltered mix generates whichever pattern the surrounding context most closely resembles, without a reliable basis for determining what the actual codebase depends on.

Polonius Will Create the Next Transition

The next major shift is already in progress. Polonius, the next-generation borrow checker built on Datalog-based fact inference, accepts a broader class of valid programs than the current region-based system. It handles conditional borrows across control flow branches that the current checker rejects, eliminating a category of code that experienced Rust programmers have learned to restructure or work around.
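A common instance is the lookup-then-insert pattern. The direct form, returning from inside `if let Some(v) = map.get(&key)` before inserting, is rejected by the current checker because the borrow is conservatively treated as held across the insert; it is the kind of conditional borrow Polonius is designed to accept. The workaround shape that today's corpus teaches looks like this (a sketch with illustrative names):

```rust
use std::collections::HashMap;

// Workaround required by the current borrow checker: check membership,
// then look up again. The direct form (returning the reference from
// inside `if let Some(v) = map.get(&key)`) does not compile today.
fn get_or_insert(map: &mut HashMap<u32, String>, key: u32) -> &String {
    if !map.contains_key(&key) {
        map.insert(key, String::from("default"));
    }
    map.get(&key).unwrap()
}

fn main() {
    let mut m = HashMap::new();
    assert_eq!(get_or_insert(&mut m, 1), "default");
}
```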

Niko Matsakis’s summary of the Rust project’s AI perspectives touches on this directly: some current compile errors on LLM-generated output are false positives, code that a more precise checker would accept. After Polonius ships, patterns that models have learned as necessary workarounds will become unnecessary in the same way NLL-era workarounds became unnecessary in 2018. Matsakis is the primary architect of Polonius, which makes his authorship of the AI survey more than incidental: the person mapping the community’s AI disagreement is the same person building the next version of the system those disagreements are about.

The workarounds will still compile after Polonius ships and will still appear in any training data generated before the transition completes. Models will continue generating them because that is what they were trained on. The patterns are not wrong; they are just more complicated than necessary, with no signal available to the model that the landscape changed.

NLL went through this cycle once, and Polonius will repeat it.

Why This Diverges From the Python Situation

Python had a significant transition of its own: Python 2 to Python 3. But that transition is largely complete, and filtering Python 2 code from a training corpus is tractable. Python 3 idioms have been stable for years; what was idiomatic in Python 3.6 remains idiomatic in Python 3.12 for the vast majority of patterns a model would generate.

Rust’s evolution is ongoing and relatively fast. The edition system (2015, 2018, 2021, 2024) handles syntax migration in a controlled way, but idiom evolution is not edition-tagged. Library transitions, borrow checker improvements, and new language features create semantic epochs that are difficult to identify from the code itself, precisely because old patterns continue to compile. The language’s strong backward compatibility guarantees, which are a genuine feature, also mean there is no natural forcing function that removes outdated idioms from the corpus.

Go has the Go 1 compatibility promise, which guarantees that code written for Go 1.0 compiles against all future Go 1.x releases. This creates a corpus where the idiom distance between old and new code is smaller, and old code is less likely to teach patterns that conflict with current practice.

What the Volume Framing Gets Wrong

The generate-compile-correct loop is a genuine advantage for AI-assisted Rust development. rustc error messages are structured and precise, identifying specific rules violated and often suggesting fixes. rust-analyzer provides a rich semantic model that AI tools with LSP integration can query in real time. These properties make the feedback loop tighter than in languages where errors are runtime events, and they partially offset training data quality problems.

But the volume framing understates the difficulty. The relevant question is not only how much Rust code exists in a training corpus; it is how much of that corpus teaches idioms that remain correct against the Rust version and ecosystem the generated code will actually run against, and how much teaches patterns from an earlier epoch that still compile but no longer represent how the language is written.

For AI tooling to improve materially on Rust, better-tagged training data matters as much as more training data. Epoch-aware filtering, or tighter integration with semantic tooling that can answer questions about current idioms rather than relying on historical patterns, addresses the actual shape of the problem. Adding more Rust code to a corpus that contains pre-NLL workarounds, deprecated libraries, and Tokio 0.x patterns does not improve the model’s knowledge of current Rust; it just makes the mixture larger.
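As a sketch of what epoch-aware filtering could look like, here is a toy classifier that tags a corpus sample by textual markers of an earlier idiom epoch. The marker lists and category names are illustrative assumptions, not a heuristic from any existing pipeline:

```rust
// Toy epoch tagger: flag corpus samples containing patterns tied to an
// earlier idiom epoch. Marker lists are illustrative, not exhaustive.
fn idiom_epoch(source: &str) -> &'static str {
    const PRE_2018_MARKERS: &[&str] = &["try!(", "extern crate "];
    const PRE_TOKIO_1_MARKERS: &[&str] =
        &["tokio::time::delay_for", "failure::"];
    if PRE_2018_MARKERS.iter().any(|m| source.contains(m)) {
        "pre-2018"
    } else if PRE_TOKIO_1_MARKERS.iter().any(|m| source.contains(m)) {
        "pre-tokio-1.0"
    } else {
        "current"
    }
}

fn main() {
    assert_eq!(idiom_epoch("let x = try!(parse());"), "pre-2018");
    assert_eq!(idiom_epoch("tokio::time::sleep(d).await;"), "current");
}
```

A real pipeline would need semantic signals (lockfile versions, edition flags, publish dates) rather than string matching, but the shape of the task is the same: attach an epoch label before training, not after.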
