
The Borrow Checker in the Age of Language Models

Source: lobsters

The Rust project recently took stock of where its contributors stand on AI tools. Niko Matsakis, one of Rust’s core designers and the architect of the borrow checker, summarized perspectives gathered from teams across the project in late February 2025. The range of opinions is wide. Some teams are enthusiastic about AI-assisted development; others are skeptical or outright worried. What makes the survey interesting is not the distribution of opinions, but what the disagreements reveal about how Rust’s design principles interact with a world increasingly shaped by language models.

Why Rust Is a Special Case

Most discussions about AI code generation treat languages as roughly interchangeable. You ask the model for a function, it produces something plausible, you test it, and you iterate. That loop works reasonably well for Python or JavaScript, where type errors are runtime events and a wrong answer often compiles fine. Rust breaks it in a specific way.

The borrow checker is not a lint or a style guide. It is a formal system that rejects programs with memory-safety violations before they can run. When an LLM generates Rust code with a lifetime error, the code does not silently misbehave; it does not compile. Empirical benchmarks on code generation tasks consistently show LLMs scoring lower on Rust than on Python or TypeScript, precisely because of this harder verification layer. The model’s first attempt fails to compile more often, requiring more iteration.

Here is a simple but representative example. A programmer new to lifetimes might write:

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() { x } else { y }
}

This does not compile. The compiler cannot tell whether the returned reference borrows from x or from y, so it needs an explicit lifetime annotation relating the output to the inputs. The correct version:

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

LLMs can produce both versions. The trouble is that in more complex cases involving multiple structs, trait objects, and nested borrows, models often generate the wrong lifetimes and then produce increasingly convoluted corrections that do not address the root cause. This reflects something real about the task: borrow checking requires reasoning about aliasing and ownership across an entire program context, not just a local snippet.
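To make that concrete, here is a hedged sketch of where the annotations start to matter beyond a single function (the `Parser` type is invented for the example, not taken from any real codebase): a struct that borrows its input has to thread a lifetime through its definition and its methods.

```rust
// A parser that borrows its input rather than copying it. The lifetime
// parameter `'a` ties every token the parser hands out to the original
// buffer, not to the parser itself.
struct Parser<'a> {
    input: &'a str,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(input: &'a str) -> Self {
        Parser { input, pos: 0 }
    }

    // Returning `&'a str` (rather than `&str`, which would elide to a
    // borrow of `self`) lets a token outlive the parser. Getting this
    // wrong is the kind of error models often "fix" by adding clones
    // instead of correcting the lifetime.
    fn next_word(&mut self) -> Option<&'a str> {
        let rest = self.input[self.pos..].trim_start();
        if rest.is_empty() {
            return None;
        }
        let start = self.input.len() - rest.len();
        let end = rest.find(' ').map(|i| start + i).unwrap_or(self.input.len());
        self.pos = end;
        Some(&self.input[start..end])
    }
}
```

Because the tokens borrow from the buffer rather than the parser, a caller can drop the `Parser` and keep using the tokens, which is exactly what the signature promises.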

The Flip Side: Verification as a Feature

The same strictness that makes Rust harder for AI to write also makes AI-generated Rust easier to evaluate. If a piece of AI-generated Rust code compiles without unsafe blocks, you have a meaningful guarantee that it is free of use-after-free errors, data races, and buffer overflows. This guarantee does not exist for AI-generated C, and it is much weaker for AI-generated Python or JavaScript.
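A small sketch of what that guarantee buys in practice: sharing a bare `&mut` counter across spawned threads is rejected at compile time, so the version that compiles, using `Arc<Mutex<_>>`, is race-free by construction. The function below is illustrative, not from any benchmark.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Capturing `&mut count` directly in each spawned closure would be
// rejected by the borrow checker (the closures would alias a mutable
// borrow across threads). Arc<Mutex<_>> is what the compiler forces
// us to write instead, and the result is deterministic.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    let count = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let count = Arc::clone(&count);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *count.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = *count.lock().unwrap();
    n
}
```

An AI-generated version that compiled without `unsafe` would carry the same guarantee: whatever else might be wrong with it, the increments cannot race.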

Teams working on tooling within the Rust project are, according to the Matsakis summary, generally more optimistic about AI assistance. That makes sense. For documentation, test generation, or migration tooling where the compiler will validate the output anyway, the cost of AI errors is lower. The compiler catches bad generations automatically, turning AI output into something closer to a first draft than a finished product.
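That validation loop can be mechanized. A minimal sketch, assuming a local `rustc` on PATH; `check_snippet` is a hypothetical helper for illustration, not project tooling:

```rust
use std::process::Command;

// Compile a candidate snippet and return rustc's diagnostics on
// failure. `--emit=metadata` runs type checking and borrow checking
// without generating code, which is all a validation loop needs.
fn check_snippet(code: &str) -> Result<(), String> {
    let dir = std::env::temp_dir();
    let src = dir.join("candidate.rs");
    std::fs::write(&src, code).map_err(|e| e.to_string())?;
    let out = Command::new("rustc")
        .args(["--emit=metadata", "--crate-type=lib"])
        .arg(&src)
        .arg("--out-dir")
        .arg(&dir)
        .output()
        .map_err(|e| e.to_string())?;
    if out.status.success() {
        Ok(())
    } else {
        // The error text is exactly what a feedback loop would hand
        // back to the model for the next attempt.
        Err(String::from_utf8_lossy(&out.stderr).into_owned())
    }
}
```

Run against the unannotated `longest` from earlier, this returns the missing-lifetime diagnostic rather than silently accepting a first draft.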

The compiler team itself has more reason to be cautious. The Rust compiler is one of the most complex Rust codebases in existence, with invariants that the type system cannot fully encode. An AI-generated patch to the borrow checker or the MIR lowering passes might type-check and still be subtly wrong in ways that only surface in edge cases. This is the same problem that applies to AI contributions to any sufficiently large codebase, but it is sharper for infrastructure code where the failure mode is corrupted guarantees for every downstream user.

Unsafe Code Is the Real Concern

If the borrow checker is Rust’s main defense mechanism, unsafe is the escape hatch, and it is here that AI assistance becomes genuinely risky. Consider a model generating FFI bindings:

extern "C" {
    fn some_c_function(ptr: *mut u8, len: usize) -> i32;
}

pub fn safe_wrapper(data: &mut [u8]) -> i32 {
    unsafe {
        some_c_function(data.as_mut_ptr(), data.len())
    }
}

This looks reasonable. But correctness depends on whether some_c_function actually satisfies the contract implied by the wrapper: that it will not hold the pointer after the call returns, that len matches the actual allocation, that the function is thread-safe if called concurrently. None of these constraints are visible to the compiler. The LLM generating this code has no way to verify them either. It can only reproduce patterns it has seen.

The Rustonomicon documents the full set of invariants that unsafe code must uphold. The list is long and non-obvious. Models can learn the syntax and common patterns of unsafe Rust, but reasoning about the semantic contracts that FFI code must satisfy requires context that is typically not in the local snippet being generated.
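One mitigation in that spirit is making the unverifiable contract explicit at the call site with SAFETY comments, so a human reviewer knows exactly which claims to check. A sketch, with the C function replaced by a Rust stand-in so the example is self-contained (a real binding would come from an `extern "C"` block plus a linked library):

```rust
// Stand-in for the C function: sums the bytes in the buffer. Only here
// so the sketch compiles without linking anything.
unsafe extern "C" fn some_c_function(ptr: *mut u8, len: usize) -> i32 {
    let bytes = std::slice::from_raw_parts(ptr, len);
    bytes.iter().map(|&b| b as i32).sum()
}

// The reviewed wrapper records every invariant the compiler cannot see.
// Each SAFETY note is a claim a human must verify against the C docs.
pub fn safe_wrapper(data: &mut [u8]) -> i32 {
    // SAFETY:
    // - `data.as_mut_ptr()` is valid for `data.len()` bytes for the
    //   duration of the call, guaranteed by the `&mut [u8]` borrow.
    // - We assume the C side does not retain the pointer after
    //   returning; the signature cannot express this.
    // - We assume the C side is safe to call concurrently, or that
    //   callers serialize access; also invisible to the compiler.
    unsafe { some_c_function(data.as_mut_ptr(), data.len()) }
}
```

The comments do not make the code safer by themselves, but they turn an implicit contract into a reviewable checklist, which is precisely what an AI-generated binding tends to omit.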

The Rust project has long maintained that unsafe code requires careful human review. The question the AI era raises is whether that norm will hold as AI-generated contributions become more common and harder to distinguish from human-written ones.

Language Design in the AI Era

The more philosophical part of the Matsakis summary concerns whether the existence of AI tools should influence how Rust evolves as a language. This is a live question in language design circles. Some argue that if LLMs struggle with lifetime annotations, the language should find ways to make them less necessary or more inferrable. Work on Polonius, the next-generation borrow checker, is aimed at accepting more valid programs, which incidentally makes things easier for AI-generated code as well.

Others push back on letting AI tool quality drive language design. Rust's complexity is there for a reason. Lifetimes are explicit because they encode real information about program behavior. Making them implicit or optional would not make the underlying aliasing problems disappear; it would only hide them.

There is a defensible middle ground here: continue improving Rust’s ergonomics where it is unnecessarily complex, do that work independently of AI considerations, and let AI tools adapt. Models improve. Rust-specific fine-tuning and retrieval-augmented approaches that include compiler error feedback as part of the generation loop already produce better results than vanilla prompting. The rust-analyzer language server provides rich semantic information that AI tools can leverage to generate more accurate code than raw text completion would produce.

What This Means for the Project Itself

The Rust project is in an interesting position. It maintains one of the most respected codebases in systems programming, with contribution standards that prioritize correctness and long-term maintainability. At the same time, many contributors use AI tools daily, and the tooling ecosystem around Rust is increasingly AI-assisted.

The tension is not between being pro-AI or anti-AI. It is between two legitimate concerns: using AI to lower the barrier to contributing and maintaining Rust, versus preserving the review standards that make Rust code trustworthy in the first place. The Matsakis summary does not resolve this tension because it is not resolved. Different teams weight these concerns differently based on their specific contexts.

The most useful outcome of this kind of project-wide survey is not consensus. It is clarity about where the actual disagreements lie. The Rust project teams broadly agree that AI tools are part of the environment their contributors operate in. They disagree about what that means for language design, contribution policy, and tooling investment. Making those disagreements explicit is how you make progress on them.

The Longer Arc

Systems programming has always had a harder relationship with abstraction tools than higher-level ecosystems. Garbage collectors, managed runtimes, and now AI code generation all involve trading some degree of explicit control for productivity. Rust’s historical position is that you can have safety without sacrificing control, but that the explicit nature of ownership and borrowing is load-bearing, not incidental.

AI tools do not change that bet. They change who is doing the explicit reasoning, and they raise the question of what happens when that reasoning is wrong. For a language used in operating systems, browsers, embedded firmware, and cryptographic libraries, the failure modes of incorrect code are severe enough to warrant the careful treatment the Rust project is giving this question. The borrow checker was designed to catch the errors that humans make. The next challenge is figuring out how far that net extends over the errors that language models make too.
