
What Rust's AI Policy Debate Is Actually About

Source: lobsters

Niko Matsakis published a summary of Rust project member perspectives on AI tools in late February, and the document has generated the kind of discussion you would expect: arguments about LLM benchmark scores, debates over whether the borrow checker is learnable, opinions on unsafe code review. Most of this commentary focuses on the right surface at the wrong depth.

The interesting thing about the survey is not that Rust contributors disagree about AI tools. It is that the shape of their disagreement maps onto a question that language designers have argued about for decades, and which the Rust project itself has never fully resolved: what is an explicit type system actually for?

There are two answers to that question, and they lead to completely different evaluations of AI-generated code.

Two Theories of What Types Are For

The first theory is defensive. Types catch errors. You annotate your program with type information, the compiler checks it, and the class of bugs that the type system covers cannot occur at runtime. This is the dominant framing in most discussions of typed languages. Proponents of TypeScript over JavaScript make this argument. Advocates of Rust over C make the same argument at a higher confidence level.

On this theory, a piece of code that type-checks is probably correct, in the sense that the type system enforces. An AI-generated Rust function that compiles without unsafe is free of use-after-free errors, data races on shared state, and null pointer dereferences. This is a real and substantial guarantee. If your theory of types is defensive, then Rust’s compiler is an excellent verifier for AI-generated code, and the workflow of generate-compile-iterate is defensible.

The second theory is constructive. Types encode intent. When you write a Rust function signature like:

pub fn process<'a, T>(
    items: &'a mut [T],
    predicate: impl Fn(&T) -> bool,
) -> impl Iterator<Item = &'a T>

you are not just helping the compiler catch bugs. You are writing a precise, machine-verifiable specification of what the function does with memory and time. The lifetime 'a says that the returned references live exactly as long as the original slice. The &'a mut says this function borrows mutably for that duration, which means callers cannot simultaneously hold other references into the same slice. This is not incidental; it is a design decision made explicit. The types are a contract between the author and every future reader, including the compiler.
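A hedged sketch of how that contract bites in practice. The body and the caller below are illustrative, not from the survey; they show that while the returned iterator is alive, the mutable borrow of the slice is too, so the compiler rejects any concurrent use:

```rust
fn process<'a, T>(
    items: &'a mut [T],
    predicate: impl Fn(&T) -> bool,
) -> impl Iterator<Item = &'a T> {
    // Reborrow the mutable slice as shared for the full lifetime 'a.
    items.iter().filter(move |item| predicate(*item))
}

fn main() {
    let mut data = [1, 2, 3, 4];
    let evens: Vec<&i32> = process(&mut data, |n| n % 2 == 0).collect();
    // While `evens` is alive, `data` is still borrowed through 'a:
    // data[0] = 9; // error: `data` is still borrowed by `evens`
    println!("{:?}", evens); // [2, 4]
}
```

The commented-out line is the contract being enforced: the signature promised the caller that the references in `evens` point into `data`, and the compiler holds both sides to it.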

On this theory, code that type-checks but was generated without understanding that contract is a different kind of problem. The types are right, but no one stated what they meant. The function compiles; the intent it encodes was never formed.

Where the Policy Disagreement Actually Lives

This distinction explains why Rust contributor opinions on AI tools, as the Matsakis summary makes clear, diverge so sharply along team lines. Teams doing documentation, migration tooling, or formatting work are closer to the defensive theory in practice. Their code is validated by the compiler. If it compiles, the relevant guarantees hold. AI assistance that produces something the compiler accepts is genuinely useful, and the error-catching function of the type system applies.

Teams working on the compiler itself, on the standard library, on unsafe abstractions, are living inside the constructive theory every day. The invariants they maintain are not fully expressible in the type system. A Vec<T> maintains the invariant that its length is always less than or equal to its capacity, that the allocated memory is valid, that the pointer is non-null when capacity is positive. None of these are encoded in Vec’s type signature. They are upheld by human understanding and enforcement in unsafe blocks whose correctness the compiler cannot verify.

An LLM generating a contribution to Vec’s implementation that compiles but subtly violates one of these invariants is exactly the failure mode that terrifies experienced systems programmers. The type-checking gave a green light; the program is broken. For a structure used by every Rust program that ever allocates, the blast radius is significant.
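A toy illustration of an invariant the signature cannot carry. `TinyVec` here is a hypothetical miniature, not std's actual implementation; its soundness rests entirely on `len <= cap` holding, and nothing in the types says so:

```rust
struct TinyVec {
    ptr: *mut i32,
    len: usize,
    cap: usize,
    // Invariant (documented, not typed): `ptr` is valid for `cap` i32s,
    // and len <= cap. Every unsafe block below assumes it.
}

impl TinyVec {
    fn from_slice(s: &[i32]) -> TinyVec {
        let mut v = s.to_vec();
        let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
        std::mem::forget(v); // TinyVec now owns the allocation
        TinyVec { ptr, len, cap }
    }

    fn get(&self, i: usize) -> Option<i32> {
        if i < self.len {
            // SAFETY: i < len <= cap, so ptr.add(i) is in bounds -- but only
            // because a human maintains the invariant; rustc cannot check it.
            Some(unsafe { *self.ptr.add(i) })
        } else {
            None
        }
    }
}

impl Drop for TinyVec {
    fn drop(&mut self) {
        // SAFETY: the fields came from a Vec we forgot in from_slice.
        unsafe { drop(Vec::from_raw_parts(self.ptr, self.len, self.cap)) }
    }
}
```

A patch that sets `len = cap + 1` type-checks perfectly, and `get` then reads out of bounds. That is the green-light-but-broken failure mode in miniature.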

The standard library team’s contribution guidelines spend more time on invariant documentation and unsafe justification than on most other concerns. That emphasis reflects the constructive theory: the types tell you the interface; the documentation tells you the intent; and both are required for the code to be trustworthy.

What the Borrow Checker Reveals About AI Reasoning

The borrow checker’s failure modes with AI-generated code are informative precisely because of what they expose. When an LLM generates Rust that fails with error[E0502]: cannot borrow as mutable because it is also borrowed as immutable, the compiler is not catching a typo. It is catching a failure to track ownership state across a program’s control flow.

Consider a representative failure pattern:

fn update_and_read(data: &mut HashMap<String, Vec<i32>>, key: &str) {
    let values = data.get(key);          // immutable borrow of data
    if values.is_none() {
        data.insert(key.to_string(), vec![0]); // error: mutable borrow while
    }                                          // immutable borrow is still live
    println!("{:?}", values);
}

The model’s intent is clear. It wants to check whether a key exists and insert a default if not. This is a standard pattern. What it failed to track is that values holds a reference into data, so the mutable borrow for insert conflicts with the immutable reference still held by values. The fix involves restructuring the control flow so the borrows do not overlap, or using the entry API:

fn update_and_read(data: &mut HashMap<String, Vec<i32>>, key: &str) {
    data.entry(key.to_string()).or_insert_with(|| vec![0]);
    println!("{:?}", data.get(key));
}
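The other fix mentioned above, restructuring so the borrows never overlap, looks like this (same function, illustrative):

```rust
use std::collections::HashMap;

fn update_and_read(data: &mut HashMap<String, Vec<i32>>, key: &str) {
    // contains_key returns a plain bool, so no borrow of `data` outlives the check.
    if !data.contains_key(key) {
        data.insert(key.to_string(), vec![0]);
    }
    // The immutable borrow begins only after the mutation is complete.
    println!("{:?}", data.get(key));
}
```

Both fixes encode the same insight: the immutable and mutable borrows must not be live at the same program point.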

This is not a difficult fix once you understand what the borrow checker is enforcing. But the model’s original attempt failed not because it made a syntactic error. It failed because it did not maintain a consistent model of which references were live at which program points. That kind of reasoning, about the temporal and spatial validity of references across control flow, is precisely what the borrow checker formalizes. And it is the thing that next-token prediction does not learn from token co-occurrence statistics.

Lifetime annotations make this even clearer. When an LLM generates incorrect lifetime annotations in more complex positions, the error messages from rustc are specific and structured, but the corrections often miss the point:

// LLM-generated: compiles, but the lifetime constraint is more restrictive than necessary
struct Parser<'a> {
    input: &'a str,
    position: usize,
}

impl<'a> Parser<'a> {
    // Forces 'b to equal 'a unnecessarily
    fn current_token<'b>(&'b self) -> &'a str where 'b: 'a {
        &self.input[self.position..]
    }
}

The annotation compiles. Whether it correctly encodes the designer’s intent requires understanding what the function is supposed to mean. A model can generate lifetime annotations that satisfy the checker without having formed any intention about what the lifetimes should express. The types are correct; the contract was never written.
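For contrast, dropping the spurious bound recovers what the signature presumably means: the token borrows from the underlying input, not from the Parser value (illustrative):

```rust
struct Parser<'a> {
    input: &'a str,
    position: usize,
}

impl<'a> Parser<'a> {
    // The token borrows from the input ('a), so it may outlive the &self
    // borrow -- callers can drop the Parser and keep the token.
    fn current_token(&self) -> &'a str {
        &self.input[self.position..]
    }
}

fn main() {
    let input = String::from("let x = 5;");
    let token;
    {
        let parser = Parser { input: &input, position: 4 };
        token = parser.current_token();
    } // parser is gone; token is still valid, because it borrows `input`
    println!("{}", token); // x = 5;
}
```

Under the earlier `where 'b: 'a` version, the caller above is rejected: the bound forces the `&self` borrow to last as long as the input itself, which is exactly the over-restriction the annotation accidentally encoded.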

The Polonius Dimension

Matsakis has been one of the primary architects of Polonius, the next-generation borrow checker that models borrows using Datalog rather than the current region-based system. The Polonius work is relevant to the AI discussion in a way that goes beyond its technical details.

The existing borrow checker is conservative: it rejects some programs that are actually safe because its region inference cannot prove their validity. Polonius is more precise, accepting a strictly larger class of valid programs. For AI-generated code, this means some current compile errors on LLM output are false positives: code that would be accepted by a more precise checker. As Polonius ships and becomes stable, the surface of valid Rust expands, and some of what now looks like AI failure will turn out to have been checker conservatism.
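The canonical instance, often called NLL "problem case #3", illustrates the conservatism. Hedged: exact acceptance depends on the toolchain and on Polonius's eventual stabilization, so the rejected form is shown in comments rather than executed:

```rust
use std::collections::HashMap;

// The current region-based checker rejects this, even though no two
// conflicting borrows are ever simultaneously live on any execution path;
// a Polonius-style analysis accepts it:
//
//   fn get_or_insert(map: &mut HashMap<u32, String>, key: u32) -> &String {
//       if let Some(v) = map.get(&key) {
//           return v;                     // the borrow escapes only here,
//       }                                 // but the checker extends it across
//       map.insert(key, String::new());   // this insert: error[E0502]
//       map.get(&key).unwrap()
//   }
//
// The restructuring both checkers accept:
fn get_or_insert(map: &mut HashMap<u32, String>, key: u32) -> &String {
    if !map.contains_key(&key) {
        map.insert(key, String::new());
    }
    map.get(&key).unwrap()
}
```

The two versions have identical behavior; the difference is purely in what the checker can prove about them.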

But Polonius does not change the constructive theory’s concern. A larger acceptance set still requires that the accepted programs encode correct intent. Polonius accepting more programs means more programs can be written, not that the writers understood what they were writing.

Matsakis has written extensively about language design and ergonomics over the years, and one recurring theme is the tension between making correct programs easier to express and making incorrect programs harder to write accidentally. Improving LLM compatibility by making lifetime annotations less necessary is not obviously correct language design, because the lifetime annotations encode real information about program structure.

The unsafe Forcing Function

The cleanest illustration of the constructive theory in action is Rust’s unsafe keyword. The Rust reference defines a set of invariants that code in unsafe blocks must uphold: no undefined behavior, no invalid pointer dereferences, no data races, correct use of raw pointer provenance. The compiler cannot check these; the programmer must maintain them by reasoning.

unsafe is explicit by design. When you write unsafe { ... }, you are making a statement: I am taking responsibility for correctness here, and I know what I am doing. The type system’s job in this region shifts from enforcement to documentation. The unsafe block is a signed contract.

An LLM generating unsafe code can produce the signature but not the understanding. It can write unsafe { std::mem::transmute(x) } in a context where the transmute is undefined behavior due to layout assumptions that don’t hold, and the code compiles, and the error is latent. The Rustonomicon documents this at length precisely because the invariants are non-obvious and the failure mode is silent.
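A minimal sketch of the two cases, with illustrative values; the commented line is deliberately not executed. One transmute's layout assumption genuinely holds, and the other compiles identically but its soundness rests on an invariant rustc never sees:

```rust
fn main() {
    let x: u32 = 0x2A;
    // Sound: u32 and [u8; 4] have the same size, and every bit pattern
    // is a valid [u8; 4].
    let bytes: [u8; 4] = unsafe { std::mem::transmute(x) };
    println!("{:?}", bytes);

    // Compiles just as cleanly, but is undefined behavior whenever the
    // bytes are not valid UTF-8 -- an invariant of `str` the type system
    // does not check inside `unsafe`:
    //
    //   let s: &str = unsafe { std::mem::transmute::<&[u8], &str>(&bytes[..]) };
}
```

The compiler's verdict is identical in both cases; the difference lives entirely in the reasoning the `unsafe` block demands of its author.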

This is why the Rust project’s most conservative voices on AI tools tend to be in teams working on compiler infrastructure and the standard library. It is not irrational caution. It is the constructive theory applied consistently: types and unsafe boundaries encode intent, and intent cannot be generated.

What This Means for Language Design Going Forward

The survey’s most valuable output, beyond its specific findings, is the question it poses for language designers. If a type system’s job is defensive, then AI assistance that produces type-correct code is a productivity win with an acceptable residual risk profile. Review for correctness, iterate on failures, trust the compiler’s verdict.

If a type system’s job is constructive, then the evaluation changes. Code generated without intent is code with unknown semantics, even if it type-checks. The types document a contract that was never formed. This is a subtler and harder problem, and it does not have a tool-based solution. It has a review-based solution, which means it scales with human reviewer time, not with model capability.

Rust occupies a distinctive position in this debate because its type system is strong enough that the question is genuinely sharp. Python’s types are optional and frequently absent; the question barely arises. JavaScript has no real static semantics to speak of. C’s type system is too weak to encode meaningful ownership contracts. Rust’s type system is expressive enough to carry substantial intent, which means the question of whether that intent was present when the code was generated actually matters.

The Rust project is right to take the question seriously, and the survey is the right tool for it, precisely because there is no consensus. The defensive and constructive theories are both defensible, and different contributors live closer to one than the other based on what they work on. Mapping that disagreement, as Matsakis did, is more honest than forcing a resolution that would inevitably favor one group’s context over another’s.
