Layered Verification: What the Rust Project's AI Survey Implies About Tooling Investment
Source: lobsters
The Rust project’s survey on AI tool perspectives, summarized by Niko Matsakis, reveals something that the headline debate about whether LLMs can handle lifetimes tends to obscure. The divide between enthusiastic and skeptical contributors does not break along a single axis. It breaks along the axis of what verification happens downstream from the AI’s output.
Teams working on tooling, documentation, and migration infrastructure trust AI assistance more. Teams working on compiler internals and unsafe-heavy systems code trust it less. This is not a difference in taste or temperament; it reflects a real structural difference in the risk landscape. When the compiler validates your AI-generated output, you have a meaningful guarantee. When the work involves invariants the compiler cannot see, you do not.
That observation leads to a more productive question than the usual framing about whether LLMs are good or bad at Rust: what verification is available at each layer, and how should AI assistance be structured to take advantage of it?
The Compiler as a Floor, Not a Ceiling
rustc is an unusually powerful first-pass verifier for AI-generated code. A piece of AI-generated Rust that compiles without unsafe blocks is, by construction, free of use-after-free errors, dangling references, and data races. This is not the kind of assurance you get from compiling AI-generated C++ or running AI-generated Python through a linter. It is a formal guarantee backed by a type system specifically designed to encode these properties.
The important caveat is that rustc is a floor, not a ceiling. It does not check business logic. It does not verify that your async code is deadlock-free, or that your FFI wrapper correctly models the calling convention of the function on the other side. Safe Rust that compiles can still be wrong in a large number of ways.
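The gap between "compiles" and "correct" is easy to make concrete. The hypothetical function below is entirely safe Rust: the borrow checker is satisfied, rustc guarantees the absence of memory errors, and the business logic is still wrong.

```rust
// A hypothetical discount calculation: safe, compiles cleanly, and buggy.
// The intent is "10% off orders of 100 or more", but the comparison is
// inverted, so small orders get the discount and large ones do not.
fn total_with_discount(subtotal: u32) -> u32 {
    if subtotal < 100 {
        // BUG: should be `subtotal >= 100`
        subtotal - subtotal / 10
    } else {
        subtotal
    }
}

fn main() {
    // rustc verifies memory safety here; it says nothing about intent.
    assert_eq!(total_with_discount(50), 45);   // discount applied wrongly
    assert_eq!(total_with_discount(200), 200); // discount missing
}
```

Nothing short of a test or a specification catches this class of error, which is exactly why the compiler is a floor.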
The floor still matters, though. For the most common AI assistance tasks, such as generating boilerplate impl blocks, suggesting iterator chains, and drafting trait implementations, the compiler’s verification catches a substantial fraction of the errors a model will produce. The generate-compile-correct loop that experienced Rust developers have converged on is more reliable than it would be in a language without this guarantee, because rustc’s error messages are specific, structured, and actionable.
rust-analyzer, the language server behind most Rust IDE integrations, extends the floor further. It provides type inference, trait resolution, and name analysis in real time, which means AI tools with LSP integration can query semantic information about the code being generated rather than treating it as raw text. A model that knows whether a type implements Send before generating a call across a thread boundary makes fewer errors than one reasoning from token patterns alone. Several AI coding tools have begun integrating language server protocols; for Rust specifically, rust-analyzer’s rich semantic model is one of the better foundations available in any language ecosystem.
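The Send example can be made concrete. In the sketch below (illustrative, not tied to any particular AI tool), the spawn compiles because Arc<i32> implements Send; the same code with Rc would be rejected at compile time, and a generation tool that can query trait resolution through rust-analyzer can know that before emitting a single token.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // Arc<i32> implements Send, so it may cross a thread boundary.
    let shared = Arc::new(41);
    let handle = thread::spawn({
        let shared = Arc::clone(&shared);
        move || *shared + 1
    });
    assert_eq!(handle.join().unwrap(), 42);

    // The equivalent code with Rc<i32> would not compile:
    // `Rc` is not Send, and `thread::spawn` requires its closure to be Send.
    // let shared = std::rc::Rc::new(41);
    // thread::spawn(move || *shared + 1); // error[E0277]: `Rc<i32>` cannot
    //                                     // be sent between threads safely
}
```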
The Middle Layer: MIRI and the Undefined Behavior Gap
The survey’s concern about unsafe code identifies the precise point where the compiler’s guarantees end. Inside unsafe blocks, Rust’s aliasing rules and memory model are enforced by convention, not by the type checker. Code that violates these rules, whether by misusing raw pointers, creating overlapping mutable references, or making incorrect layout assumptions with transmute, compiles cleanly and then produces undefined behavior when it runs.
MIRI, the MIR interpreter that ships as part of the Rust toolchain, addresses a significant portion of this gap. It executes Rust’s mid-level intermediate representation under a model that detects violations of the memory model, including violations inside unsafe blocks:
# Running MIRI over a test suite surfaces UB the compiler cannot see
cargo +nightly miri test
MIRI catches out-of-bounds memory accesses, reads of uninitialized memory, violations of the Stacked Borrows aliasing model, and many of the specific UB patterns that AI-generated unsafe code is likely to introduce. Stacked Borrows in particular is relevant here: it formalizes the aliasing rules that unsafe code must obey to preserve Rust’s memory safety guarantees, and LLMs generating unsafe code have no reliable model of those rules.
MIRI requires test coverage to exercise the code paths in question and runs significantly slower than native execution, so it belongs in development and CI workflows rather than production. For AI-generated FFI bindings or unsafe abstractions that already have test suites, running MIRI is a practical way to extend verification into territory the compiler cannot reach. An AI-generated wrapper that passes cargo test and cargo miri test is meaningfully more trustworthy than one that only passes the former.
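As a sketch of the kind of bug in question, the hypothetical wrapper below compiles, and a test exercising only the happy path passes under native execution. A test that passes an empty slice would also compile and might even appear to pass natively, but `cargo +nightly miri test` reports the out-of-bounds read as undefined behavior.

```rust
/// Hypothetical AI-generated wrapper that skips the bounds check for speed.
/// Sound only if the caller guarantees a non-empty slice; nothing in the
/// signature enforces that, and rustc cannot see the hazard.
fn first_unchecked(values: &[i32]) -> i32 {
    // UB if `values` is empty: an out-of-bounds read inside an unsafe block.
    unsafe { *values.get_unchecked(0) }
}

fn main() {
    // Defined behavior: the slice is non-empty, so this is fine everywhere.
    assert_eq!(first_unchecked(&[7, 8, 9]), 7);

    // `first_unchecked(&[])` also compiles and may return garbage natively;
    // MIRI flags it as an out-of-bounds pointer read the moment a test hits it.
}
```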
Formal Verification for the Critical Layer
For code where neither the compiler nor MIRI provides sufficient confidence, formal verification tools represent the outer boundary of the stack. Kani, developed and open-sourced by Amazon Web Services, is a bounded model checker for Rust that can prove properties about both safe and unsafe code. It operates on the same MIR that MIRI interprets, but instead of executing it against concrete inputs, it exhaustively explores the space of possible program states within a given bound:
#[kani::proof]
fn verify_no_overflow() {
    let x: u8 = kani::any();
    let y: u8 = kani::any();
    kani::assume(x < 200 && y < 50);
    let z = x + y; // Kani proves this cannot overflow under the assumption
    assert!(z < 250);
}
Prusti, from ETH Zurich, approaches the same problem from a different direction. It is a program verifier that proves functional correctness properties expressed as preconditions, postconditions, and invariants written directly in the source code. Where Kani checks reachability properties, Prusti checks relational specifications about what functions compute.
Both tools require substantially more effort than running cargo test. Writing Kani harnesses or Prusti specifications is closer to formal methods work than to ordinary programming. For most AI-generated code, neither is necessary. But for a cryptographic primitive, a custom allocator, or a safety-critical embedded subsystem where the cost of a correctness failure is serious, these tools provide guarantees that no amount of testing can match, and they represent the appropriate response to AI-generated code in that tier.
The Verification Continuum and Where AI Fits
The point is not that every function needs formal verification. It is that the appropriate verification level is not binary; it runs from rustc through clippy through MIRI through Kani and Prusti, and where you land on that continuum should depend on what the code does and what the failure mode costs. AI-generated test infrastructure lives near the compiler end of the continuum. AI-generated cryptography lives near the other.
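One way to make the continuum operational is a CI pipeline that applies each layer in order of cost. A sketch, assuming a nightly toolchain with the MIRI component installed and, for crates that carry proof harnesses, Kani:

```shell
# Cheapest checks first: the type system and lints gate every change.
cargo clippy --all-targets -- -D warnings
cargo test

# Middle layer: interpret the test suite under MIRI to surface UB in
# unsafe code. Requires: rustup +nightly component add miri
cargo +nightly miri test

# Outer layer: run only for crates that define #[kani::proof] harnesses.
cargo kani
```

Each stage only runs when the cheaper ones pass, so the expensive tools spend their time on code that has already cleared the floor.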
The survey’s division between enthusiastic and skeptical contributors maps onto this continuum almost mechanically. Contributors whose work lives near the compiler end of the stack, where a clean build and a green test suite cover most of the risk surface, find AI assistance valuable and relatively low-risk. Contributors whose work lives at the systems end, where correctness invariants live outside the type system and failure modes are severe, find it more dangerous. Both groups are right about their own contexts.
The implication for tooling investment is direct. Closing the gap between skeptics and enthusiasts requires tighter integration between AI coding tools and the verification stack, not a choice between them. AI tools that automatically run MIRI on generated unsafe blocks, that treat clippy warnings as generation feedback, and that use rust-analyzer semantic information to avoid type errors before they occur would shift more Rust code toward the part of the continuum where compiler guarantees cover most of the risk.
Some of this integration exists in early form. Several AI coding assistants surface compiler errors as feedback; a smaller number have begun integrating with language server protocols for richer semantic context. None of it is the default experience, and MIRI integration into the AI generation loop is essentially nonexistent in mainstream tooling as of 2026.
Why the Rust Ecosystem Is Well-Positioned to Build This
rustc, rust-analyzer, MIRI, cargo, and the formal verification tools all expose stable, well-documented interfaces. The rustc dev guide documents compiler internals in detail. The language server protocol integration through rust-analyzer is mature. Kani and Prusti both have programmatic APIs designed for integration. The tooling surface area for building tightly integrated AI-plus-verification workflows is larger in the Rust ecosystem than in almost any other language community.
That observation is worth sitting with. The conventional framing is that Rust is hard for AI because of the borrow checker. The less conventional framing is that Rust is unusually well-equipped to verify what AI produces, which means the expected value of AI assistance for Rust code can be higher than for languages with weaker static guarantees, given appropriate tooling.
The Rust project’s survey is valuable precisely because it makes the actual structure of the problem legible. Skepticism concentrates at the verification gaps; enthusiasm concentrates where verification is adequate. Following that signal toward better tooling integration is the straightforward next step, and the Rust ecosystem already has most of the components it needs.