· 7 min read ·

What It Actually Means to Prove TypeScript Correct: LemmaScript and Dafny

Source: lobsters

TypeScript’s type system is Turing-complete at the type level, which sounds like a superpower until you examine what it actually buys you. Conditional types, mapped types, template literals, infer inside conditional branches: you can encode a surprising amount of structural knowledge. But there is a categorical boundary between “this value has the right shape” and “this function behaves correctly for all inputs,” and TypeScript cannot cross it by design.

LemmaScript is a verification toolchain that attempts to cross that boundary by routing TypeScript through Dafny, Microsoft Research’s verification-aware programming language. The idea is to keep writing TypeScript while leveraging Dafny’s proof infrastructure, backed by the Z3 SMT solver, to prove that specific functions satisfy their contracts.

To understand why this is interesting, you have to understand what Dafny actually is.

What Dafny Provides

Dafny was created by Rustan Leino at Microsoft Research and has been publicly available since around 2009. It is a programming language where formal specifications are first-class syntax, not comments or annotations that a linter might check. You write preconditions, postconditions, loop invariants, and termination measures directly in the language, and the compiler feeds them to Z3. If Z3 cannot construct a proof, the program does not pass verification.

A binary search in Dafny looks like this:

method BinarySearch(a: array<int>, key: int) returns (index: int)
  requires forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
  ensures 0 <= index < a.Length ==> a[index] == key
  ensures index < 0 ==> key !in a[..]
{
  var lo, hi := 0, a.Length;
  index := -1;
  while lo < hi
    invariant 0 <= lo <= hi <= a.Length
    invariant key !in a[..lo] && key !in a[hi..]
    decreases hi - lo
  {
    var mid := lo + (hi - lo) / 2;
    if a[mid] == key { index := mid; return; }
    else if a[mid] < key { lo := mid + 1; }
    else { hi := mid; }
  }
}

The requires clause is the precondition: the array must be sorted. The ensures clauses are the postconditions: if a non-negative index is returned, the element at that index equals the key; if -1 is returned, the key is provably absent from the array. The invariant annotations describe what holds at every loop iteration, and decreases hi - lo proves termination by providing a measure that strictly decreases with each step.

This is not a test. Z3 checks this for every possible array and every possible key satisfying the precondition, statically, before any code runs. The proof covers infinitely many inputs.

Dafny also has a construct called a lemma, which is where LemmaScript gets its name. A lemma is a method that exists purely in the proof layer. It produces no runtime output; it establishes a mathematical fact that other verified code can rely on:

lemma SortedArrayHasUniqueIndices(a: seq<int>, x: int)
  requires x in a
  requires forall i, j :: 0 <= i < j < |a| ==> a[i] < a[j]
  ensures exists! i :: 0 <= i < |a| && a[i] == x
{
  // Z3 discharges this automatically from strict sorting
}

The lemma vanishes at runtime. It is a compile-time artifact that strengthens the proofs of other functions that depend on this property.

The Gap in TypeScript

TypeScript’s design goals document explicitly lists “apply a sound or provably correct type system” as a non-goal. The language prioritizes practical compatibility with JavaScript over mathematical soundness. This is a reasonable choice for a language that has to work with millions of lines of existing JavaScript, but it leaves a meaningful hole.

Consider a sort function. TypeScript’s type system can express that it accepts T[] and returns T[]. It cannot express that the output is a permutation of the input, that no element was dropped, or that the output is actually sorted. You can document these properties in JSDoc; the compiler cannot verify them.

Runtime validation libraries like Zod and Valibot address input validation at system boundaries. Effect adds typed errors and dependency tracking. Property-based testing with fast-check generates thousands of inputs and checks behavioral properties. These all help. None of them produce proofs.

The closest prior art is Liquid Haskell, which extends Haskell’s type system with refinement predicates checked by an SMT solver. You can annotate a sort function to say its output is sorted and contains exactly the elements of the input, and the verifier checks this statically. Liquid Haskell works well partly because Haskell’s purity gives the solver a clean semantic model to reason over.

F*, developed at MSR and INRIA, offers a full dependent type theory with effects. It has been used in Project Everest to produce formally verified TLS implementations. The HACL* cryptographic library, verified with F*, ships in Firefox and Signal. For C, Frama-C and its WP plugin provide deductive verification via the ACSL annotation language.

None of this has historically touched the TypeScript ecosystem. LemmaScript is an attempt to change that.

The Translation Architecture

The key enabling fact for LemmaScript is that Dafny can compile to JavaScript and TypeScript. Dafny is not just a verification tool; it is a full programming language with multiple output backends. This means verified Dafny code can be compiled to TypeScript output and consumed directly in a TypeScript project without a runtime bridge.

LemmaScript’s approach is to let you write algorithmic logic in Dafny, verify it there, and then integrate the compiled TypeScript output with the rest of your application. Alternatively, you write TypeScript that satisfies contracts enforced by a Dafny verification layer. The toolchain handles the mechanical translation between TypeScript type annotations and Dafny specification syntax.

For pure functions over simple types, this translation is tractable. A function that takes numbers and returns a number maps cleanly. The harder cases are functions involving mutable state, closures, prototype chains, or JavaScript’s type coercions. Dafny’s heap model uses explicit modifies clauses to track what memory a method may change; JavaScript’s freedom with mutation has no direct analogue. LemmaScript necessarily targets a restricted subset of TypeScript where this translation is well-defined.

The Annotation Burden

Formal verification trades test effort for specification effort. Writing a correct postcondition for a non-trivial algorithm often requires as much thought as writing the algorithm itself. For the binary search above, a complete specification includes the sorted precondition, the correctness postcondition for the found case, and the completeness postcondition for the not-found case. Finding the right loop invariant, one strong enough for Z3 to propagate through the loop but not so strong that it is hard to establish initially, is typically the most time-consuming part.

This is not a flaw specific to LemmaScript or Dafny. It is inherent to SMT-backed verification. Prusti, a verifier for Rust based on the Viper framework from ETH Zurich, faces the same annotation burden. So does Creusot, another Rust verifier.

Z3 timeouts are also a real operational concern. Complex quantified formulas, common in specifications over arrays and sequences, can push Z3 past its default time budget. When this happens, you get neither a proof nor a counterexample; you get a timeout, and you have to debug the proof rather than the program. Dafny provides guidance on splitting obligations and providing hints, but this requires familiarity with how SMT solvers handle quantifiers.

Where This Fits in Practice

The honest comparison for LemmaScript is not with TypeScript’s type checker. The more useful comparison is with property-based testing via fast-check. Property testing also verifies behavioral invariants, generates many inputs, and checks that properties hold. It is accessible, integrates naturally into existing test suites, and catches the vast majority of real-world bugs. The difference is probabilistic versus exhaustive. A Dafny proof covers all inputs satisfying the precondition; fast-check covers the inputs the generator happened to produce.

For most application code, property testing is the right tool. API handlers, UI logic, and database interactions are dominated by side effects that Dafny cannot reason about without extensive external-system models. The verification surface for this code is small.

Where LemmaScript earns its cost is at the library and algorithm layer: sorting routines, binary search, cryptographic primitives, parser combinators, data structure invariants. These are pure or near-pure functions with mathematical specifications, exactly where SMT-backed verification performs well, and exactly where correctness bugs are most costly because they are foundational.

Dafny has been used on real production systems at this scale. AWS has published work on using Dafny to verify key-value storage components in S3. The technique is proven; the question is whether the TypeScript toolchain around it matures enough to be usable by developers who are not formal methods researchers.

The Verification Boundary Problem

One constraint that LemmaScript cannot fully solve is what happens at the boundary between verified and unverified code. A verified Dafny function, compiled to TypeScript, sits in a codebase of ordinary TypeScript. Calling that function from unverified code does not extend the proof across the boundary. The preconditions on the Dafny side are runtime checks at best; if the caller violates them, the verified guarantee means nothing.

This is a fundamental challenge for any effort to introduce formal verification into an existing codebase. seL4, the formally verified operating system kernel, faced the same issue: the C implementation is verified, but the assembly layer below it and the software above it are not. The approach there was to carefully audit the boundaries and treat them as trusted interfaces. LemmaScript’s practical adoption will depend on whether the TypeScript ecosystem develops similar discipline around these boundaries.

The project is early, the tooling is rough, and the subset of TypeScript that maps cleanly to Dafny is restricted. But the direction is sound. Bringing SMT-backed proof obligations into the TypeScript workflow, even partially, gives the ecosystem something it has not had: a path from “we tested this thoroughly” to “we proved this correct for all inputs.” Those are different things, and it is worth knowing both options exist.

Was this interesting?