
Proving TypeScript Correct: What LemmaScript and Dafny Actually Give You

Source: lobsters

TypeScript is very good at a specific kind of correctness. It catches structural mismatches, missing properties, wrong argument counts. It makes a large class of runtime errors visible at edit time without touching the semantics of your program. What it does not do, and was never designed to do, is prove that your functions satisfy behavioral contracts. That distinction matters more than it sounds, and LemmaScript is a serious attempt to close that gap by routing TypeScript verification through Dafny.

Understanding why this is interesting requires understanding what Dafny actually provides, which is different from what most developers mean when they say “type safety.”

What Dafny Gives You

Dafny is a verification-aware programming language created by K. Rustan M. Leino at Microsoft Research. It has been around since roughly 2009 and has been used on production software at Amazon Web Services, including the AWS Encryption SDK. The language compiles to C#, Java, Go, Python, and JavaScript, but its defining feature is not its output targets; it is the specification language that runs alongside the code.

A Dafny method can carry preconditions and postconditions:

method BinarySearch(arr: array<int>, target: int) returns (index: int)
  requires forall i, j :: 0 <= i < j < arr.Length ==> arr[i] <= arr[j]
  ensures 0 <= index ==> index < arr.Length && arr[index] == target
  ensures index < 0 ==> target !in arr[..]
{
  var lo, hi := 0, arr.Length;
  while lo < hi
    invariant 0 <= lo <= hi <= arr.Length
    invariant target !in arr[..lo] && target !in arr[hi..]
  {
    var mid := lo + (hi - lo) / 2;
    if arr[mid] < target { lo := mid + 1; }
    else if arr[mid] > target { hi := mid; }
    else { return mid; }
  }
  return -1;
}

The requires clause constrains what callers may pass. The ensures clauses constrain what the method may return. Each invariant inside the loop is a property that must hold on entry to the loop and be preserved by every iteration. Dafny translates all of this to Boogie, an intermediate verification language, which in turn generates verification conditions for the Z3 SMT solver. If Z3 proves every condition, the method is verified. If it finds a potential counterexample, or gives up before reaching an answer, Dafny reports the specific clause or line it could not establish.
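To make the three kinds of clause concrete, here is a rough TypeScript sketch that restates the same contract as runtime checks. The `check` and `isSorted` helpers are hypothetical names invented for this illustration. The important difference is that these assertions only test the inputs you happen to run, whereas Dafny discharges the clauses statically for every input that meets the precondition.

```typescript
// Hypothetical helper: throws when a contract clause fails at runtime.
function check(condition: boolean, clause: string): void {
  if (!condition) throw new Error(`contract violated: ${clause}`);
}

// Mirrors the `requires` clause: each element is <= the one after it.
function isSorted(arr: readonly number[]): boolean {
  return arr.every((v, i) => i === 0 || arr[i - 1] <= v);
}

function binarySearch(arr: readonly number[], target: number): number {
  check(isSorted(arr), "requires: arr is sorted");

  let lo = 0;
  let hi = arr.length;
  let index = -1;
  while (lo < hi) {
    // Mirrors the first loop invariant (the bounds on lo and hi).
    check(0 <= lo && lo <= hi && hi <= arr.length, "invariant: bounds");
    const mid = lo + Math.floor((hi - lo) / 2);
    if (arr[mid] < target) lo = mid + 1;
    else if (arr[mid] > target) hi = mid;
    else { index = mid; break; }
  }

  // Mirrors the two `ensures` clauses.
  check(index < 0 || arr[index] === target, "ensures: found index holds target");
  check(index >= 0 || !arr.includes(target), "ensures: negative result means absent");
  return index;
}
```

Calling `binarySearch([3, 1, 2], 1)` throws on the requires check, but only because that particular call was executed. A verifier would instead reject any call site that cannot be shown to pass a sorted array.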

This is a fundamentally different guarantee from the one TypeScript provides. TypeScript tells you that a value has a certain shape. Dafny tells you that a function, given inputs satisfying its precondition, provably produces outputs satisfying its postcondition, for every input that the precondition admits.

The Verification Gap in TypeScript

TypeScript’s type system is intentionally unsound. The design goals document explicitly lists “apply a sound or provably correct type system” as a non-goal, prioritizing practicality and compatibility with JavaScript idioms instead. This gives you a useful tool and a problematic gap.

Consider a simple function:

function divide(a: number, b: number): number {
  return a / b;
}

TypeScript accepts this without complaint. The type signature says it takes two numbers and returns a number, and at runtime a zero divisor quietly yields Infinity or NaN rather than an error. The type system cannot express “b must not be zero,” and even if you approximate that with a branded, nominal-style type, the checker cannot prove that callers satisfy the constraint. The type system gives you a structural contract; it cannot give you a behavioral proof.
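A minimal sketch of the nominal-type workaround, assuming a hypothetical `NonZero` brand and `toNonZero` constructor (neither is a TypeScript built-in or part of LemmaScript). It shows how far the type system can be pushed and where it stops.

```typescript
// Hypothetical branded type: a number that has passed a runtime non-zero check.
type NonZero = number & { readonly __brand: "NonZero" };

// Smart constructor: the only intended way to obtain a NonZero value.
// Rejects 0, -0, and NaN.
function toNonZero(n: number): NonZero | undefined {
  return n !== 0 && !Number.isNaN(n) ? (n as NonZero) : undefined;
}

// The signature now documents the constraint on the divisor.
function safeDivide(a: number, b: NonZero): number {
  return a / b;
}

const divisor = toNonZero(4);
if (divisor !== undefined) {
  console.log(safeDivide(10, divisor)); // 2.5
}
```

The brand moves the check to a single place, but it is still a runtime check followed by a trusted cast. Nothing stops another module from writing `0 as NonZero`, and the compiler has not proved anything about the value itself.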

Refinement types, which exist in systems like LiquidHaskell and F*, address this by embedding predicates in types. You can write something like {v: Int | v > 0} to denote a positive integer, and the type checker verifies that all values assigned to that type satisfy the predicate. There have been research efforts to bring this to TypeScript, including Refined TypeScript, a LiquidHaskell-inspired system from UCSD, but none have become mainstream tooling.

LemmaScript takes a different architectural approach, acting as a translation layer rather than extending TypeScript’s type system directly.

The Toolchain Architecture

The key insight behind LemmaScript is that Dafny already compiles to JavaScript, which means there is a semantic target language shared between Dafny’s output and TypeScript’s runtime environment. Rather than building a new verifier from scratch, LemmaScript routes verification through Dafny’s existing pipeline: annotated TypeScript goes in, Dafny specifications are generated, Z3 checks them, and verified output comes out.

Annotations in LemmaScript follow a JSDoc-adjacent style, keeping them within TypeScript’s existing comment infrastructure and avoiding the need for a separate syntax extension:

/**
 * @requires b !== 0
 * @ensures result === a / b
 */
function divide(a: number, b: number): number {
  return a / b;
}

The toolchain parses these annotations, maps TypeScript types to Dafny types, and emits corresponding Dafny method signatures with the specified contracts. The Dafny verifier then checks whether the implementation satisfies those contracts. For pure functions over primitive types, this translation is relatively direct. For functions involving mutable state, object identity, or JavaScript’s dynamic features, the translation becomes significantly harder.
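As a rough illustration of that parse-map-emit pipeline, the sketch below turns the annotated comment into a Dafny-style method header. It is not LemmaScript's implementation: every name here (`parseContract`, `toDafnyExpr`, `emitHeader`) is hypothetical, and mapping `number` to Dafny's `real` is an assumption made only to keep the example small.

```typescript
// Hypothetical, much-simplified model of the translation step.
// None of these names come from LemmaScript itself.
interface Contract {
  requires: string[];
  ensures: string[];
}

// Collect the expressions that follow @requires and @ensures tags.
function parseContract(docComment: string): Contract {
  const contract: Contract = { requires: [], ensures: [] };
  for (const line of docComment.split("\n")) {
    const match = /@(requires|ensures)\s+(.+?)\s*$/.exec(line);
    if (match === null) continue;
    const kind = match[1] as "requires" | "ensures";
    contract[kind].push(match[2]);
  }
  return contract;
}

// Rewrite TypeScript's strict comparison operators into Dafny's == and !=.
function toDafnyExpr(expr: string): string {
  return expr.replace(/!==/g, "!=").replace(/===/g, "==");
}

// Emit a Dafny-style method header.
// Assumption: every parameter and the result are mapped to `real`.
function emitHeader(name: string, params: string[], c: Contract): string {
  const args = params.map((p) => `${p}: real`).join(", ");
  const lines = [`method ${name}(${args}) returns (result: real)`];
  for (const r of c.requires) lines.push(`  requires ${toDafnyExpr(r)}`);
  for (const e of c.ensures) lines.push(`  ensures ${toDafnyExpr(e)}`);
  return lines.join("\n");
}

const doc = "/**\n * @requires b !== 0\n * @ensures result === a / b\n */";
console.log(emitHeader("divide", ["a", "b"], parseContract(doc)));
```

Even this toy version has to make a semantic decision that the TypeScript source never states: whether `number` means a mathematical real, an integer, or an IEEE 754 double. A real translator has to commit to an answer for every such choice.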

Where the Translation Gets Difficult

Dafny’s model of computation is fairly constrained compared to JavaScript’s. Dafny requires every loop and recursive function to be shown terminating, using decreases clauses that it can often infer, and it needs loop invariants to reason about what a loop establishes. JavaScript’s prototype chain, closures, and runtime type coercions have no clean Dafny analogue.

The biggest friction point is TypeScript’s treatment of null and undefined, which appear everywhere in real codebases. Dafny separates non-null reference types from nullable ones, typically models optional values with an explicit Option<T> datatype, and rejects any dereference it cannot prove is non-null. Mapping TypeScript’s optional chaining and nullish coalescing operators to Dafny requires generating substantial scaffolding around what is, in TypeScript, a two-character operator.
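The scaffolding is easier to appreciate when written out by hand. The sketch below, using an illustrative `Option` type and `User` shape that are assumptions of this example, puts the idiomatic TypeScript next to the fully explicit case analysis a translator would have to generate.

```typescript
// A hand-written option type, similar in spirit to the Option<T> datatype
// commonly used in Dafny code. The names here are illustrative.
type Option<T> = { kind: "some"; value: T } | { kind: "none" };

const none: Option<never> = { kind: "none" };
function some<T>(value: T): Option<T> {
  return { kind: "some", value };
}

// Convert a nullable TypeScript value into an explicit option.
function fromNullable<T>(v: T | null | undefined): Option<T> {
  return v === null || v === undefined ? none : some(v);
}

interface User {
  address?: { city?: string };
}

// Idiomatic TypeScript: two short operators hide three case splits.
function cityShort(user: User | undefined): string {
  return user?.address?.city ?? "unknown";
}

// The same logic with every case explicit, closer to what a verifier sees.
function cityExplicit(user: User | undefined): string {
  const u = fromNullable(user);
  if (u.kind === "none") return "unknown";
  const a = fromNullable(u.value.address);
  if (a.kind === "none") return "unknown";
  const c = fromNullable(a.value.city);
  return c.kind === "none" ? "unknown" : c.value;
}
```

Both functions return the same result for every input, but only the second exposes each branch as something a proof obligation can be attached to.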

Mutability presents a similar challenge. Dafny tracks heap modifications through a frame system: method specifications declare which memory locations they may modify via modifies clauses. TypeScript functions can freely close over mutable state, modify global objects, and produce side effects that Dafny’s frame system cannot easily capture without explicit modeling.
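A small, hypothetical example of the problem: in the sketch below, `callTwice` has the same signature whether or not its argument mutates anything, so nothing in the types tells a translator which memory the call may touch. In Dafny, a method that updated the counter would have to say so in a modifies clause.

```typescript
// Hypothetical example: a counter whose state lives in a closure.
function makeCounter(): { next: () => number; peek: () => number } {
  let count = 0; // hidden mutable state, not visible in any signature
  return {
    next: () => {
      count += 1;
      return count;
    },
    peek: () => count,
  };
}

// The signature looks pure: a function in, a number out.
// Whether it mutates anything depends entirely on what `f` closes over.
function callTwice(f: () => number): number {
  f();
  return f();
}

const counter = makeCounter();
console.log(callTwice(counter.peek)); // 0, state untouched
console.log(callTwice(counter.next)); // 2, state changed behind the scenes
```

To verify code like this, the captured variable has to be modeled explicitly, for example as a field on an object that the specification can name.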

This is not a criticism specific to LemmaScript. Every formal verification effort for a dynamically typed language faces the same translation problem. Gillian, a compositional symbolic execution framework from Imperial College, takes JavaScript as a target and contends with the same semantic gap. ESC/Java, which attempted extended static checking for Java in the early 2000s, encountered analogous difficulties mapping Java’s object model to formal specifications. The challenge is inherent to the domain.

Comparison with Other Verification Approaches

LemmaScript’s Dafny backend sits in a specific part of the verification landscape. It is more automated than proof assistants like Coq or Lean 4, where you construct proofs manually and the system checks them. It is more expressive than lightweight static analyzers like TypeScript’s own checker or ESLint plugins, which reason about types and patterns but not behavioral properties. It occupies the space of SMT-backed auto-verification, the same territory as Why3 with its WhyML language and Frama-C’s WP plugin for C.

The tradeoff in this space is well understood: SMT solvers are powerful but incomplete. Z3 can time out on complex specifications. It can fail to verify true properties if they require reasoning that Z3’s heuristics do not cover well. You write a postcondition, the verifier times out, and you are left debugging a proof rather than a program. This is not a failure of the tool; it is the nature of automated theorem proving. But it means the developer experience is meaningfully different from type checking, where failures are fast and the error messages are local.

The Practical Value

The cases where LemmaScript pays off most clearly are pure algorithmic functions with precise mathematical contracts: sorting routines, search functions, cryptographic primitives, parser combinators. These map cleanly to Dafny’s model, have obvious specifications, and are exactly the functions where subtle correctness bugs are most costly.

For a TypeScript codebase dominated by UI logic, API calls, and database interactions, the verification surface shrinks considerably. Much of that code is inherently about side effects that Dafny cannot reason about without explicit models of external systems. That is not a dealbreaker, but it means realistic adoption is probably at the library and utility layer first, not the application layer.

The broader point LemmaScript makes is about where the TypeScript ecosystem’s guarantees currently stop. Runtime validators like Zod tell you that an incoming value has the right shape. TypeScript’s type checker tells you that values flow through your code without structural mismatches. Dafny-backed verification tells you that specific functions satisfy precise behavioral contracts. These are three distinct levels of confidence, and until now the third level has been largely absent from mainstream TypeScript tooling.

LemmaScript is an early attempt to bring that third level into reach. The ergonomics are rough, the scope of verifiable code is narrow, and the Z3 timeout problem is real. But the project is pointing at something that matters: the difference between a type that passes and a function that is proven correct is not a small difference, and TypeScript’s ecosystem has mostly pretended it does not exist.
