· 6 min read ·

What TypeScript's Type System Cannot Prove, and How Dafny Might Fill the Gap

Source: lobsters

TypeScript’s type system is genuinely impressive. It handles generics, conditional types, mapped types, template literal types, and enough structural inference to make Java developers visibly uncomfortable. But there is a whole class of correctness properties it cannot express at all, and that class contains most of the bugs that actually matter in production.

A function typed as (xs: number[]) => number[] tells you nothing about whether the output is sorted, whether it preserves length, whether it modifies the input in place, or whether it terminates at all. Types describe shapes. They do not describe behavior. LemmaScript is an attempt to close that gap by threading Dafny’s verification infrastructure into the TypeScript toolchain.

What Dafny Actually Does

Dafny is a verification-aware programming language developed at Microsoft Research by Rustan Leino, now maintained by a team that has since moved to AWS. Its central idea is that correctness properties belong in the source code, written as first-class annotations that the compiler can check.

A Dafny method looks like this:

method BinarySearch(a: array<int>, key: int) returns (index: int)
  requires forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
  ensures 0 <= index < a.Length ==> a[index] == key
  ensures index < 0 ==> key !in a[..]
{
  // implementation
}

The requires clause is a precondition: it specifies what must be true before the method runs. The ensures clause is a postcondition: it specifies what the method guarantees about its output. These are not runtime assertions. They are checked statically by Z3, Microsoft’s SMT solver, before any code runs. If the verifier cannot prove that the implementation satisfies the postconditions given the preconditions, compilation fails.

Dafny also supports loop invariants, termination metrics (decreases clauses), and a special construct called a lemma: a ghost method with no runtime presence, used purely to guide the verifier through a proof that it cannot discover automatically. Lemmas are the mechanism for proving non-trivial mathematical properties about your data structures, which is also where LemmaScript gets its name.

Critically, Dafny has always had a compilation target for JavaScript and TypeScript. You can write Dafny code and emit TypeScript from it. LemmaScript appears to operate in the other direction: you write TypeScript and use the Dafny infrastructure to verify it.

The Gap That Needs Bridging

The challenge with bringing deductive verification to TypeScript is that TypeScript was not designed with a verifiable semantics. Structural typing, any, conditional types that depend on the shape of inferred type arguments, and the sheer permissiveness of the JavaScript runtime underneath all create significant obstacles for a formal model.

Rust’s Prusti avoided this problem by building on a language with a well-defined ownership model. Prusti lets you annotate Rust functions with preconditions and postconditions using a procedural macro syntax, translates those into the Viper intermediate verification language, and checks them against Z3. It works well in practice because Rust’s type system already eliminates aliasing, which is one of the hardest things to reason about formally.

SPARK does the same for Ada, and it has been used in genuinely safety-critical aerospace and cryptographic code. The seL4 microkernel verification is the most cited example of what full formal verification achieves: a mathematically proven absence of certain classes of bugs in hundreds of thousands of lines of C and Isabelle/HOL.

Liquid Haskell takes a different approach for Haskell. Rather than a separate verification language, it adds refinement types directly into Haskell’s type system. A type like {v: Int | v > 0} constrains v to positive integers, and the type checker uses an SMT solver to verify that all paths satisfy the constraint. This is elegant because verification and implementation live in the same syntax, but it requires that the host language’s semantics be clean enough to admit a formal model, which Haskell’s lazy evaluation and purity largely provide.

TypeScript sits in harder territory than any of these.

What a TypeScript Verification Bridge Looks Like

A reasonable approach, and one consistent with what LemmaScript appears to pursue, is to translate a verified subset of TypeScript into Dafny, check the translated code against Dafny specifications, and then compile or use the original TypeScript normally. The specifications might be expressed as JSDoc-style annotations, as decorator syntax, or as a parallel .dfy file that mirrors the TypeScript structure.

Something like:

// @requires xs.length > 0
// @ensures return >= Math.min(...xs)
// @ensures return <= Math.max(...xs)
function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

The toolchain would parse the annotations, construct an equivalent Dafny representation of median, and ask Dafny’s verifier to check that the body satisfies the postconditions. If it cannot, you get a verification error before tests or deployment.

This kind of toolchain faces a few hard problems. TypeScript’s sort is not pure: it mutates the array in place. The spread operator copies the array first in this example, but the verifier needs to model that accurately or it will produce false results. Floating-point arithmetic is a particular pain point: Z3 has limited support for IEEE 754 semantics, and most verification tools either ignore floating-point subtleties or require the programmer to write proofs by hand using lemmas.

Where This Sits in the Verification Landscape

Formal verification toolchains exist on a spectrum. At one end you have type checkers, which verify structural properties automatically. At the other end you have interactive theorem provers like Coq or Isabelle, which can prove essentially anything but require significant mathematical effort from the programmer. SMT-backed tools like Dafny, Prusti, and Liquid Haskell occupy the middle: more expressive than types, more automated than interactive provers, but with real limitations on what they can discover without hints.

The practical question for LemmaScript is what the annotation burden looks like for ordinary TypeScript. Writing a correct Dafny proof for a sorting function is tractable. Writing one for a React component that manages asynchronous state is a different problem entirely. The async/await model, closures, and mutation patterns that dominate real TypeScript codebases are exactly the things that make formal reasoning hardest.

This suggests that the most immediate value is in the parts of the codebase where correctness is both important and bounded: pure utility functions, data structure invariants, protocol state machines, and cryptographic primitives. These are also the places where TypeScript developers currently reach for property-based testing frameworks like fast-check, which explores the space of inputs randomly rather than proving properties for all inputs. LemmaScript offers something stronger than property testing but more targeted than whole-program verification.

The Historical Pattern

Every mainstream language has eventually attracted a formal verification ecosystem. The path is usually the same: a research prototype demonstrates feasibility, a usable toolchain emerges for a narrow domain, and adoption spreads from safety-critical niches toward general use over a decade or more. Rust reached usable verification tooling faster than most, partly because its semantics were designed with reasoning in mind and partly because the systems programming community was already thinking about correctness.

TypeScript will follow a slower path because JavaScript semantics carry decades of decisions made without verification in mind. But the pressure is there. As TypeScript code runs in more critical infrastructure, the gap between structural type safety and behavioral correctness starts to cost real money and cause real failures.

LemmaScript is early, and the hard parts of the problem remain hard. But the existence of Dafny’s TypeScript backend means there is a real semantic bridge to build on, not just an analogy. The question is how much of the TypeScript surface area the toolchain can cover before annotation burden and verifier limitations push users back toward testing. That is a question worth watching.

Was this interesting?