· 6 min read ·

What TypeScript's Type System Cannot Prove, and What Dafny Can

Source: lobsters

TypeScript’s type system is genuinely good at what it does. It catches real bugs, documents intent, and makes large codebases navigable. But there is a category of errors it cannot touch: logical correctness. You can have a fully type-safe function that still returns wrong answers, violates invariants under certain inputs, or fails to terminate. LemmaScript, a verification toolchain built on top of Dafny, Microsoft Research’s verification-aware programming language, takes a different approach. It brings formal proof of correctness to TypeScript code by using Dafny as the verified source of truth and emitting JavaScript that TypeScript can consume.

What Dafny Actually Does

Dafny is a language designed from the ground up for program verification. It sits in a tradition going back to Hoare logic from 1969, where programs are annotated with preconditions, postconditions, and loop invariants that must logically hold. Dafny compiles these annotations into queries for Z3, Microsoft’s SMT (Satisfiability Modulo Theories) solver. Z3 either proves the property holds for all possible inputs, or produces a counterexample showing exactly where it fails.

A simple Dafny function looks like this:

method Divide(a: int, b: int) returns (result: int)
  requires b != 0
  ensures result * b == a || (a % b != 0 && result == a / b)
{
  return a / b;
}

The requires clause is a precondition: the caller must guarantee b != 0 before invoking this function. The ensures clause is a postcondition: the implementation must guarantee the stated relationship between inputs and outputs. If you write an implementation that can violate ensures for any input satisfying requires, the verifier refuses to compile. This check is static, at compile time, without running the program at all.

Beyond pre/postconditions, Dafny supports loop invariants (properties that hold before and after every iteration), termination proofs via decreases clauses that must shrink with each recursive call or loop step, and named lemmas: standalone proofs you invoke as dependencies in other proofs. The lemma system is where the LemmaScript name comes from.

The Gap TypeScript Cannot Cross

TypeScript’s type system works by tracking the shape and type of values through your program. It can prove that a value is string | null, that an object has a .length property, that a function returns Promise<User>. What it cannot prove is anything about the values themselves: that an array is sorted, that a counter is always non-negative, that a recursive function terminates, that two operations on shared state commute.

Consider a binary search implementation:

function binarySearch(arr: number[], target: number): number {
  let low = 0;
  let high = arr.length - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (arr[mid] === target) return mid;
    if (arr[mid] < target) low = mid + 1;
    else high = mid - 1;
  }
  return -1;
}

TypeScript can tell you this returns number. It cannot tell you that the function terminates for all inputs, that the result is a valid index when it is not -1, that the loop invariant low <= mid < high is maintained, or that the algorithm is even correct when the input array is unsorted. These are semantic properties, not type properties, and no amount of generics or branded types fully captures them.

Runtime validation libraries like Zod or io-ts get you closer at system boundaries, but they operate at runtime on specific values, not statically across all possible inputs. They are useful for validation; they do not constitute proof.

How LemmaScript Bridges This

LemmaScript exploits a feature Dafny already has: it can compile to JavaScript. Dafny’s compiler targets multiple backends, including JavaScript, which means verified Dafny code can run in a Node.js or browser environment. The workflow LemmaScript enables is: write your logic in Dafny, verify it formally, emit JavaScript, and consume that JavaScript from TypeScript with accurate type definitions generated from the Dafny signatures.

The key insight is that Dafny’s type system is rich enough to express invariants TypeScript cannot, and its compiled output is ordinary JavaScript. If the Dafny verifier signs off on your code, you get a module with accompanying .d.ts declarations that TypeScript can statically check. The correctness proof lives in the Dafny source; the TypeScript only sees the surface API, already known to be correct by construction.

This is similar in spirit to what F* and Low* do for C: write verified code in the proof-aware language, extract the implementation, use it from C with the knowledge that the hard invariants were machine-checked. The extraction step decouples the verification burden from the consumption environment.

Prior Art and Alternatives

The general idea of using a verification language to generate correct code for another language has a long history. Ada/SPARK does this for safety-critical systems: SPARK is a formally verifiable subset of Ada, and SPARK-verified code interoperates with ordinary Ada. Coq via its extraction mechanism can produce verified OCaml or Haskell from proofs. Lean 4 is increasingly used this way, with active work on compiled output targeting multiple backends.

For TypeScript specifically, the landscape is sparse. Effekt and effect systems in general address side-effect control rather than functional correctness. There have been proposals to add refinement types directly to TypeScript, which would let you express things like { n: number; n > 0 }, but none have advanced into the language proper. Libraries like ts-belt push TypeScript’s type system hard with conditional types and template literal types, but they remain within the bounds of what the type checker can express structurally.

Some projects use Z3 directly from TypeScript via z3.js, which provides Z3 bindings for JavaScript and lets you write SMT queries programmatically. This is powerful but requires writing your logic twice: once in TypeScript and once as SMT assertions. LemmaScript’s approach is more integrated because Dafny is the single source of truth for both the logic and its verification.

The Real Trade-off

The friction here is substantial. Dafny is not TypeScript. Its syntax, idioms, and mental model are different from what most frontend or Node.js developers work with day to day. Writing a verified Dafny module requires understanding preconditions, postconditions, and in more complex cases, writing explicit lemmas to help the verifier understand why a property holds. The Z3 solver has a timeout and can fail to prove things that are true; you sometimes need to restructure your proof or add intermediate assertions to guide it toward a conclusion.

The payoff is proportional to how much your code’s correctness matters. For a CRUD API with straightforward business logic, the overhead likely is not worth it. For cryptographic code, consensus algorithms, financial calculations, or anything where a subtle bug has severe consequences, having the verifier certify your implementation is worth significant investment.

Dafny also has real limitations. Heap-manipulating code is harder to verify than purely functional code. Concurrency is not handled by the standard verifier at all; you need separate tools for that. And the compiled JavaScript output, while semantically correct with respect to Dafny’s model, is not optimized for the JavaScript runtime and can be verbose.

What This Points To

LemmaScript is a practical exploration of a question the TypeScript ecosystem has mostly deferred: what does it cost to actually know your code is correct, and what infrastructure makes that affordable? The answer in most languages has been too expensive for too long, but tooling has improved steadily. Dafny now has solid VS Code integration, better error messages, and a more stable compiler than it had five years ago. The JavaScript backend is mature enough to produce usable output.

The TypeScript ecosystem is large enough that even a narrow toolchain like LemmaScript can find a real niche. There are codebases, particularly in fintech, healthcare, and developer tooling, where the guarantees Dafny provides are worth the context switch. The bridge between the two languages is a reasonable place to start building that workflow, and having someone build the bridge at all is a meaningful step toward making formal verification a practical option rather than an academic curiosity.

Was this interesting?