· 6 min read ·

The Invariant You Left in the Comment

Source: lobsters

Most type systems carry less information than they could. A function signature that takes a String for an email address and an int for a user ID does tell you something, but the real constraints live elsewhere: in a comment above the function, in a runtime assertion inside it, or in tribal knowledge passed between developers. This article on theconsensus.dev puts a name to the skill of noticing that gap and closing it. The skill is real, and it is learnable, but it requires rethinking what a type is for.

The default mental model treats types as rough containers. int means a number. String means text. List<User> means some users. Within that model, constraints are expressed in prose: // userId must be positive, // list must not be empty before calling this. Those constraints are real. They represent knowledge the programmer has about the program’s invariants. They just happen to be stored somewhere the compiler cannot read.

Newtypes: The Smallest Step

The simplest correction is the newtype pattern, available in Haskell as a first-class language construct and in Rust as a tuple struct:

struct UserId(u64);
struct OrderId(u64);

Both wrap a u64. The runtime representation is identical. But now you cannot pass an OrderId to a function that expects a UserId. The compiler enforces the distinction without any overhead. This alone eliminates an entire family of bugs that come from mixing up identifiers of the same underlying type, which happens more often than anyone likes to admit in systems with many entity types.

Haskell takes this further with GeneralizedNewtypeDeriving, which lets the wrapper inherit typeclass instances from the inner type automatically. The Haskell newtype mechanism also guarantees zero runtime overhead at the language level, not just as a compiler optimization. Rust’s nutype crate adds proc-macro support for validated newtypes, so you can write #[nutype(validate(len_char_min = 1))] and get a type that is structurally a String but carries a compile-time guarantee that it was validated on construction.

The pattern extends naturally to domain units. Metres and Seconds both wrap f64, but you cannot add them together. The Mars Climate Orbiter mission failed partly because one subsystem produced pound-force seconds while another expected newton-seconds; both were just floating-point numbers at the type level. Newtypes do not eliminate all unit confusion, but they make the confusion visible at compile time rather than at orbital insertion.

Parse, Don’t Validate

Newtypes prevent mixing. The next step is using them to enforce constraints at system boundaries. Alexis King’s 2019 post “Parse, Don’t Validate” makes the clearest case for this I have read.

The argument is that functions which accept raw input should not return a boolean indicating whether the input is valid. They should transform the input into a richer type that embodies the validity proof:

-- Returns the proof, not a verdict about it
parseNonEmpty :: [a] -> Maybe (NonEmpty a)
parseNonEmpty []     = Nothing
parseNonEmpty (x:xs) = Just (x :| xs)

Once a caller holds a NonEmpty a, they cannot accidentally forget to check emptiness before calling head. The type makes the unsafe operation unavailable. King calls the alternative “shotgun parsing”: scattering validation checks throughout the codebase, trusting every call site to remember what every other call site already established.

This philosophy changes how you design module boundaries. The validation happens once, at the point where unstructured data enters the system. Everything downstream operates on types that already carry the guarantees baked in. In Rust, the zod equivalent is something like a private field with a public constructor function:

pub struct EmailAddr(String);

impl EmailAddr {
    pub fn parse(s: &str) -> Result<Self, ParseError> {
        if s.contains('@') {
            Ok(EmailAddr(s.to_owned()))
        } else {
            Err(ParseError::InvalidEmail)
        }
    }
    pub fn as_str(&self) -> &str { &self.0 }
}

The field is private. The only way to construct an EmailAddr is through parse. Every function that accepts EmailAddr knows the string was validated on creation, and no function needs to re-validate it.

Phantom Types and State Machines

Phantom types extend this idea from data constraints to state transitions. A phantom type parameter appears in the type signature but carries no runtime data. It exists to carry compile-time information about which state an object is in:

use std::marker::PhantomData;

struct Disconnected;
struct Connected;

struct Connection<State> {
    inner: TcpStream,
    _state: PhantomData<State>,
}

impl Connection<Disconnected> {
    pub fn connect(self, addr: &str) -> Result<Connection<Connected>, io::Error> {
        // ...
        Ok(Connection { inner: self.inner, _state: PhantomData })
    }
}

impl Connection<Connected> {
    pub fn send(&mut self, data: &[u8]) -> Result<(), io::Error> { ... }
    pub fn disconnect(self) -> Connection<Disconnected> { ... }
}

send is not defined on Connection<Disconnected>. The compiler enforces the state machine. This pattern, called the typestate pattern, is used throughout the embedded Rust ecosystem: the embedded-hal crate encodes GPIO pin direction as a phantom type parameter, so calling set_high() on an input pin is a compile error rather than a silent no-op or a runtime panic.

The phantom type approach requires no runtime overhead. The marker types like Disconnected and Connected are zero-sized; PhantomData is zero-sized. The state information exists only in the type system and is erased before code generation.

Refinement Types: The Research Frontier

All of the above deals with categorical properties: this is an email, that connection is open, this list is non-empty. Refinement types go further and allow predicates drawn from a richer logic:

{-@ type Pos = {v:Int | v > 0} @-}

{-@ divide :: Int -> {v:Int | v /= 0} -> Int @-}
divide :: Int -> Int -> Int
divide x y = x `div` y

Liquid Haskell uses an SMT solver (Z3) to discharge these proof obligations at compile time. The type system now expresses not just what kind of thing a value is, but properties that value must satisfy. Division by zero becomes a type error at call sites that cannot prove the denominator is nonzero.

Prusti, from ETH Zurich, brings similar capabilities to Rust through precondition and postcondition annotations that the Viper verification infrastructure checks statically. Microsoft Research’s F* language takes this all the way to verified cryptography: the HACL* library, used in Firefox and Signal, is verified at the type level using F*‘s refinement system.

The limitation is practical. Refinement type checking requires SMT calls, which slow compilation significantly. The specifications need to be written by the programmer, which takes time. And the proof obligations can become undecidable for sufficiently complex predicates. Refinement types are the right tool for security-critical code where the cost is justified; they are harder to justify for everyday application development.

The Adoption Gradient

Mainstream languages are moving toward this spectrum at different rates. Python’s NewType has been available since 3.10, and PEP 742’s TypeIs, added in Python 3.13, provides a narrowing predicate mechanism that directly addresses the parse-don’t-validate pattern: a function annotated TypeIs[T] tells type-checkers that a True return narrows the argument to T.

TypeScript’s structural type system makes nominal types awkward, but the community has settled on a branded types pattern: intersecting a base type with a unique symbol property to create a type that is structurally compatible but nominally distinct. The zod library integrates this natively, allowing schema definitions to produce branded types directly from parse calls. A long-standing TypeScript issue requesting language-level nominal typing has accumulated significant traction but remains open as of early 2026.

Go’s named types give you the distinction between type UserID int64 and type OrderID int64 for free, which is a modest version of newtypes. The language does not have phantom types or refinement types, and its culture favors runtime checks and error returns over compile-time encoding. That is a deliberate design position, not a gap waiting to be filled.

The Underlying Shift

The programmers who are good at seeing types where others don’t tend to ask a specific question when they encounter a function: what properties must the arguments satisfy, and where is that knowledge currently living? If the answer is “in a comment” or “in a runtime assertion inside the function,” that is the place to look for a type that is not yet written.

This is not about making every program maximally typed. Refinement types for an email validation function are overkill. But the question of where constraint knowledge lives, and whether the type system could carry it instead, is worth asking routinely. The comment that says // must be non-empty is evidence that the type is doing less work than it could. The if userId <= 0 { return error } check at the top of every function that takes a user ID is evidence that something failed to happen at the boundary where the ID was created.

The techniques exist across the full spectrum, from the zero-cost newtype to the SMT-backed refinement predicate. The skill is learning to see the gap, and then choosing the right tool for the cost the context justifies.

Was this interesting?