The Tooling Dividend That Type Systems Don't Advertise

When Stripe’s engineers decided to invest in Sorbet, a gradual type system for Ruby, the stated reasons were the usual ones: catching type errors before runtime, enabling safer refactoring, making large codebases more navigable. Those reasons were valid. What the decision probably did not fully account for was that a sufficiently detailed type model of a program is also, structurally, a dependency graph, and dependency graphs are the foundation of fast CI.

Stripe’s recent post on selective test execution describes their system for running only the tests relevant to a given change in their 50-million-line Ruby monorepo. The core problem is building a dependency graph accurate enough to know, for any changed file, which tests could possibly fail. In a dynamically typed language like Ruby, this is not straightforward. For a Sorbet-annotated Ruby codebase, the type checker has already done most of the work.

This pattern appears across the type system landscape, though it rarely gets framed this way: type information, once collected for safety and correctness purposes, enables build tooling improvements that would otherwise require a separate, purpose-built analysis pass.

What Type Information Actually Gives You

The core operation in any build system or test selection system is: given that this file changed, what else might be affected? A file-level dependency graph answers with every file that depends on the changed one, directly or transitively. A method-level dependency graph gives a narrower answer: only the code that calls specific methods defined in this file.

The difference in precision matters considerably. A file-level graph treats payment.rb as a single dependency unit. If a test depends on payment.rb, any change to that file will trigger it, including changes to methods the test never calls. A method-level graph lets you track that a test depends on Payment#charge specifically. Changing Payment#format_currency, an unrelated method in the same file, does not affect that test.
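A minimal sketch of that difference, using hypothetical test and method names (none of this is Stripe's actual data model): the same lookup is run against a file-level map and a method-level map.

```ruby
require 'set'

# Hypothetical dependency data; every name here is illustrative.
# File-level: which source files each test depends on.
file_deps = {
  'test/charge_test.rb'  => Set['lib/payment.rb'],
  'test/receipt_test.rb' => Set['lib/payment.rb'],
}

# Method-level: which methods each test can reach.
method_deps = {
  'test/charge_test.rb'  => Set['Payment#charge'],
  'test/receipt_test.rb' => Set['Payment#format_currency'],
}

# Tests whose recorded dependencies include the changed unit.
def affected_tests(deps, changed)
  deps.select { |_test, units| units.include?(changed) }.keys.sort
end

file_level   = affected_tests(file_deps, 'lib/payment.rb')
method_level = affected_tests(method_deps, 'Payment#format_currency')

puts "file-level:   #{file_level.inspect}"   # both tests
puts "method-level: #{method_level.inspect}" # only receipt_test.rb
```

With the file-level map, an edit to Payment#format_currency selects both tests; with the method-level map, only the test that actually reaches that method is selected.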

Getting method-level precision from static analysis requires knowing, at the call site, which method is being called. In Ruby without Sorbet, a call to foo.bar could dispatch to a method defined in any file in the codebase, because open classes and module mixins mean the actual class of foo at runtime might not be what the source code suggests. With Sorbet type annotations, the type of foo is known at the call site, method resolution is deterministic, and the dependency edge is precise. Sorbet was built to catch type errors; the call graph it constructs for that purpose turns out to be directly useful for CI optimization at method granularity.
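To make the dispatch problem concrete, here is a small plain-Ruby illustration with hypothetical classes. The call site in describe carries no information about which total it will reach, and one of the candidate classes is reopened elsewhere.

```ruby
# Hypothetical classes; imagine each definition living in a different file.
class Invoice
  def total
    100
  end
end

# Elsewhere in the codebase the class is reopened and the method redefined.
# Nothing at the call site below reveals this.
class Invoice
  def total
    110
  end
end

class Refund
  def total
    -40
  end
end

# Without a static type for `item`, an analyzer cannot tell which `total`
# this call reaches, so it has to assume an edge to every definition.
# A Sorbet signature such as
#   sig { params(item: Invoice).returns(String) }
# would pin the receiver type and narrow that edge to Invoice#total.
def describe(item)
  "total=#{item.total}"
end

puts describe(Invoice.new) # total=110 (the later definition wins)
puts describe(Refund.new)  # total=-40
```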

This gets more interesting with type narrowing. If Sorbet knows a variable has type CreditCard rather than the broader PaymentMethod interface, calls on that variable only depend on CreditCard’s methods, not every implementor of PaymentMethod. The dependency graph inherits the precision of the type system, and at Stripe’s scale, each narrowed edge represents tests that do not run.
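The effect of a narrower receiver type on the edge count can be sketched as follows. The implementor table and file paths are assumptions made up for illustration; a real type checker would derive them from the class hierarchy.

```ruby
# Hypothetical type table: which concrete classes a static type may refer to.
implementors = {
  'PaymentMethod' => %w[CreditCard BankTransfer Wallet],
  'CreditCard'    => %w[CreditCard],
}

# Hypothetical map from method definitions to the files that contain them.
definition_file = {
  'CreditCard#charge'   => 'lib/credit_card.rb',
  'BankTransfer#charge' => 'lib/bank_transfer.rb',
  'Wallet#charge'       => 'lib/wallet.rb',
}

# Files that a call to `receiver.charge` may depend on, given the
# receiver's static type.
def charge_dependencies(static_type, implementors, definition_file)
  implementors.fetch(static_type).map do |klass|
    definition_file.fetch("#{klass}#charge")
  end
end

broad  = charge_dependencies('PaymentMethod', implementors, definition_file)
narrow = charge_dependencies('CreditCard', implementors, definition_file)

puts "PaymentMethod receiver: #{broad.size} edges"
puts "CreditCard receiver:    #{narrow.size} edge"
```

A call on a PaymentMethod-typed receiver has to keep an edge to every implementor; once the type is narrowed to CreditCard, two of the three edges disappear, along with the tests that only they would have selected.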

The Pattern in Other Ecosystems

TypeScript demonstrates this dynamic most clearly in the JavaScript ecosystem. TypeScript’s --incremental flag and .tsbuildinfo files enable selective rebuilding: only the files whose inputs changed since the last build need to be rechecked and re-emitted. The structural information TypeScript tracks (which types are exported from which modules, and how interfaces relate across files) makes it possible to determine that a change to moduleA.ts requires reprocessing moduleB.ts but not moduleC.ts. The type checker was built to find type errors; the dependency precision it produces enables incremental compilation as a side effect. This is why tsc --incremental can cut rebuild times substantially on large TypeScript projects, using no information beyond what the compiler already tracks for type checking.

Rust shows a similar pattern through Cargo’s fingerprinting and rustc’s incremental compilation. Crates are explicit compilation units with visibility rules enforced at compile time, so the inputs to each unit are known precisely. Cargo fingerprints those inputs and skips rebuilding any crate whose transitive inputs have not changed. When a crate does change, its dependents are rebuilt, and within each crate rustc’s incremental compilation reuses the results that the change did not invalidate. cargo test benefits from the same machinery, and the language’s strict module and visibility rules are what make those input boundaries precise. The borrow checker and ownership model are the headline features; the accurate incremental compilation is the infrastructure dividend.
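The skip-if-unchanged idea behind fingerprinting can be sketched in a few lines. This is not Cargo's actual algorithm: the unit names and the data model are hypothetical, and because the sketch hashes whole source text it cannot distinguish an interface change from an internal one.

```ruby
require 'digest'

# Fingerprint of a build unit: a hash over its own source text plus the
# fingerprints of its direct dependencies (hypothetical data model).
def fingerprint(unit, sources, deps, memo = {})
  memo[unit] ||= begin
    dep_prints = deps.fetch(unit, []).sort.map do |dep|
      fingerprint(dep, sources, deps, memo)
    end
    Digest::SHA256.hexdigest([sources.fetch(unit), *dep_prints].join("\0"))
  end
end

# unit => units it depends on
deps = { 'app' => ['billing'], 'billing' => ['core'], 'reports' => ['core'] }

before = { 'core' => 'v1', 'billing' => 'v1', 'app' => 'v1', 'reports' => 'v1' }
after  = before.merge('billing' => 'v2') # only billing's source changed

# A unit needs rebuilding when its fingerprint differs between the two states.
stale = before.keys.select { |unit|
  fingerprint(unit, before, deps) != fingerprint(unit, after, deps)
}.sort

puts "needs rebuild: #{stale.inspect}"
```

Here core and reports keep their fingerprints and can be skipped, while billing and everything downstream of it are marked stale.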

Go is a more limited case but still illustrative. The module system provides package-level isolation, and go list -test -deps ./... can construct a package dependency graph from source alone. Because Go requires explicit imports and has no open classes, the package graph derived from static analysis is accurate and complete for ordinary code. This is what lets Gazelle, the BUILD file generator used with Bazel, produce BUILD files for Go codebases automatically, and what makes go test ./... genuinely useful for affected-only testing: cached results are reused for packages whose inputs have not changed. Go’s constraints, explicit imports and very little metaprogramming, are language design decisions, and they directly enable better tooling without additional annotation work.
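Once a package graph like that exists, the affected set is a reverse-dependency walk. The sketch below keeps to Ruby for consistency with the rest of this post and uses a hypothetical import graph; it is not how go test or Bazel implement the lookup.

```ruby
require 'set'

# Hypothetical package import graph: package => packages it imports.
imports = {
  'app/api'     => %w[lib/billing lib/auth],
  'app/admin'   => %w[lib/auth],
  'lib/billing' => %w[lib/money],
  'lib/auth'    => [],
  'lib/money'   => [],
}

# Invert the graph, then walk outward from the changed package to find
# everything that imports it directly or transitively.
def affected_packages(imports, changed)
  importers = Hash.new { |hash, key| hash[key] = [] }
  imports.each { |pkg, deps| deps.each { |dep| importers[dep] << pkg } }

  seen = Set[changed]
  queue = [changed]
  until queue.empty?
    importers[queue.shift].each do |pkg|
      queue << pkg if seen.add?(pkg) # add? returns nil if already present
    end
  end
  seen.to_a.sort
end

puts affected_packages(imports, 'lib/money').inspect
# ["app/api", "lib/billing", "lib/money"]
```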

Java has had incremental compilation in Eclipse JDT since the early 2000s. The Java compiler’s type model tracks which classes implement which interfaces, which methods are overridden, and which callers might be affected by a change to a method signature. Gradle’s incremental task execution and test caching depend on the compiler producing accurate information about what changed between compilations. The JVM’s explicit class and interface system makes this tractable in ways that a dynamic language cannot match without additional instrumentation.

Why Ruby Makes This Hard Without Sorbet

Ruby’s object model is the opposite of what efficient static dependency analysis requires. Classes can be reopened in any file. Methods can be added to any class at runtime. Modules can be included conditionally. Zeitwerk, the autoloader used in modern Rails, maps filesystem paths to constants by naming convention and loads files lazily on first reference. None of this is visible to a static analyzer reading source files.

For Ruby codebases without type annotations, the practical option is coverage-based test selection: run every test with Ruby’s built-in Coverage module enabled, record which source files each test executed, build an inverted index from source files to test files, and consult that index on subsequent PRs. The crystalball gem by Toptal implements this for Ruby; pytest-testmon takes the same approach for Python, which faces a similar analysis problem. The accuracy is empirically high because it records actual runtime behavior rather than reasoning about what might happen.

Coverage-based selection has a ceiling, though. The dependency map reflects the last time tests ran; new dependencies introduced after the map was built are invisible until the map refreshes. Coverage instrumentation carries runtime overhead, typically 20-40% slowdown on MRI Ruby even in the more efficient oneshot_lines mode added in Ruby 2.6. And file-level granularity is the best coverage-based approaches achieve without significant additional analysis.

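require 'coverage'

# Start before loading application code; files loaded earlier are not tracked.
# oneshot_lines records only the first execution of each line.
Coverage.start(oneshot_lines: true)

# Load and run a single test file. The tests must finish executing before
# Coverage.result is read (some runners defer execution to at_exit).
require_relative 'test/payment/stripe_charge_test.rb'

result = Coverage.result
# Keys are typically absolute paths; shortened here for readability.
# => {
#   "lib/payment/stripe_charge.rb" => { oneshot_lines: [1, 5, 12, 23] },
#   "lib/models/customer.rb"       => { oneshot_lines: [3, 8] },
# }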
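
Given one such result per test file, the inverted index is a short transformation. The following is a sketch with hypothetical data in the same shape as the result above; build_index and tests_for are illustrative helper names, not part of the Coverage module or of crystalball.

```ruby
# Hypothetical per-test coverage results, shaped like Coverage.result
# with oneshot_lines enabled.
coverage_by_test = {
  'test/payment/stripe_charge_test.rb' => {
    'lib/payment/stripe_charge.rb' => { oneshot_lines: [1, 5, 12, 23] },
    'lib/models/customer.rb'       => { oneshot_lines: [3, 8] },
  },
  'test/models/customer_test.rb' => {
    'lib/models/customer.rb' => { oneshot_lines: [3, 8, 15] },
    'lib/models/address.rb'  => { oneshot_lines: [] }, # loaded, never executed
  },
}

# Invert: source file => tests that executed at least one line of it.
def build_index(coverage_by_test)
  index = Hash.new { |hash, key| hash[key] = [] }
  coverage_by_test.each do |test_file, result|
    result.each do |source, data|
      index[source] << test_file unless data[:oneshot_lines].empty?
    end
  end
  index
end

# Tests to run for a set of changed source files.
def tests_for(index, changed_files)
  changed_files.flat_map { |file| index.fetch(file, []) }.uniq.sort
end

index = build_index(coverage_by_test)
puts tests_for(index, ['lib/models/customer.rb']).inspect
```

A file that is missing from the index selects no tests here. A production system would need a conservative fallback for that case, since a missing entry may just mean the map is stale.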

Sorbet pushes past that ceiling. A # typed: strict file gives the type checker enough information to resolve method dispatch statically, enabling method-granularity dependency tracking rather than file-granularity. The coverage map fills in where types are absent. The two approaches are complementary, and the precision of the hybrid increases with the percentage of the codebase that Sorbet annotates. Higher type coverage means fewer edges that need to be discovered empirically, fewer staleness windows, and more aggressive test exclusion with the same correctness guarantee.
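One way to picture the hybrid is a selector that prefers type-derived edges and falls back to the coverage map. This is a sketch under stated assumptions, not Stripe's implementation: typed_files, typed_edges, coverage_edges, and the shape of a change record are all hypothetical.

```ruby
# Hypothetical inputs. `typed_edges` stands in for edges derived from the
# type checker's call graph; `coverage_edges` for the file-level coverage map.
typed_files = ['lib/payment.rb'] # files with enough annotations to trust

typed_edges = { # changed method => dependent tests
  'Payment#charge'          => ['test/charge_test.rb'],
  'Payment#format_currency' => ['test/receipt_test.rb'],
}

coverage_edges = { # changed file => dependent tests
  'lib/payment.rb'       => ['test/charge_test.rb', 'test/receipt_test.rb'],
  'lib/legacy_export.rb' => ['test/export_test.rb', 'test/receipt_test.rb'],
}

# Each change is a hash: { file: String, methods: [String] }.
# Typed files use method-level edges; everything else falls back to the
# file-level coverage map. A real system would also need a conservative
# path for methods that are missing from the typed graph.
def select_tests(changes, typed_files, typed_edges, coverage_edges)
  changes.flat_map { |change|
    if typed_files.include?(change[:file])
      change[:methods].flat_map { |m| typed_edges.fetch(m, []) }
    else
      coverage_edges.fetch(change[:file], [])
    end
  }.uniq.sort
end

typed_change  = [{ file: 'lib/payment.rb', methods: ['Payment#format_currency'] }]
legacy_change = [{ file: 'lib/legacy_export.rb', methods: [] }]

puts select_tests(typed_change, typed_files, typed_edges, coverage_edges).inspect
puts select_tests(legacy_change, typed_files, typed_edges, coverage_edges).inspect
```

The typed change selects a single test, where the coverage map alone would have selected two; the untyped file still gets the file-level answer.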

The ROI Calculation That Gets Missed

Type system adoption decisions are usually evaluated on developer experience grounds: fewer runtime bugs, better IDE completion, safer large-scale refactoring. These are real benefits, and they were the arguments that justified Sorbet’s development at Stripe starting around 2017, well before the test selection use case was a stated goal.

The build infrastructure dividend is rarely part of that initial argument because it is less predictable and requires additional tooling investment to realize. You have to build the system that consumes the type graph for dependency analysis; the type graph alone is not enough. But the capability compounds: as a rough approximation, a type system that achieves 60% annotation coverage gives you method-granularity precision for about 60% of your dependency edges. At 80% coverage, the selective execution system can be substantially more aggressive because fewer edges require coverage-based fallback.

For a monorepo at Stripe’s scale, that difference in precision translates directly to compute cost and developer feedback time. CI runs that would otherwise take an hour complete in minutes for the typical PR touching a small fraction of the codebase. The feedback loop tightens, which encourages smaller, more focused commits, which affect fewer files, which trigger fewer tests. The system becomes self-reinforcing when commit granularity aligns with test impact granularity.

For teams evaluating whether to adopt a gradual type system like Sorbet, or whether to continue increasing type coverage for a partially annotated codebase, the test infrastructure angle is a real input to the decision. The argument is not "type safety is good, therefore do it." The argument is that type infrastructure, once built, tends to pay dividends in directions that were not the original goal, and those dividends compound in proportion to how thoroughly the type information covers the codebase.