Why Stripe Measures Dependencies Instead of Declaring Them

Coverage-based test selection is not a new idea, but applying it at the scale of a 50-million-line Ruby monorepo with hundreds of engineers committing daily is a different engineering problem than running pytest --testmon in a side project. Stripe’s post on their selective test execution system describes how they cut CI costs by running only the tests relevant to a given change, but the interesting part is not just the implementation. It is why the dynamic, coverage-based approach was the right choice for Ruby, and what that reveals about dependency analysis more broadly.

The Problem That Scales Poorly

A monorepo at Stripe’s scale has a CI problem that compounds with headcount. Every PR triggers a test run. If that run takes an hour and covers the entire suite, you pay the cost in wall-clock time, in compute, or in both. Parallelism helps but has diminishing returns and real cost. The obvious optimization is to not run tests that cannot possibly fail given a particular set of changes. If a PR only touches lib/payment/stripe_charge.rb, tests for the reporting dashboard or the fraud detection pipeline probably do not need to run. The challenge is figuring out, reliably, which tests are genuinely unaffected.

There are two broad approaches to this problem: static analysis and dynamic tracing.

Why Static Analysis Fails for Ruby

In a JavaScript or TypeScript monorepo, static analysis of test dependencies is tractable. Nx builds a project graph by parsing import statements and workspace configuration. Turborepo caches tasks based on declared inputs in turbo.json. Because TypeScript imports are explicit and the module system has well-defined semantics, you can construct a reliable dependency graph without running a single line of code. Jest’s --onlyChanged flag works similarly: it combines a git diff with a static traversal of the module graph to find affected tests.

Ruby is a different world. Consider what makes static dependency analysis unreliable in a typical Rails application.

Open classes. Any file can reopen any class at any point. A concern in app/concerns/ can reopen User. A gem initializer can add methods to String. The dependency graph has to account for the possibility that touching any file could affect the behavior of classes defined elsewhere.

Dynamic requires. require is a regular method call, not a compile-time directive. It can appear inside conditionals and loops, and its argument can be an interpolated string computed at runtime. Zeitwerk, the autoloader used in modern Rails, maps file paths to constant names by convention and defers loading until constants are first referenced. Which files actually get loaded is therefore only settled at runtime.

Metaprogramming. method_missing, define_method, const_get, and eval are standard Ruby idioms, not edge cases. A static analyzer that sees User.send(:include, SomeModule) cannot know which methods are now available on User without executing that line.

Mixins. Including a module rewires the method lookup chain. A file that only defines a module can have deep effects on every class that includes it, and those relationships are established when the include call runs during class definition, not at parse time.
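
A small sketch makes these four points concrete. Everything in it is hypothetical (User, Auditable, and the find_by_email finder are illustrative names, not taken from Stripe's codebase), and the three pieces are meant to stand in for three separate files:

```ruby
# Hypothetical example: none of these dependencies is spelled out in a way
# a parser could follow without evaluating the code.

class User
  attr_reader :email

  def initialize(email)
    @email = email
  end
end

# Imagine this module lives in its own file, such as a concern.
module Auditable
  def audit_tag
    "audited:#{email}"
  end
end

# Imagine this is a third file, such as an initializer. The module name is
# assembled at runtime, so the link from User to Auditable never appears
# as a literal constant reference.
mixin_name = %w[Audit able].join
User.send(:include, Object.const_get(mixin_name))

# Reopen User and define a class-level finder whose name is also computed.
class User
  %w[email].each do |attr|
    define_singleton_method("find_by_#{attr}") { |value| new(value) }
  end
end

user = User.find_by_email("a@example.com")
puts user.audit_tag
```

A tool that only reads the text of the first file has no way to learn that a change to the Auditable module can alter the behavior of User.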

The practical consequence is that any static dependency graph for a large Ruby codebase will be either wildly over-approximate, running nearly all tests to stay safe, or dangerously under-approximate, missing real dependencies and producing false negatives. A team at Stripe’s scale cannot afford either.

Dynamic Tracing: The Map Built From Execution

Stripe’s approach flips the question. Instead of asking what a file depends on, they ask which tests have historically executed code in that file.

The mechanism is Ruby’s built-in Coverage module. After you call Coverage.start, Ruby records, for every file loaded from that point on, which lines are executed. Ruby 2.6 added oneshot_lines mode, which records only whether each line was hit at least once rather than counting every execution. Because the per-line hook is dropped after its first hit, this mode carries noticeably less overhead, which matters for large test suites.

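require 'coverage'

# Start coverage before loading any application code: only files loaded
# after this call are tracked.
Coverage.start(oneshot_lines: true)

# Load a single test file. With a runner such as minitest/autorun, the tests
# execute in an at_exit hook, so a real harness collects the result after the run.
require_relative 'test/payment/stripe_charge_test.rb'

result = Coverage.result
# Keys are absolute paths in practice; they are shortened here for readability.
# => {
#      "lib/payment/stripe_charge.rb" => { oneshot_lines: [1, 5, 12, 23] },
#      "lib/models/customer.rb"       => { oneshot_lines: [3, 8] },
#      ...
#    }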

Run this isolation harness for every test file, and you accumulate a map from each test to the set of source files it loaded or executed. Invert that map and you get the data structure that makes selective execution possible: source_file -> [test_a, test_b, test_c, ...].
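
The inversion step can be sketched in a few lines. This is a minimal illustration, not Stripe's code: the /repo root, the sample data, and the build_index helper are assumptions, and the input simply imitates the shape of Coverage.result in oneshot_lines mode.

```ruby
require 'set'

# Hypothetical repository root and per-test coverage results. Paths are
# absolute, as Coverage reports them.
ROOT = "/repo"

coverage_by_test = {
  "test/payment/stripe_charge_test.rb" => {
    "/repo/lib/payment/stripe_charge.rb" => { oneshot_lines: [1, 5, 12, 23] },
    "/repo/lib/models/customer.rb"       => { oneshot_lines: [3, 8] },
    "/usr/lib/ruby/3.3.0/json.rb"        => { oneshot_lines: [1] }
  },
  "test/models/customer_test.rb" => {
    "/repo/lib/models/customer.rb"       => { oneshot_lines: [3, 8, 14] }
  }
}

# Invert test -> files into source_file -> set of tests. Only files inside
# the repository are kept, stored as repo-relative paths.
def build_index(coverage_by_test, root)
  prefix = root.end_with?("/") ? root : "#{root}/"
  index = Hash.new { |hash, key| hash[key] = Set.new }

  coverage_by_test.each do |test_file, coverage|
    coverage.each_key do |path|
      next unless path.start_with?(prefix) # skip stdlib and gem files
      index[path.delete_prefix(prefix)] << test_file
    end
  end

  index
end

index = build_index(coverage_by_test, ROOT)
index.each do |source, tests|
  puts "#{source} -> #{tests.to_a.sort.join(', ')}"
end
```

This sketch works at file granularity and ignores the recorded line numbers; whether a finer-grained index pays for its extra storage is a separate design decision.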

When a PR changes lib/payment/stripe_charge.rb, you query the inverted index and get back every test that has ever loaded that file. Run those tests. Skip everything else.

This is empirical dependency analysis. You are not reasoning about what the code might do; you are recording what it did.

Staleness and the Correctness Trade-off

The obvious concern is staleness. The coverage map is built from historical test runs. If the codebase changes structure significantly, the map may not reflect the new reality until those tests run again.

Consider: if stripe_charge.rb starts requiring a new file lib/utils/currency.rb, tests that exercise stripe_charge.rb now implicitly depend on currency.rb. Until those tests run with the new code and update the coverage index, the map has a missing edge. A change to currency.rb might not trigger the right tests.

The mitigations are layered. The coverage map is updated continuously as CI runs complete, so it self-corrects quickly. Newly added test files are always run regardless of the index, since there is no prior data. Changes to infrastructure files like test helpers, factories, and configuration files can trigger broader runs based on heuristics. And post-merge CI provides a safety net: even if a stale map lets a failing test slip through a PR, the main branch catches it.
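
To make the layering concrete, here is a rough sketch of a selection function that queries an inverted index and applies two of these mitigations. The index contents, the BROAD_PATTERNS list, and the _test.rb naming convention are all assumptions made for illustration; the actual heuristics are not spelled out above.

```ruby
require 'set'

# Hypothetical inverted index: source file -> tests that executed it.
INDEX = {
  "lib/payment/stripe_charge.rb" => Set["test/payment/stripe_charge_test.rb"],
  "lib/models/customer.rb"       => Set["test/payment/stripe_charge_test.rb",
                                        "test/models/customer_test.rb"]
}.freeze

# Hypothetical full suite, used as the fallback.
ALL_TESTS = Set[
  "test/payment/stripe_charge_test.rb",
  "test/models/customer_test.rb",
  "test/reporting/dashboard_test.rb",
  "test/utils/currency_test.rb"
].freeze

# Assumed heuristic: changes under these paths reach too far to trust the index.
BROAD_PATTERNS = [
  %r{\Atest/test_helper\.rb\z},
  %r{\Atest/factories/},
  %r{\Aconfig/}
].freeze

def select_tests(changed_files, index: INDEX, all_tests: ALL_TESTS)
  # Infrastructure change: fall back to the full suite.
  broad = changed_files.any? do |file|
    BROAD_PATTERNS.any? { |pattern| pattern.match?(file) }
  end
  return all_tests.dup if broad

  selected = Set.new
  changed_files.each do |file|
    if file.end_with?("_test.rb")
      # A new or edited test file always runs; the index may hold no data for it.
      selected << file
    else
      # A source file the index has never seen selects nothing here, which
      # is exactly the staleness gap that post-merge CI has to cover.
      selected.merge(index.fetch(file, Set.new))
    end
  end
  selected
end

puts select_tests(["lib/models/customer.rb"]).to_a.sort
puts select_tests(["config/database.yml"]).size
```

The fallback to the full suite is deliberately blunt. A narrower heuristic would save more compute but would also widen the window in which a stale map can hide a failure.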

The system is optimistic rather than conservative. It accepts a small risk of occasional false negatives in exchange for dramatically lower CI cost on every run. For a team where the test suite is the primary quality signal and engineers can observe CI results quickly, this trade is reasonable.

How This Compares to Build System Approaches

Bazel and Buck take the opposite approach: explicit dependency declarations in BUILD files.

# BUILD file
ruby_library(
    name = "stripe_charge",
    srcs = ["stripe_charge.rb"],
    deps = [
        "//lib/utils:currency",
        "//lib/models:customer",
    ],
)

With explicit deps, the build system computes the dependency graph at analysis time, without running any code. Affected-test computation is deterministic and, provided the declarations are complete, precise at the granularity of build targets. No coverage tracing is required, and there is no historical map to go stale. Google and Meta have operated at this level for years, and the tooling around it, including automated BUILD file generation and validation, is mature.
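
For comparison, affected-test computation over declared deps reduces to a walk over reverse edges, which is roughly what a reverse-dependency query such as Bazel's rdeps performs. The sketch below assumes a hand-written target graph; the test targets and the affected_targets helper are hypothetical.

```ruby
require 'set'

# Hypothetical target graph mirroring declared deps: target => direct deps.
DEPS = {
  "//lib/payment:stripe_charge"       => ["//lib/utils:currency", "//lib/models:customer"],
  "//lib/models:customer"             => [],
  "//lib/utils:currency"              => [],
  "//test/payment:stripe_charge_test" => ["//lib/payment:stripe_charge"],
  "//test/models:customer_test"       => ["//lib/models:customer"],
  "//test/reporting:dashboard_test"   => []
}.freeze

def affected_targets(changed, deps)
  # Build reverse edges: target => targets that depend on it directly.
  dependents = Hash.new { |hash, key| hash[key] = [] }
  deps.each do |target, direct|
    direct.each { |dep| dependents[dep] << target }
  end

  # Breadth-first walk over the reverse edges, starting from the changed targets.
  seen = Set.new(changed)
  queue = changed.dup
  until queue.empty?
    current = queue.shift
    dependents[current].each do |dependent|
      queue << dependent if seen.add?(dependent)
    end
  end
  seen
end

affected = affected_targets(["//lib/utils:currency"], DEPS)
tests_to_run = affected.to_a.select { |t| t.start_with?("//test/") }.sort
puts tests_to_run
```

The walk is only as good as its input: it finds exactly the tests reachable through declared edges, so an undeclared runtime dependency is invisible to it.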

The cost is developer overhead. Every require addition needs a corresponding BUILD file update. At Google’s scale this has been normalized into tooling that generates or validates dep declarations automatically. For most teams, and especially for a codebase that predates those conventions by a decade, retrofitting explicit deps onto millions of lines of existing Ruby is not feasible.

Stripe’s coverage approach makes the opposite trade: zero developer overhead, empirically accurate data, some staleness risk. The coverage map self-maintains as tests run; engineers never have to think about it.

For Ruby specifically, there is also a correctness argument in favor of the dynamic approach that goes beyond practicality. Even if Stripe wanted to maintain BUILD files, those files could not capture every dependency that Ruby’s runtime creates. The dynamic approach captures what the static one cannot, by definition.

What This Means for Ruby CI in General

Systems like this typically produce dramatic results. Test impact analysis at Microsoft’s scale reportedly reduces test execution by 70-90% on typical changes. The underlying reason is that changes are usually local: most PRs touch a small fraction of the codebase, and most tests are irrelevant to any given change. A selective execution system turns that locality into a resource saving rather than an untapped opportunity.

For teams running large Ruby monorepos without anything like this, the path to implementation is clearer than it might seem. Ruby’s Coverage module is in the standard library. The collection logic, running tests in isolation with coverage enabled and storing the file-to-test mapping, is a few hundred lines of code. The harder work is in the infrastructure: storing and querying the index efficiently, keeping it fresh, handling edge cases correctly, and integrating with CI orchestration. Stripe’s engineering effort went into solving those problems at their scale.

The deeper lesson is about what dependency means in different programming environments. In a statically typed language with an explicit module system, dependencies are structural and knowable before execution. In Ruby, dependencies are behavioral and fully knowable only at runtime. Stripe’s test selection system is an acknowledgment of that reality, built into the foundation of their development workflow rather than worked around.