
Code Discovery as Infrastructure: The Navigation Problem Behind Large Monoliths

Source: lobsters

The 113 lessons collected in Isaac Lyman’s account of scaling a Rails monolith to one million lines cover an enormous range: database migration patterns, module boundaries, test strategy, hiring, and the full organizational arc from tech lead to CTO. Read together, several clusters of lessons point to the same underlying problem: the codebase has grown too large for any individual to navigate confidently, and nobody has treated that as an infrastructure problem worth solving.

This is a different failure mode from the architectural ones that get most of the attention. Module coupling is an enforcement problem. Database performance is a discipline problem. The navigation problem is quieter. It accumulates as fear-driven conservatism: engineers add new code rather than modifying existing code, because modification requires understanding ownership that was never made explicit. Systems accrete without gaining clarity. The people who originally wrote a subsystem leave, and their implicit knowledge leaves with them.

At 80,000 lines and twenty engineers, informal ownership is workable. When someone touches the billing module, they find the original author on Slack. At one million lines and two hundred engineers, that author has long since moved to another company. Their knowledge is encoded in the code itself, and the quality of that encoding, along with the tooling built on top of it, determines how long it takes the next engineer to form a working mental model.

The CODEOWNERS Gap

GitHub’s CODEOWNERS file is the minimum viable intervention for ownership documentation. A flat-file mapping of path patterns to users or team handles, it ties reviewers to paths in the repository and makes ownership explicit enough to query. The operational effect is straightforward: a PR touching app/services/billing/ automatically requests a review from the billing team, without anyone having to remember that connection. If branch protection is configured to require review from code owners, the PR also cannot merge until that team approves.

# .github/CODEOWNERS
app/services/billing/       @company/billing-team
app/services/notifications/ @company/platform-team
lib/integrations/stripe/    @company/billing-team @company/security-team

The deeper effect is less obvious. When ownership is explicit and queryable, engineers can answer “who do I ask about this?” before they touch anything. Confidence scales with legibility. Code that belongs to an identifiable team gets modified more readily than code that belongs to nobody in particular; the former has a clear path to getting review, the latter means working in the dark hoping nobody objects later.
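To make "queryable" concrete, here is a minimal sketch of an owner lookup over the CODEOWNERS sample above. It is not GitHub's matcher: it assumes every rule is a directory pattern ending in "/" and treats it as a path prefix from the repository root, ignoring globs and other pattern forms. The function names `parse_codeowners` and `owners_for` are hypothetical.

```python
from typing import List, Tuple

CODEOWNERS_TEXT = """\
# .github/CODEOWNERS
app/services/billing/       @company/billing-team
app/services/notifications/ @company/platform-team
lib/integrations/stripe/    @company/billing-team @company/security-team
"""


def parse_codeowners(text: str) -> List[Tuple[str, List[str]]]:
    """Return (pattern, owners) pairs, skipping blank lines and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def owners_for(path: str, rules: List[Tuple[str, List[str]]]) -> List[str]:
    """Resolve owners for a repository-relative path.

    The last matching rule wins, which mirrors GitHub's precedence.
    Simplification (an assumption of this sketch): only directory
    patterns ending in "/" are supported, matched as path prefixes.
    """
    matched: List[str] = []
    for pattern, owners in rules:
        if pattern.endswith("/") and path.startswith(pattern.lstrip("/")):
            matched = owners
    return matched


rules = parse_codeowners(CODEOWNERS_TEXT)
billing_owners = owners_for("app/services/billing/invoice.rb", rules)
unowned = owners_for("app/models/user.rb", rules)  # empty list: nobody to ask
```

An empty result is the interesting case: it marks the paths where an engineer has no one to ask, which is exactly the code that tends to get routed around.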

Shopify’s Packwerk approaches this from the enforcement angle: run in CI, it flags references that cross package boundaries in ways the package declarations do not allow. CODEOWNERS and Packwerk solve adjacent problems. Packwerk prevents accidental coupling; CODEOWNERS routes accountability. A large codebase benefits from both, yet teams often treat them as alternatives even though neither substitutes for the other.

Code Search as Infrastructure

At ten thousand lines, grep is adequate for code navigation. At a million, grep is still technically functional and practically insufficient for what engineers need to do during routine work. The gap is not speed; it is semantic indexing.

Sourcegraph provides server-side code navigation across an entire repository, with cross-reference indexing that enables “find all references” and “go to definition” at scale without the multi-second delay that makes the operation feel too expensive to use. The difference between indexed cross-references and raw text search shows up in specific workflows. When tracing callers of a function to determine whether changing its signature is safe, a text search returns every line containing the function name, including comments, documentation, and unrelated files. An indexed cross-reference returns only the places where that function is actually referenced.
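The gap between the two kinds of search can be sketched in a few lines. This is an illustration only, not how Sourcegraph builds its index: it uses Python's standard `ast` module on a made-up Python snippet (the article's codebase is Ruby), and it matches calls by name without resolving scopes or imports. The sample source and the function names are hypothetical.

```python
import ast
from typing import List

SOURCE = '''\
def charge(customer, amount):
    """Bill the customer."""
    return amount

# TODO: charge should validate the amount first
def checkout(cart):
    return charge(cart.customer, cart.total)

def describe():
    return "call charge to bill a customer"
'''


def text_matches(source: str, name: str) -> List[int]:
    """Line numbers of every line that merely contains the name."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if name in line]


def call_sites(source: str, name: str) -> List[int]:
    """Line numbers where the name is syntactically called."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain calls carry the name in .id, method calls in .attr.
            called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if called == name:
                lines.append(node.lineno)
    return sorted(lines)


grep_like = text_matches(SOURCE, "charge")  # definition, comment, call, string
structural = call_sites(SOURCE, "charge")   # only the real call inside checkout
```

The text search reports four lines, of which one is a call. On a million-line repository that ratio is what makes a signature change feel unsafe.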

LSP-aware editors provide a subset of this locally, but LSP servers often struggle with large Rails codebases, because the dynamic dispatch and metaprogramming patterns common in Ruby force static analysis to be conservative. A call site that could dispatch to any class responding to a given method often shows no results rather than partial results. The investment in dedicated code search infrastructure partly compensates for the language’s dynamism, which makes static analysis hard by design.
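The same blind spot can be reproduced in miniature. The sketch below uses Python's `getattr` as a stand-in for Ruby-style dynamic dispatch, which is an assumption made for illustration; the class and function names are hypothetical. A purely syntactic search finds the direct call but not the dynamic one, even though both invoke the same method at runtime.

```python
import ast
from typing import List

SOURCE = '''\
class EmailNotifier:
    def notify(self, user):
        return f"email:{user}"

def send_direct(notifier, user):
    return notifier.notify(user)

def send_dynamic(notifier, user, action="notify"):
    # The method name is only known at runtime.
    return getattr(notifier, action)(user)
'''


def static_call_lines(source: str, method: str) -> List[int]:
    """Lines where `method` appears syntactically as the thing being called."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = getattr(func, "attr", None) or getattr(func, "id", None)
            if name == method:
                found.append(node.lineno)
    return sorted(found)


# Static view: only the direct call in send_direct is visible.
visible = static_call_lines(SOURCE, "notify")

# Runtime view: the dynamic path reaches notify all the same.
namespace: dict = {}
exec(SOURCE, namespace)
runtime_result = namespace["send_dynamic"](namespace["EmailNotifier"](), "ada")
```

A tool that wants to stay correct has two options for the dynamic call: report nothing, or report every method it might reach. Neither gives the engineer a usable caller list.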

For TypeScript and Go codebases, where type information is richer and static analysis more complete, IDE-level navigation handles more of the problem. But the underlying principle holds across stacks: the tooling you provide for code discovery directly affects whether engineers modify existing systems confidently or route around them by adding new ones.

Architecture Decision Records and the Memory Problem

Code encodes decisions but not reasoning. You can read a class and understand what it does. You cannot read it and understand why a specific approach was chosen over the alternatives that were also considered and rejected.

Architecture Decision Records (ADRs) are short documents that capture a decision at the time it was made: the context, the options that were evaluated, the choice made, and the consequences anticipated. The format is flexible; the key property is that ADRs live in the repository alongside the code they describe, versioned in git, discoverable through the same search tooling used to navigate everything else.

# docs/architecture/decisions/0042-background-job-strategy.md

## Status: Accepted

## Context
Sidekiq queues grew unbounded during traffic spikes, causing memory
pressure on workers. We evaluated priority queuing, separate worker
pools, and rate limiting at enqueue time.

## Decision
Separate worker pools for critical and non-critical jobs, sized
independently.

## Consequences
- Memory pressure from non-critical jobs no longer affects critical
  queue processing
- Deployment complexity increases: two worker configurations to maintain

The value of this record becomes concrete eighteen months later, when someone proposes consolidating the worker pools to simplify deployment. They find ADR 0042 and understand that consolidation was explicitly considered and rejected, and why. The constraint is legible, not just present. Without the record, the second engineer either repeats the same analysis at cost, or skips it and reverts a decision that took real effort to reach.

The ADR GitHub organization maintains several template formats, including Michael Nygard’s original lightweight structure and the Y-Statement format, which spells out context, decision, and accepted tradeoffs in one structured sentence. Most teams find that consistency of location matters more than consistency of format; an ADR in an unexpected directory is rarely found.
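A consistent location also makes the records cheap to index. The sketch below assumes the conventions of the example record above: files under docs/architecture/decisions/, a four-digit number prefix in the filename, and a "## Status: ..." heading line. None of these are a standard, and the helper names `parse_adr` and `build_index` are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumed location, taken from the example record's path.
ADR_DIR = Path("docs/architecture/decisions")
FILENAME = re.compile(r"^(\d{4})-(.+)\.md$")
STATUS = re.compile(r"^##[ \t]*Status:[ \t]*(.+)$", re.MULTILINE)


def parse_adr(filename: str, text: str) -> Dict[str, str]:
    """Extract number, slug, and status from one ADR file."""
    match = FILENAME.match(filename)
    if match is None:
        raise ValueError(f"not an ADR filename: {filename}")
    status = STATUS.search(text)
    return {
        "number": match.group(1),
        "slug": match.group(2),
        "status": status.group(1).strip() if status else "Unknown",
    }


def build_index(adr_dir: Path = ADR_DIR) -> List[Dict[str, str]]:
    """Index every numbered ADR in the directory, ordered by filename."""
    return [
        parse_adr(path.name, path.read_text(encoding="utf-8"))
        for path in sorted(adr_dir.glob("[0-9]*.md"))
    ]


entry = parse_adr(
    "0042-background-job-strategy.md",
    "## Status: Accepted\n\n## Context\nSidekiq queues grew unbounded.\n",
)
```

An index like this only works if every record sits where the script looks, which is the practical reason location consistency outranks template consistency.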

The Compounding Return on Navigation Investment

These three interventions compound with each other and with the boundary and migration disciplines that get more attention. CODEOWNERS without module boundaries still lets coupling accumulate, just with known owners for the coupling. Module boundaries without code search leave engineers uncertain whether their change to a public interface will break anything they cannot easily find. ADRs without ownership documentation create records that nobody knows to look for when they encounter the constraint the record describes.

A team that invests in all three builds a codebase with a specific property: new engineers can contribute to unfamiliar areas in days rather than weeks. They read existing documentation, follow ownership to find the right reviewers, navigate code with tooling that respects semantic structure rather than just text, and find the reasoning behind constraints before they start working around them.

That confidence dividend is what separates a million-line codebase that continues to ship from one that calcifies. The calcification pattern is predictable. Fear-driven conservatism accumulates. Engineers write new code rather than improving existing code. Abstractions multiply rather than stabilizing. Teams develop deep expertise in their subsystems and thin tolerance for anything outside them. The codebase grows without becoming clearer.

Eventually, the motivation to migrate to microservices is not technical. It is the desire to escape a codebase that nobody can navigate, by drawing service boundaries that restore the clarity that module boundaries and ownership documentation would have provided. The distributed systems complexity that comes with that migration is a significant price to pay for the organizational problem of not having written down who owns what.

Lyman’s 113 lessons span technical and organizational ground in roughly equal measure, which is itself a meaningful signal. The technical disciplines are more satisfying to discuss because they come with concrete tooling and measurable outcomes. But the navigation infrastructure matters just as much, and it depends on both kinds of work: Sourcegraph running against a repository with no ownership documentation is still better than grep, but far less useful than it would be alongside a maintained CODEOWNERS file and a set of ADRs that explain why things are the way they are.
