What Actually Breaks When a Monolith Reaches a Million Lines of Code

Josh Lewis at Semicolon & Sons published a dense list of 113 lessons drawn from growing a codebase to a million lines while moving from tech lead to CTO. It covers ground from database design to team structure to the psychological side of technical leadership. Worth reading in full. What I want to do here is pull on a few threads that don’t get enough attention in the usual discourse around large monoliths, because the thing that always surprises people is that it’s not the architecture that fails first.

The Wrong Failure Mode Gets All the Attention

The standard story about monolith scaling goes like this: you start with a Rails app or a Django app or a Spring Boot application, it grows, deployments get slow, one team’s changes break another team’s features, the database becomes a global bottleneck, and you migrate to microservices. This story is true often enough that it gets repeated constantly. What it misses is the ordering.

The architecture, meaning the fact that all the code lives in one process and one repository, is rarely the first thing that causes pain. What breaks first is discipline. At 50,000 lines, a single developer can hold the whole system in their head and route around bad abstractions. At 500,000 lines, you need actual module boundaries, enforced naming conventions, documented ownership, and a build system that can tell you what changed and what it affects. Most teams don’t have those things and don’t notice they’re missing until the codebase is already large enough that adding them is hard.

Stack Overflow famously ran on a small cluster of SQL Server instances and a monolithic .NET application serving hundreds of millions of requests per month for years. Shopify ran on a Rails monolith until their codebase was enormous, and when they did decompose, they did it into component-based modules within the monolith first, before extracting services. Basecamp has been explicit for years that they consider microservices an organizational solution masquerading as a technical one, and that most teams reach for them too early. These examples suggest that a well-maintained monolith can outlast a lot of poorly maintained microservices architectures.

The Database Is the Real First Failure Mode

When a large monolith does hit a genuine technical wall, it’s almost always the database. This is worth being concrete about, because “the database is a bottleneck” is vague enough to be useless.

The specific failure modes are: missing indexes on columns that get added incrementally without query analysis, N+1 query patterns that are invisible at low load and catastrophic at high load, schema migrations that lock tables for long enough to cause cascading timeouts, and connection pool exhaustion from too many app servers each holding open connections.

None of these are architecture problems in the sense that microservices would solve them. A microservices architecture can actually make the N+1 problem worse, because each per-item lookup becomes a network call instead of an in-process function call, and the per-call latency multiplies across the loop. The real discipline is instrumentation: you need query-level performance monitoring from early on, with tools like pg_stat_statements in PostgreSQL or the slow query log in MySQL, and a habit of reviewing execution plans before merging anything that touches the data layer.
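To make the N+1 pattern concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for a production database. The schema, the table names, and the CountingConnection wrapper are all hypothetical, invented for illustration. The point is only that the two functions return the same data while issuing very different numbers of queries.

```python
import sqlite3


def build_db():
    """Create a throwaway in-memory database with 50 authors and 50 posts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL,
            title TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(i, f"author-{i}") for i in range(1, 51)],
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(i, i, f"post-{i}") for i in range(1, 51)],
    )
    return conn


class CountingConnection:
    """Hypothetical wrapper that counts how many statements are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def titles_with_authors_n_plus_one(db):
    # One query for the posts, then one more query per post for its author.
    posts = db.query("SELECT author_id, title FROM posts ORDER BY id")
    result = []
    for author_id, title in posts:
        name = db.query("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0]
        result.append((title, name))
    return result


def titles_with_authors_joined(db):
    # The same data fetched in a single round trip.
    return db.query(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )
```

With 50 posts the first function issues 51 statements and the second issues one. In-process against SQLite the difference is barely measurable, which is why the pattern stays invisible at low load; multiply each statement by a network round trip and it stops being invisible.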

Large-scale zero-downtime migrations are their own discipline. The standard pattern for a relational database migration that can’t lock a table is: add the new column in a form that doesn’t rewrite or lock the table (typically nullable, with any index built online), start writing to both old and new columns, backfill the existing rows in batches, switch reads to the new column, then drop the old one. This is tedious and requires tooling to make consistent. GitHub open-sourced gh-ost for online MySQL schema changes. There are similar tools for Postgres. Teams that aren’t running these patterns by the time their largest tables hit a hundred million rows are accumulating a specific kind of debt that’s painful to address later.
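The dual-write and batched-backfill steps can be sketched roughly as follows. This is an illustration under stated assumptions, not a production migration tool: SQLite stands in for the real database, the users table and the email / email_lower columns are hypothetical, and a real backfill would also throttle itself and watch replication lag.

```python
import sqlite3

BATCH_SIZE = 1000  # hypothetical; tune to keep each transaction short


def add_new_column(conn):
    # Step 1: add the new column as nullable so existing rows are untouched.
    conn.execute("ALTER TABLE users ADD COLUMN email_lower TEXT")
    conn.commit()


def write_email(conn, user_id, email):
    # Step 2: application writes go to both the old and the new column.
    conn.execute(
        "UPDATE users SET email = ?, email_lower = ? WHERE id = ?",
        (email, email.lower(), user_id),
    )
    conn.commit()


def backfill_in_batches(conn, batch_size=BATCH_SIZE):
    """Step 3: copy old -> new in primary-key order, one commit per batch.

    Returns the number of batches processed.
    """
    last_id = 0
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id, email FROM users "
            "WHERE id > ? AND email_lower IS NULL ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return batches
        # The IS NULL guard avoids overwriting a value a concurrent
        # dual-write has already filled in.
        conn.executemany(
            "UPDATE users SET email_lower = ? "
            "WHERE id = ? AND email_lower IS NULL",
            [(email.lower(), row_id) for row_id, email in rows],
        )
        conn.commit()
        last_id = rows[-1][0]
        batches += 1
```

Walking the table by primary key keeps every batch an indexed range scan, and committing per batch means no single transaction holds locks for long. Switching reads and dropping the old column are separate deploys that only happen once the backfill reports nothing left to copy.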

Module Boundaries Are Where You Either Win or Lose

The structural discipline that separates a maintainable 1M LOC codebase from an unmaintainable one is module boundaries. Not microservices; actual enforced boundaries within the monolith.

In a Ruby or Python codebase, this means namespace conventions and load path discipline. In Go, it means package structure and the deliberate use of internal packages to prevent external imports. In Java and Kotlin, it’s modules in the Java Platform Module System sense or just disciplined package boundaries with visibility rules. In C and C++, it’s header boundaries and link-time dependency control.

The reason this matters is that without enforced boundaries, the codebase develops what some call “the big ball of mud” property: any code can call any other code, circular dependencies accumulate, and the logical modules you thought you had become fiction. Changes propagate through the codebase in ways that are hard to predict, test coverage becomes inadequate because the call graph is too complex to reason about, and onboarding new developers gets harder with every passing month.

Shopify’s approach to this is documented in their component-based Rails architecture work. They used the packwerk gem to enforce package boundaries within the Rails monolith, defining which parts of the codebase could call into which other parts and failing CI builds when those boundaries were violated. This is the right instinct: the enforcement mechanism matters as much as the boundary definition, because humans under deadline pressure will violate soft conventions.

For teams on other stacks, ArchUnit serves a similar role for JVM projects, and tools like dependency-cruiser can do this for JavaScript and TypeScript projects. The specific tool matters less than having one.
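To show the shape of what these tools check, here is a minimal sketch for a Python codebase. It is not how packwerk, ArchUnit, or dependency-cruiser are implemented; the ALLOWED table and the package names (billing, catalog, shared) are hypothetical. It parses a module’s imports and reports any first-party package the importing package is not allowed to depend on.

```python
import ast

# Hypothetical boundary declaration: each first-party package lists the
# first-party packages it is allowed to import.
ALLOWED = {
    "billing": {"billing", "shared"},
    "catalog": {"catalog", "shared"},
    "shared": {"shared"},
}


def imported_packages(source):
    """Return the top-level package names imported by a piece of source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports stay
            # inside the importing package and are skipped here.
            found.add(node.module.split(".")[0])
    return found


def violations(package, source, allowed=ALLOWED):
    """First-party imports in `source` that `package` may not depend on.

    Standard-library and third-party imports are ignored.
    """
    first_party = set(allowed)
    return sorted(
        dep
        for dep in imported_packages(source)
        if dep in first_party and dep not in allowed[package]
    )
```

A CI step would run something like this over every file and fail the build when the list is non-empty, which is the part that turns a convention into a boundary.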

What Happens to Build Times and CI at Scale

A monolith that takes 45 minutes to fully build and test is a monolith that developers learn to work around. They start skipping tests locally, running only the tests for the module they changed, and relying on CI to catch problems. This is fine until CI queues start backing up and the feedback loop from a push to a green build stretches to an hour or more.

The standard responses are: better caching in CI, test parallelization, and incremental builds where only the tests affected by a change run. The last of these is the most valuable and the hardest to get right.

Bazel and similar build systems (Buck2, Pants) solve this with hermetic build rules and fine-grained dependency graphs, so the build system can prove which test targets are affected by a given change. This is a significant investment to adopt on an existing codebase. The more pragmatic approach for most teams is a combination of good build caching and tools that analyze test coverage to determine which tests are relevant to changed code, like Buildkite’s test impact analysis or Trunk.io’s flaky test detection and impact tooling.
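The core idea behind affected-test selection is a reverse walk of the dependency graph. The sketch below is a toy version under obvious assumptions: the DEPS graph and target names are hypothetical and written by hand, whereas a real build system derives the graph from build rules and guarantees it is complete.

```python
from collections import defaultdict, deque

# Hypothetical dependency graph: target -> targets it depends on.
DEPS = {
    "shared": [],
    "billing": ["shared"],
    "catalog": ["shared"],
    "checkout": ["billing", "catalog"],
    "shared_test": ["shared"],
    "billing_test": ["billing"],
    "catalog_test": ["catalog"],
    "checkout_test": ["checkout"],
}


def reverse_graph(deps):
    """Invert the graph: target -> targets that depend on it directly."""
    rdeps = defaultdict(set)
    for target, requirements in deps.items():
        for requirement in requirements:
            rdeps[requirement].add(target)
    return rdeps


def affected_tests(changed, deps=DEPS):
    """Test targets that transitively depend on any changed target."""
    rdeps = reverse_graph(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rdeps[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    # Naming convention assumed for this sketch: test targets end in "_test".
    return sorted(target for target in seen if target.endswith("_test"))
```

A change to billing selects billing_test and checkout_test and skips the rest; a change to shared selects everything. The selection is only as trustworthy as the graph, which is why hermetic builds matter: an undeclared dependency is a test that silently does not run.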

The underlying point is that build time is not a vanity metric. It’s a direct multiplier on every developer’s throughput and on the latency of your feedback loops. Teams that track it and budget for improvements treat it as the infrastructure problem it is.

The Cognitive Load Problem

A million lines of code doesn’t fit in any human’s head. This sounds obvious, but its implications are underappreciated. The question is not whether individuals can know the whole codebase (they can’t) but whether they can navigate it efficiently enough to work productively in areas they’re unfamiliar with.

Code search tooling matters enormously here. Sourcegraph or a well-configured ripgrep workflow, combined with a language server that can resolve cross-file references, is the difference between a developer who can answer their own questions about unfamiliar code in minutes and one who has to interrupt colleagues constantly. The “go to definition” and “find all references” capabilities in modern LSP-aware editors are table stakes; what scales them up is server-side indexing so they work instantly across the entire repository rather than needing a warm local index.

Code ownership is the other side of this. When a file or module has no clear owner, questions about it go unanswered or get answered inconsistently. GitHub’s CODEOWNERS file is a lightweight mechanism for this. More important than the tool is the practice: ownership should be legible to anyone reading the codebase, and it should correspond to actual responsibility for quality and for answering questions.
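As a sketch of what “legible ownership” can mean in practice, the following resolves a file path to its owners from a small text file. It deliberately simplifies: real CODEOWNERS files use gitignore-style glob patterns, while this version matches plain path prefixes only. The file contents and team handles are hypothetical. It does keep the rule that the last matching entry wins.

```python
# Hypothetical ownership file: path prefix followed by one or more owners.
OWNERSHIP = """\
# path prefix        owners
/                    @org/platform
/app/billing/        @org/payments
/app/billing/tax/    @org/tax @org/payments
"""


def parse_rules(text):
    """Parse ownership text into an ordered list of (prefix, owners)."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        prefix, *owners = line.split()
        rules.append((prefix, owners))
    return rules


def owners_for(path, rules):
    """Owners from the last rule whose prefix matches, or [] if none does."""
    if not path.startswith("/"):
        path = "/" + path
    match = []
    for prefix, owners in rules:
        if path.startswith(prefix):
            match = owners  # later rules override earlier ones
    return match
```

Even this much is enough to answer “who do I ask about this file?” from a script, a code review bot, or an editor plugin, which is most of the value.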

When Decomposition Is the Right Answer

None of the above means monoliths are always the right choice. The case for extracting services is strongest when different parts of the system have genuinely different scaling requirements, different deployment cadences, or different teams that need to be able to release independently without coordinating.

The last of these is the most common real reason for microservices adoption, and it’s honest: Conway’s Law is real. Organizations with strong team boundaries tend to build systems that reflect those boundaries. If two teams genuinely cannot coordinate their releases because of organizational structure, a hard service boundary between their domains may be the path of least resistance.

But the discipline required to run microservices well is significant: distributed tracing, service meshes or explicit network policy, contract testing between services, and careful handling of partial failure and eventual consistency. Teams that underestimate it end up with a distributed monolith: all the coupling of a monolith plus all the operational complexity of microservices. The 113 lessons in Josh Lewis’s article are largely applicable regardless of which architecture you end up in; the database discipline, the module boundaries, the build time attention, and the code ownership all transfer. The architecture decision is downstream of whether you have those habits in place.

Practical Starting Points

If you’re working in a monolith that’s growing and feeling the pressure, the most valuable things to do first are concrete and bounded:

Add query-level instrumentation now, before you need it. In Rails, Marginalia attaches comments to SQL queries so you can correlate slow queries back to application code (Rails 7 and later ship similar functionality built in as Active Record query logs). In other frameworks, the ORM usually has hooks for this.
Set up a module boundary tool appropriate to your stack and define the boundaries you want. Even defining them and not enforcing them yet gives you a document of intent that’s useful.
Measure CI build time end-to-end and track it as a metric. Set a threshold that triggers a conversation.
Make code ownership explicit, even if it’s just a text file to start.
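For the first item, the underlying technique is small enough to sketch without a framework. This is an illustration in the spirit of what Marginalia does, not its implementation or its exact comment format; tag_query and the context keys are hypothetical names. It appends a SQL comment carrying application context, so the statement shows up in slow query logs already labeled with where it came from.

```python
import re
import sqlite3

# Only allow characters that cannot close or reopen a SQL comment.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.#-]")


def tag_query(sql, **context):
    """Append a SQL comment with application context to a statement.

    Values are stripped down to a safe character set so that a context
    value can never terminate the comment early.
    """
    pairs = ",".join(
        f"{key}:{_UNSAFE.sub('', str(value))}"
        for key, value in sorted(context.items())
    )
    return f"{sql} /* {pairs} */" if pairs else sql


def run_tagged(conn, sql, params=(), **context):
    """Execute a statement with the context comment attached."""
    return conn.execute(tag_query(sql, **context), params).fetchall()
```

Called as run_tagged(conn, "SELECT 1", controller="orders", action="index"), the database sees the statement with a trailing comment naming the controller and action. In a real application this would hang off the ORM’s query hook rather than being called by hand.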

None of these require architectural changes. They’re the maintenance practices that determine whether a monolith is still viable at 500K lines or already past the point of no return at 200K. The architecture is downstream of the habits.