What Changes When Your Monolith Hits a Million Lines

Source: lobsters

The architecture conversation in software tends to cycle. Monoliths were the default, then microservices became the answer to every scaling problem, and now teams are publishing post-mortems about the distributed systems complexity they introduced without the scale to justify it.

The article from Semicolon and Sons, Scaling a Monolith to 1M LOC: 113 Pragmatic Lessons from Tech Lead to CTO, lands squarely in this context. It comes from someone who spent years in the same codebase watching it grow, making decisions about how to keep it manageable. The lessons aren’t theoretical. They accumulate from living with the consequences of earlier decisions.

What’s worth examining here isn’t just the lessons themselves, but what the premise reveals: a million lines of code is more than any one person can hold in their head, and most of what you learn at that scale doesn’t show up in conference talks about greenfield systems.

What Changes as a Codebase Grows

There’s a useful framing in Eric Evans’ Domain-Driven Design: a system isn’t just code, it’s a set of domain concepts that accumulate over time. At 10,000 lines, you can hold most of the domain in your head. At 100,000, you can’t. At a million, even a senior engineer on the team won’t have read most of it.

This cognitive limit is the central engineering problem of large codebases, more than performance, deployment complexity, or test suite speed. The solutions people reach for, whether bounded contexts, module boundaries, strict ownership rules, or internal package APIs, are all attempts to manage what any individual needs to understand to contribute safely.

The tension is that the boundaries you draw early will calcify. A module boundary that was right at 50,000 lines may be wrong at 500,000. Refactoring across a million lines is not a weekend task, and the cost of a wrong boundary compounds every time an engineer works around it rather than through it.

The Modular Monolith as a First-Class Architecture

Before the microservices era, “monolith” was a neutral term. After the mid-2010s wave of microservices advocacy, it became a pejorative: a monolith meant a big ball of mud, tight coupling, inability to scale individual components, all the things that microservices promised to fix.

The problem is that microservices solved some of those problems while introducing others. Network latency between services, distributed transaction complexity, operational overhead for running dozens of containers, and the challenge of tracing a request across ten services all became real costs. Teams that had clear service boundaries, large enough teams to own them, and the operational maturity to run distributed systems generally thrived. Teams that adopted microservices as a complexity-management strategy, without those preconditions, often found themselves with the worst of both worlds: distributed state and a coupled domain.

The modular monolith emerged as a middle path. In Rails terms, this looks like Packwerk, the tool Shopify open-sourced in 2020 to enforce boundaries between packages within a monolith. In Java it resembles the module system introduced by Project Jigsaw. In Go it looks like strictly layered packages: the compiler already rejects import cycles, and internal/ directories or custom lint rules enforce the layering. The deployment boundary stays single-process, but the logical boundary is enforced at the tooling level.

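# Packwerk package.yml for the orders package
# Flag references to packages that are not listed under dependencies.
enforce_dependencies: true
# Flag references to other packages' non-public constants
# (in Packwerk 3.x this check comes from the packwerk-extensions gem).
enforce_privacy: true
dependencies:
  - billing
  - shared/core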

That constraint is significant in practice. Even though orders and billing live in the same process, orders cannot reach into billing’s internal classes. The only surface is the public API that billing exports. You get the operational simplicity of a monolith and some of the encapsulation that microservices provide. Shopify has written publicly about running a Rails monolith at multi-million-line scale using this approach; the tooling holds, but consistent enforcement determines whether the boundaries remain meaningful over time.
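Packwerk checks these rules statically, by analysing constant references, and does not change what the code can do at runtime. As a rough plain-Ruby analogy of the same shape (one public entry point, hidden internals), here is a minimal sketch using `private_constant`. The names `Billing.total_cents`, `Billing::TaxTable`, and `Orders.checkout_total` are hypothetical and do not come from the original article, and the tax rates are made-up illustration values.

```ruby
# Illustrative only: a hypothetical Billing package with one public entry point.
# Packwerk enforces this kind of boundary statically; private_constant is a
# runtime analogy in plain Ruby, not how Packwerk itself works.
module Billing
  # Internal detail: code outside Billing should not depend on this class.
  class TaxTable
    # Made-up rates in basis points (1900 = 19%), kept as integers.
    RATES_BP = { "DE" => 1900, "FR" => 2000 }.freeze

    def rate_bp_for(country)
      RATES_BP.fetch(country, 0)
    end
  end
  private_constant :TaxTable

  # Public API: the only surface other packages are meant to call.
  def self.total_cents(net_cents:, country:)
    rate_bp = TaxTable.new.rate_bp_for(country)
    net_cents + (net_cents * rate_bp / 10_000)
  end
end

module Orders
  # Orders goes through Billing's public API and never touches TaxTable.
  def self.checkout_total(net_cents:, country:)
    Billing.total_cents(net_cents: net_cents, country: country)
  end
end

puts Orders.checkout_total(net_cents: 10_000, country: "DE")

begin
  Billing::TaxTable # reaching into internals from outside the module
rescue NameError => e
  puts "blocked: #{e.message}"
end
```

The difference in timing matters: a static check fails in CI before the code ships, while `private_constant` only raises when the offending line actually runs.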

David Heinemeier Hansson coined the term Majestic Monolith in 2016, before Packwerk existed, arguing that most applications don’t need the operational complexity of microservices. The tooling eventually caught up with the philosophy.

The CTO View Versus the Tech Lead View

The framing of “Tech Lead to CTO” in the article title signals something important. The concerns at each level differ, and the architecture lessons that matter most shift as a result.

A tech lead at 50,000 lines cares about which patterns to use for new features, how to keep test coverage reasonable, and where the boundaries are between the team’s work and other teams’ work. The feedback loop is tight. Code reviews happen daily. The codebase is roughly legible.

A CTO at a million lines cares about how to make the codebase legible to engineers joining six months from now, how to avoid the architectural drift that makes large changes expensive five years from now, and how to balance the cost of enforcing standards against the cost of the inconsistency that accumulates without them. The feedback loop is measured in quarters, not days.

The lesson that tends to surprise people moving into technical leadership is that consistency compounds. An inconsistency at 50,000 lines is an annoyance. The same inconsistency, replicated by fifty engineers over three years, is a migration project. Standards that feel bureaucratic at small scale become structural at large scale, because the codebase is shaped by what engineers do by default, not by what the architecture document says.

Testing at Scale

Test suites are where large codebases often show their first serious symptoms. A suite that runs in two minutes at 10,000 lines may run in forty minutes at 100,000, and in three hours at a million. The instinct is to throw tooling at the problem: split the suite across machines, run tests in parallel within a machine, use test impact analysis to run only the tests affected by a change.
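To make test impact analysis concrete, here is a minimal sketch that selects spec files by walking a dependency graph. The file names and the `DEPENDS_ON` map are hypothetical; a real tool would derive the graph from static analysis or coverage data rather than a hand-written hash.

```ruby
require "set"

# Hypothetical map from each file to the files it depends on directly.
DEPENDS_ON = {
  "spec/billing/calculator_spec.rb" => ["app/billing/calculator.rb"],
  "spec/orders/checkout_spec.rb"    => ["app/orders/checkout.rb"],
  "spec/auth/session_spec.rb"       => ["app/auth/session.rb"],
  "app/orders/checkout.rb"          => ["app/billing/calculator.rb"],
  "app/billing/calculator.rb"       => [],
  "app/auth/session.rb"             => []
}.freeze

# Returns every file reachable from `file` through DEPENDS_ON, including itself.
def transitive_dependencies(file, seen = Set.new)
  return seen if seen.include?(file)

  seen << file
  DEPENDS_ON.fetch(file, []).each { |dep| transitive_dependencies(dep, seen) }
  seen
end

# Selects the spec files whose dependency closure contains a changed file.
def affected_specs(changed_files)
  DEPENDS_ON.keys
            .select { |file| file.start_with?("spec/") }
            .select { |spec| changed_files.any? { |changed| transitive_dependencies(spec).include?(changed) } }
            .sort
end

p affected_specs(["app/billing/calculator.rb"])
# => ["spec/billing/calculator_spec.rb", "spec/orders/checkout_spec.rb"]
```

The orders spec is selected for a billing change because checkout depends on the calculator. The more tangled the graph, the closer the selected set gets to the whole suite.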

These approaches help, but they also mask the underlying problem. Slow tests are often an architectural signal. If a test for a billing calculation requires spinning up the entire application stack because billing is tightly coupled to the authentication system, the test is slow because the code is coupled. Fixing the coupling fixes the test speed and improves the architecture. The parallel test runner just makes the problem cheaper to ignore.

The most useful discipline at large scale is the distinction between fast tests that run without external dependencies and slow tests that don’t. In a well-structured modular monolith, the core domain logic should be testable without a database, a network, or a running web server. If it isn’t, something has gone wrong at the architectural level.

# Fast: no database, no web server, no network
RSpec.describe Billing::Calculator do
  it "applies discount correctly" do
    result = described_class.new.calculate(base: 100, discount_pct: 10)
    expect(result.total).to eq(90)
  end
end

Gary Bernhardt’s Boundaries talk from 2012 covers this well: functional core, imperative shell. Keep the logic pure and fast, push the I/O to the edges. The principle applies to large codebases more forcefully than small ones, because at scale the cost of the slow path is paid every time someone submits a pull request.
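One way the split could look in code is sketched below. The article does not show how `Billing::Calculator` is implemented, so this is a hypothetical version shaped to satisfy the spec above; `Billing::Result`, `Billing::ApplyDiscount`, and `MemoryStore` are assumptions introduced for illustration.

```ruby
module Billing
  # Plain value object returned by the calculator (hypothetical).
  Result = Struct.new(:base, :discount, :total, keyword_init: true)

  # Functional core: pure integer arithmetic, no I/O, fast to test.
  class Calculator
    def calculate(base:, discount_pct:)
      discount = base * discount_pct / 100
      Result.new(base: base, discount: discount, total: base - discount)
    end
  end

  # Imperative shell: reads and writes state, delegates the logic to the core.
  # `order_store` is any object responding to #fetch and #save (hypothetical).
  class ApplyDiscount
    def initialize(order_store:, calculator: Calculator.new)
      @order_store = order_store
      @calculator = calculator
    end

    def call(order_id:, discount_pct:)
      order = @order_store.fetch(order_id) # I/O in
      result = @calculator.calculate(base: order[:base], discount_pct: discount_pct)
      @order_store.save(order_id, order.merge(total: result.total)) # I/O out
      result
    end
  end
end

# In-memory stand-in for a database, enough to exercise the shell.
class MemoryStore
  def initialize(rows)
    @rows = rows
  end

  def fetch(id)
    @rows.fetch(id)
  end

  def save(id, row)
    @rows[id] = row
  end
end

store = MemoryStore.new({ 1 => { base: 100 } })
result = Billing::ApplyDiscount.new(order_store: store).call(order_id: 1, discount_pct: 10)
puts result.total # prints 90
```

Only the shell needs a store at all, so only its tests pay for one. The calculator spec stays as fast as the one shown earlier no matter how large the surrounding application becomes.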

Deployment and the Cost of Batching Changes

One consistent lesson across large codebase post-mortems is the cost of infrequent deployment. When deployment is painful, teams deploy less often. When they deploy less often, changes batch up. When changes batch up, debugging a production incident becomes harder because any of fifty changes could be the cause.

The solution is making deployment cheap. That means feature flags for decoupling deployment from release, reliable rollback mechanisms, gradual rollouts, and monitoring that catches regressions quickly. At a million lines in a monolith, a broken deployment can affect the entire system, so the investment in deployment infrastructure pays off faster than it does at smaller scale. The DORA research program treats deployment frequency as one of its key metrics of software delivery performance; the metric is a proxy for how much friction the deployment pipeline introduces.

Feature flags deserve particular attention. They let you merge incomplete work to trunk without exposing it to users, which keeps branches short and reduces the merge complexity that accumulates when multiple engineers are working in the same areas of a large codebase. At a million lines, long-lived feature branches are expensive because they diverge from a moving target.
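A minimal sketch of the idea, combining a flag check with a percentage-based gradual rollout. The `FeatureFlags` class, the `:new_pricing` flag, and `checkout_total` are hypothetical; production systems usually keep flag state in a database or a flag service so it can change without a deploy.

```ruby
require "zlib"

# Minimal in-process feature flag registry (illustrative only).
class FeatureFlags
  def initialize
    @rollout = Hash.new(0) # flag name => percentage of users enabled (0..100)
  end

  def set_rollout(flag, percent)
    @rollout[flag] = percent.clamp(0, 100)
  end

  # Deterministic bucketing: a user always lands in the same bucket for a
  # given flag, so raising the percentage only ever adds users.
  def enabled?(flag, user_id)
    bucket = Zlib.crc32("#{flag}:#{user_id}") % 100
    bucket < @rollout[flag]
  end
end

FLAGS = FeatureFlags.new

def checkout_total(base, user_id)
  if FLAGS.enabled?(:new_pricing, user_id)
    base - base / 10 # new code path: merged to trunk, dark until rolled out
  else
    base             # existing behaviour
  end
end

puts checkout_total(100, 42) # flag at 0%: prints 100
FLAGS.set_rollout(:new_pricing, 100)
puts checkout_total(100, 42) # flag at 100%: prints 90
```

Rollback in this model is a flag change rather than a redeploy, which is what lets deployment and release move on separate schedules.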

What Pragmatic Means in Practice

The “pragmatic” framing in the article title is worth sitting with. There’s a version of architecture advice that optimizes for elegance and theoretical correctness, for being able to describe the system cleanly at a whiteboard. There’s a different version that optimizes for the system in front of you, with the engineers you have, shipping the features the business needs, without introducing complexity that will cause problems two years from now.

Pragmatic lessons from a long-lived monolith tend to look like: don’t extract a service until the service boundary is obvious and stable; prefer boring technology for core infrastructure and save the interesting choices for the edges; make it easy for new engineers to be productive quickly, because onboarding friction compounds; treat the codebase as a long-term asset and make decisions accordingly.

These lessons don’t fit neatly on a conference slide. They accumulate from years of living with the consequences of earlier decisions, from watching an abstraction that seemed clever at 100,000 lines become a maintenance burden at 800,000. A collection of 113 of them from someone who went from tech lead to CTO in the same codebase is the kind of primary source that the architecture literature tends to underrepresent.

Teams that went all-in on microservices without the organizational scale to support them are consolidating services. The modular monolith is getting a serious second look. Collections of pragmatic lessons from long-tenured engineers in a single codebase are rare, because most engineers change jobs before they accumulate them. The article from Semicolon and Sons is that kind of document.
