What Keeps a Monolith Alive at a Million Lines of Code

A hundred-and-thirteen numbered lessons from a single engineering career sounds like a listicle, but the Semicolon & Sons article earns that density. The author scaled a Rails monolith through an entire organizational arc, from early startup to a production codebase near a million lines, while ascending from tech lead to CTO. That dual vantage point is uncommon. Most architecture writing comes from people who made the microservices bet early and are defending it, or from people who stayed heads-down at one level of the org. The person who kept a monolith alive through both the code and the organizational structure has seen failure modes that neither group discusses.

The structural problem underlying most of those lessons is consistent. A monolith does not break because it gets large; it breaks because it loses internal structure. Once every module depends on every other module, no team can change anything without risking a break on the other side of the codebase.

The Boundary Problem

The companies that have kept large monoliths healthy share one consistent property: enforced internal module boundaries. Shopify’s Rails monolith reportedly crosses three million lines of code, and they built packwerk specifically to enforce those boundaries statically. packwerk makes cross-package dependencies explicit in a package.yml manifest and treats violations as CI failures. Without something equivalent, you get what architecture literature calls a “big ball of mud”: every class can reference every other class, business logic bleeds into controllers and background jobs, and the dependency graph becomes a tangle no single engineer can hold in their head.

Most languages give you the tools to enforce this. In Java and Kotlin, ArchUnit lets you write executable architecture tests that verify layering rules, package access, and naming conventions:

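import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import org.junit.jupiter.api.Test;

@Test
void services_should_not_depend_on_controllers() {
    // "com.example.app" is a placeholder for the application's root package
    var importedClasses = new ClassFileImporter().importPackages("com.example.app");
    noClasses().that().resideInAPackage("..service..")
        .should().dependOnClassesThat().resideInAPackage("..controller..")
        .check(importedClasses);
}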

In Go, package-level visibility and explicit import paths create natural seams without additional tooling, and internal/ directories restrict which packages may import a given package. In TypeScript, the import/no-restricted-paths rule from eslint-plugin-import can prevent files in one directory from importing from another. In Ruby without packwerk, Zeitwerk namespacing at least creates nominal structure, though it does nothing to prevent cross-namespace references.

The specific tool matters less than the principle: structural conventions enforced only by documentation and culture will erode under delivery pressure. A failing CI check is a boundary. A wiki page is not.
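To make the principle concrete without tying it to any one tool, here is a minimal sketch of what a boundary check reduces to: a declared list of allowed dependencies per module, a list of observed cross-module references, and a function that reports the difference. The class name, the module names, and the shape of the inputs are all assumptions made for illustration; they do not mirror packwerk's or ArchUnit's real formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: BoundaryCheck and its input shapes are invented for
// illustration and do not mirror packwerk's or ArchUnit's real formats.
final class BoundaryCheck {

    /** One observed reference from code in one module to code in another. */
    record Reference(String fromModule, String toModule) {}

    /**
     * Returns every cross-module reference that the referencing module has not
     * declared as an allowed dependency. References within a module always pass.
     */
    static List<Reference> violations(Map<String, Set<String>> allowedDeps,
                                      List<Reference> observed) {
        List<Reference> result = new ArrayList<>();
        for (Reference ref : observed) {
            if (ref.fromModule().equals(ref.toModule())) {
                continue; // intra-module references are never violations
            }
            Set<String> allowed = allowedDeps.getOrDefault(ref.fromModule(), Set.of());
            if (!allowed.contains(ref.toModule())) {
                result.add(ref);
            }
        }
        return result;
    }
}
```

A CI step built on something like this would print the returned list and exit non-zero when it is not empty. That non-zero exit is the whole difference between a boundary and a suggestion.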

The Data Coupling Problem

A monolith has two dependency graphs. The code graph, which class calls which other class, is what packwerk and ArchUnit address. The data graph, which tables different domain objects read from, is invisible to most tools and compounds over time in ways the code graph does not.

A mature Rails application without disciplined data ownership will accumulate controllers that load data from eight or ten tables across unrelated domains in a single request, driven by what ActiveRecord made convenient rather than what domain boundaries suggested. The orders page pulls from billing tables. The user profile joins against permissions and activity logs. These cross-domain queries accumulate into load paths that become impossible to optimize without touching half the codebase.

Amazon Prime Video’s 2023 post about consolidating their distributed audio/video quality monitoring service into a monolith illustrates the inverse. Their pipeline had been decomposed into separately deployed components, but the data coupling between components was inherently tight: each step needed to pass large media frame data to the next. The cost of moving those frames between components was enormous. Moving back to a single process reduced infrastructure costs by 90% and improved scalability. The underlying lesson is that architecture boundaries should match the actual coupling characteristics of the data, not organizational preferences or the fashions of the moment.

Establishing data ownership per domain means treating cross-domain data access as an explicit design decision. When the Orders module needs a user’s display name, the options are: denormalize it into the orders table at write time, create an explicit read projection, or accept the join and name it deliberately. What accumulates into structural debt is the organic growth of cross-domain queries that nobody designed and nobody owns.
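The first of those options, denormalizing at write time, might look roughly like the sketch below. Every name in it (UserDirectory, Order, OrderService) is hypothetical, and it leaves out persistence entirely; the point is only where the cross-domain lookup happens.

```java
// Hypothetical sketch: UserDirectory, Order, and OrderService are invented names.

/** The one narrow interface through which the orders domain may ask about users. */
interface UserDirectory {
    String displayNameFor(long userId);
}

/** The order carries its own copy of the display name, captured at write time. */
record Order(long id, long userId, String customerDisplayName) {}

final class OrderService {
    private final UserDirectory users;

    OrderService(UserDirectory users) {
        this.users = users;
    }

    /** The only cross-domain lookup happens here, once, when the order is created. */
    Order place(long orderId, long userId) {
        return new Order(orderId, userId, users.displayNameFor(userId));
    }

    /** Rendering reads only order-owned data; nothing here touches the users domain. */
    static String summaryLine(Order order) {
        return "Order #" + order.id() + " for " + order.customerDisplayName();
    }
}
```

The trade-off is that the copied name goes stale if the user later renames themselves. Whether that is acceptable is exactly the kind of decision that should be made on purpose rather than discovered later.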

Test Suite Entropy

The signal that a monolith is losing the engineering battle is rarely an outage or a performance cliff. It is the CI duration chart in your project dashboard creeping from four minutes to twelve minutes to forty-five minutes over two years.

At a million lines, a test suite that takes forty minutes to complete means engineers either stop running it locally, killing the feedback loop, or stop writing tests, degrading coverage. Both outcomes compound and eventually manifest as production quality problems that get attributed to the monolith’s size rather than to test infrastructure neglect.

The remediation paths are well understood. Parallel test execution (pytest-xdist, parallel_tests for Ruby, Gradle’s maxParallelForks test setting) distributes suite time across CPU cores or CI workers. Test sharding splits the suite across multiple CI jobs and reassembles results. Selective test execution, running only tests whose coverage intersects with changed files, reduces the local iteration loop to seconds. Hard CI duration limits, treated as build failures when exceeded, prevent the problem from compounding silently.
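Selective test execution is the least familiar of these, so a small sketch may help. It assumes a coverage map recorded by an earlier instrumented run, keyed by test name, with the set of source files each test touched. The class name, the map shape, and the file names are assumptions for illustration, not any particular tool's format.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the coverage map would come from a prior instrumented
// run; its shape and the TestSelector name are assumptions.
final class TestSelector {

    /**
     * Returns the tests whose recorded coverage touches at least one changed file.
     * A TreeSet keeps the output order stable between runs.
     */
    static Set<String> select(Map<String, Set<String>> coverageByTest,
                              Set<String> changedFiles) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : coverageByTest.entrySet()) {
            for (String file : entry.getValue()) {
                if (changedFiles.contains(file)) {
                    selected.add(entry.getKey());
                    break; // one overlapping file is enough to select the test
                }
            }
        }
        return selected;
    }
}
```

A selector like this is only as good as its coverage data. Changed files that appear in no test's coverage, such as newly added files or configuration, are a reasonable trigger for falling back to the full suite.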

The fact that none of this is architecturally glamorous probably explains why it accumulates into crises rather than being addressed proactively. Test infrastructure is the kind of work that looks like maintenance until the day it determines whether the team can ship.

Ownership at Scale

At ten engineers, code ownership is informal and works. At fifty, informal ownership becomes a liability. Engineers do not know who to ask about a shared utility, do not know whether changing a helper method will break a team they have never spoken to, and default to fear-driven conservatism: adding new code rather than modifying existing code, because modification requires understanding ownership that was never documented.

GitHub’s CODEOWNERS file maps file paths to teams or individuals and wires required reviews to ownership boundaries. That is the enforcement layer. The ownership model itself requires deliberate decisions: where module boundaries sit, which team owns which domain, and what owning a module means in terms of review responsibility and on-call exposure.
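One way to keep that mapping honest is to check it mechanically. The sketch below resolves an owner for a file path from an ordered list of rules, with later rules overriding earlier ones, which is how CODEOWNERS resolves precedence. It is a deliberate simplification: it matches plain path prefixes rather than the real CODEOWNERS pattern syntax, and the OwnerLookup and Rule names and the team handles are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: matches plain path prefixes only, not the real
// CODEOWNERS pattern syntax. OwnerLookup and Rule are invented names.
final class OwnerLookup {

    /** One ownership rule: files under pathPrefix belong to owner. */
    record Rule(String pathPrefix, String owner) {}

    /**
     * Returns the owner named by the last rule whose prefix matches the path,
     * or an empty Optional when no rule matches.
     */
    static Optional<String> ownerOf(String filePath, List<Rule> rules) {
        Optional<String> owner = Optional.empty();
        for (Rule rule : rules) {
            if (filePath.startsWith(rule.pathPrefix())) {
                owner = Optional.of(rule.owner()); // later rules override earlier ones
            }
        }
        return owner;
    }
}
```

Running a lookup like this over every tracked file and failing the build when any path comes back empty turns "every file has an owner" from an aspiration into a checked property.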

The teams that scale monoliths well treat code ownership with the same formality that distributed systems organizations bring to service ownership. The monolith shifts the boundary from network contracts to namespace conventions, but the organizational contract is the same. Someone is responsible for this code. Someone reviews changes to it. Someone answers the page when it breaks.

Database Migrations as a First-Class Concern

Schema migrations are where monolith discipline failures become visible in production. On PostgreSQL, an ALTER TABLE that has to rewrite or scan a large table, such as adding a column with a volatile default or setting NOT NULL on an existing column, holds an ACCESS EXCLUSIVE lock for the duration and blocks both reads and writes, potentially for minutes, unless the change is broken into a multi-step migration. At scale, this kind of mistake causes outages.

The discipline here is treating every migration as a deployment in its own right. The standard pattern for zero-downtime migrations involves multiple deploys: add the column as nullable first, deploy code that writes to both old and new columns, backfill existing rows in batches, deploy code that reads only from the new column, then add the constraint. Writing to both columns before the backfill matters, because otherwise rows created while the backfill is running never get the new value. Tools like strong_migrations for Rails flag dangerous migration patterns when the migration is run, before they reach production.
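The batched backfill step is where the arithmetic is easiest to get subtly wrong, so here is a minimal sketch of just the range planning. It assumes an integer primary key and splits the key range into inclusive chunks; each chunk would drive one short UPDATE in its own transaction. The BackfillPlanner name is hypothetical, as are the table and column names in the SQL comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: BackfillPlanner is an invented name, and the table and
// column names in the SQL comment below are placeholders.
final class BackfillPlanner {

    /** An inclusive range of primary-key values handled by one UPDATE statement. */
    record Batch(long fromId, long toId) {}

    /**
     * Splits [minId, maxId] into consecutive inclusive ranges of at most batchSize ids.
     * Each range would drive one short transaction, for example:
     *   UPDATE orders SET new_col = old_col
     *   WHERE id BETWEEN ? AND ? AND new_col IS NULL
     */
    static List<Batch> plan(long minId, long maxId, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Batch> batches = new ArrayList<>();
        for (long start = minId; start <= maxId; start += batchSize) {
            long end = Math.min(start + batchSize - 1, maxId);
            batches.add(new Batch(start, end));
        }
        return batches;
    }
}
```

Keeping each batch small keeps each row lock short-lived, and the IS NULL guard in the example statement makes a batch safe to retry if the job is interrupted.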

The same principle applies to index creation. In PostgreSQL, a plain CREATE INDEX on a large table blocks writes to that table for the duration of the build; CREATE INDEX CONCURRENTLY avoids that at the cost of a slower build that cannot run inside a transaction block. These are not edge cases at a million lines of code; they are routine operations that require operational maturity.

When Service Extraction Makes Sense

Segment’s 2018 post about collapsing 140 microservices into a single macro-service is one of the cleaner accounts of what distributed system complexity costs in practice: harder debugging, serialization overhead, operational burden, and engineering time spent on infrastructure rather than product. The engineers who feel that cost most acutely are the mid-level and senior engineers running incident investigations across distributed traces rather than shipping features.

The Strangler Fig pattern remains the correct approach when extraction is genuinely warranted. The conditions that justify it are specific: a bounded context that needs an independent deployment cadence, materially different scaling characteristics, or a different operational profile. A media processing component that needs GPU instances while the rest of the application runs on general-purpose compute is a legitimate extraction candidate. A new team wanting clean ownership is a condition that should be addressed with better internal module structure, not a service boundary and an API contract.
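Mechanically, the Strangler Fig pattern comes down to a routing decision in front of the monolith: a short, growing list of paths goes to the extracted service, and everything else continues to hit the existing code. The sketch below shows only that decision. The StranglerRouter name, the Backend enum, and the path prefixes are hypothetical, and a real deployment would more likely express this in a proxy or gateway configuration than in application code.

```java
import java.util.List;

// Hypothetical sketch: StranglerRouter, Backend, and the path prefixes used
// with it are invented for illustration.
final class StranglerRouter {

    enum Backend { MONOLITH, EXTRACTED_SERVICE }

    private final List<String> extractedPrefixes;

    StranglerRouter(List<String> extractedPrefixes) {
        this.extractedPrefixes = List.copyOf(extractedPrefixes);
    }

    /** Requests under an extracted prefix go to the new service; all others stay on the monolith. */
    Backend route(String path) {
        for (String prefix : extractedPrefixes) {
            // Match the prefix itself or anything beneath it, but not "/mediator" for "/media".
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return Backend.EXTRACTED_SERVICE;
            }
        }
        return Backend.MONOLITH;
    }
}
```

Because the prefix list starts empty and grows one entry at a time, each step of the extraction is small and can be undone by removing a single entry.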

“The monolith is large” is not a condition. Large, well-structured monoliths have served enormous traffic with small teams for years. Stack Overflow has handled hundreds of millions of requests per day on a handful of web servers with a small engineering team for over a decade. The scale was never the constraint.

The Accumulated Discipline

The value of the Semicolon & Sons article is that it collects the discipline behind keeping a monolith alive from someone who had to make these decisions across a full organizational arc, not just a single project or a single role. Most of the lessons reduce to a few structural commitments: draw module boundaries early, enforce them mechanically, treat test infrastructure as production infrastructure, establish explicit ownership before you need it, and approach database changes with the same care as code changes.

The architecture decisions are mostly made at the beginning. Everything after that is maintaining the discipline to honor them when delivery pressure makes shortcuts feel reasonable.