The Engineering Discipline That Keeps Million-Line Monoliths Alive

The default assumption in most architectural discussions is that large codebases eventually become microservices. When a team grows past a certain headcount, or a codebase crosses a certain size, the conventional wisdom says you decompose into services. A recent piece from semicolonandsons challenges this by cataloguing 113 lessons from growing a monolith to one million lines of code, from the perspective of someone who made the journey from tech lead to CTO. It is rare because it documents what most engineering teams quietly do but rarely write down.

The post’s value isn’t that each individual lesson is novel; several are things you’d find scattered across Stack Overflow answers or Rails guides. The value is that they coexist. Seeing “keep your controllers thin” next to advice about database migration strategy next to observations about team communication creates a picture of what a disciplined monolith looks like across every layer simultaneously. What’s worth exploring further is the structural reason these lessons cluster together, grounded in the specific tooling and real-world examples that a list format can’t accommodate.

The Module Boundary Problem

The central technical challenge for any codebase approaching a million lines is enforcing boundaries between components without separate deployments. Microservices solve this with network calls; you physically cannot call a private function in another service. In a monolith, you solve it with conventions and tooling, and the tooling is what separates a well-organized large application from a codebase where every class can reach every other class given enough deadline pressure.

Shopify runs what is probably the most documented large Rails monolith in existence, measured in millions of lines of code rather than hundreds of thousands. Their approach to this boundary problem is packwerk, a static analysis gem that enforces package boundaries without requiring separate processes. Each directory can declare itself a package with a package.yml file, specify what it exposes publicly, and enumerate which other packages it may depend on. Running packwerk check in CI catches boundary violations before they merge.

# components/orders/package.yml
enforce_dependencies: true
enforce_privacy: true

With enforce_privacy: true, any constant in components/orders that isn’t explicitly placed in a public/ subdirectory is treated as package-private. Another package referencing Orders::InternalHelper directly gets a privacy violation. (Since packwerk 3.0 the privacy checker ships separately, in the packwerk-extensions gem.) This is the same boundary enforcement you’d normally get from a service’s API contract, implemented as a static analysis rule instead of a deployment boundary.
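To make the rule concrete, here is a minimal plain-Ruby sketch of what a privacy check decides. It is not packwerk's implementation; the Reference struct, the privacy_violations method, and the Orders::PlaceOrder constant are hypothetical names chosen for illustration. The only point is the rule itself: a reference is a violation when it crosses a package boundary and targets a constant that is not part of the public API.

```ruby
# Hypothetical, simplified model of a package privacy check.
# This is not packwerk's implementation; all names are illustrative.
Reference = Struct.new(:from_package, :constant, :to_package, :public_api)

# Returns the references that reach into another package's private constants.
def privacy_violations(references)
  references.select do |ref|
    crosses_boundary = ref.from_package != ref.to_package
    crosses_boundary && !ref.public_api
  end
end

references = [
  # Billing uses a constant that orders exposes publicly: allowed.
  Reference.new("components/billing", "Orders::PlaceOrder", "components/orders", true),
  # Billing reaches into an internal helper: violation.
  Reference.new("components/billing", "Orders::InternalHelper", "components/orders", false),
  # Orders using its own internals: allowed.
  Reference.new("components/orders", "Orders::InternalHelper", "components/orders", false)
]

violations = privacy_violations(references)
violations.each do |ref|
  puts "#{ref.from_package} references private constant #{ref.constant}"
end
```

A real checker has to resolve constants from parsed source files, which is the hard part. The decision rule at the end is roughly this small.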

Rails engines offer an older, heavier mechanism for the same goal. An engine is a miniature Rails application that can be mounted into a parent app, with its own models, controllers, and migrations. The overhead of engine configuration makes them better suited for large, stable subsystems than for granular component-level boundaries, which is part of why packwerk filled a gap despite engines existing since Rails 3.

The lesson from both approaches is the same one the semicolonandsons piece keeps returning to in different forms: good organization requires enforcement. Conventions without tooling degrade under deadline pressure. The code that crosses a module boundary “just this once” becomes the pattern everyone else follows when they see it in the history.

Database Discipline

At a million lines, the database is almost always the first thing that degrades under load, and it degrades in specific, predictable ways. The two largest sources of problems in mature Rails codebases are N+1 queries and transaction scope that has grown beyond what was originally intended.

N+1 queries are easy to introduce in a large codebase because any call to an association can trigger one if it wasn’t eager-loaded. Rails 6.1 introduced strict_loading on models and associations, which raises an error on lazy loads rather than silently executing them:

class Order < ApplicationRecord
  self.strict_loading_by_default = true
end

With this set on a model, code that lazy-loads one of that model’s associations without having eager-loaded it raises ActiveRecord::StrictLoadingViolationError. The same behavior can be enabled application-wide with config.active_record.strict_loading_by_default. The bullet gem takes a softer approach, logging or notifying on N+1 patterns without raising, which is more practical for existing large codebases where the strict mode would produce too much noise to enable globally from day one.
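The cost that strict loading guards against is easier to see with a query counter. The sketch below is a plain-Ruby simulation, not ActiveRecord: FakeDb and its methods are hypothetical stand-ins for a database connection, and each method call counts as one query.

```ruby
# Plain-Ruby simulation of lazy versus eager association loading.
# FakeDb and its methods are hypothetical stand-ins, not ActiveRecord APIs.
class FakeDb
  attr_reader :query_count

  def initialize(orders, items_by_order)
    @orders = orders
    @items_by_order = items_by_order
    @query_count = 0
  end

  # One query that returns every order id.
  def all_orders
    @query_count += 1
    @orders
  end

  # One query per call: the shape of a lazy association load.
  def items_for(order_id)
    @query_count += 1
    @items_by_order.fetch(order_id, [])
  end

  # One query for all requested orders: the shape of an eager load.
  def items_for_many(order_ids)
    @query_count += 1
    @items_by_order.slice(*order_ids)
  end
end

items = { 1 => ["pen"], 2 => ["ink", "pad"], 3 => [] }

# Lazy: one query for the orders, then one more per order.
lazy_db = FakeDb.new([1, 2, 3], items)
lazy_total = lazy_db.all_orders.sum { |id| lazy_db.items_for(id).size }

# Eager: one query for the orders, one for all of their items.
eager_db = FakeDb.new([1, 2, 3], items)
ids = eager_db.all_orders
eager_total = eager_db.items_for_many(ids).values.sum(&:size)

puts "lazy: #{lazy_db.query_count} queries, eager: #{eager_db.query_count} queries"
```

The lazy path issues one query per order on top of the initial one, so its cost grows with the result set. The eager path stays at two queries however many orders come back.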

Transaction scope is a subtler problem. Service objects and domain logic in large codebases tend to accumulate database calls, and developers wrap uncertain sections in transactions as a correctness measure. This becomes a performance problem because long-running transactions hold database locks, and at scale those locks cause cascading slowdowns. The discipline of keeping transactions short, and specifically the habit of moving side effects (email sends, webhook dispatches, external API calls) outside transaction blocks, has to be enforced by convention. Rails will not warn you when you’ve put a Stripe API call inside a transaction block.
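One way to picture the convention is a transaction wrapper that queues side effects and runs them only once the block has finished without raising. The sketch below is plain Ruby with a hypothetical FakeTransaction class; it is not how ActiveRecord implements transactions, although Rails models do offer after_commit callbacks built on the same idea.

```ruby
# Hypothetical sketch: side effects are queued inside the transaction and
# run only after it commits. FakeTransaction is not an ActiveRecord class.
class FakeTransaction
  def self.run
    tx = new
    yield tx
    # Reached only if the block did not raise.
    tx.commit!
  end

  def initialize
    @after_commit = []
  end

  # Queue a side effect instead of running it while locks are held.
  def after_commit(&block)
    @after_commit << block
  end

  def commit!
    @after_commit.each(&:call)
  end
end

log = []

FakeTransaction.run do |tx|
  log << "insert order"
  tx.after_commit { log << "send confirmation email" }
  log << "insert line items"
end

begin
  FakeTransaction.run do |tx|
    tx.after_commit { log << "dispatch webhook" }
    raise "constraint violation"
  end
rescue RuntimeError
  log << "rolled back"
end

puts log
```

The email is sent after both inserts, and the webhook from the failed transaction is never dispatched. Both outcomes are what you want: no external call runs while locks are held, and none runs for work that was rolled back.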

Schema migrations introduce a third category. Long-running migrations on large tables block reads or writes depending on the operation and the Postgres version. The strong_migrations gem catches dangerous migration patterns before they execute, flagging operations like adding a column with a default value on a large table (which before Postgres 11 rewrites the entire table) or removing a column without first teaching the application to ignore it.

The Testing Surface Problem

A million-line codebase with a comprehensive test suite routinely produces forty-minute CI runs. The standard test pyramid advice stays correct at this scale, but the distribution matters more than it does at smaller sizes. Broad integration tests that boot the full Rails stack and exercise multiple layers are cheap to write and expensive to maintain; they accumulate naturally because they feel thorough at the moment of writing.

The tooling response is aggressive parallel test execution. Both parallel_tests and Rails’ built-in parallel testing support help, though only up to the point where test database setup and teardown become the bottleneck rather than the test execution itself. Flaky tests compound badly as a suite grows: a single test that fails 1% of the time is a nuisance, but fifty such tests failing independently leave only about a 60% chance of a green run, and parallel execution tends to surface more of them by exposing shared state and ordering dependencies.
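The arithmetic behind flaky-test compounding is short enough to write down. The sketch below assumes each flaky test fails independently with the same probability, which real suites only approximate; the method name and the sample counts are illustrative.

```ruby
# Probability that a CI run is green when `flaky_count` tests each fail
# independently with probability `failure_rate`. Independence is an assumption.
def green_run_probability(flaky_count, failure_rate)
  (1.0 - failure_rate)**flaky_count
end

[1, 10, 50, 200].each do |count|
  pct = (green_run_probability(count, 0.01) * 100).round(1)
  puts "#{count} flaky tests at 1% each: #{pct}% of runs pass"
end
```

The green-run rate falls geometrically with the number of flaky tests, which is why a handful of tolerated flakes at small scale becomes a constant source of retries at large scale.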

The more difficult discipline is deleting tests that cover implementation details rather than behavior, and resisting the impulse to test every failure path at the integration layer when a unit test covering the failure logic would run in milliseconds instead of seconds. This kind of curation requires ongoing investment; test suites don’t stay lean on their own.

Deployment Mechanics

The argument for service decomposition often includes independent deployment of components. A monolith deploys as a unit, which at a million lines means deploying a large unit. Making that safe requires its own practices.

The two primary tools are feature flags and zero-downtime migration discipline. Flipper is the standard Rails library for feature flags, supporting gradual percentage rollouts and actor-based targeting. The practice of shipping code behind a flag, enabling it for internal users, monitoring metrics, and then expanding to production incrementally is how large monoliths achieve deployment safety without service isolation boundaries.
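A percentage rollout only works if the same actor gets the same answer on every request, and if raising the percentage never turns the feature off for someone who already had it. The sketch below shows one common way to get both properties, by hashing the actor into a fixed bucket. It illustrates the idea only; enabled_for? is a hypothetical helper, not Flipper's API, and the feature name and actor ids are made up.

```ruby
require "zlib"

# Hypothetical sketch of deterministic percentage rollout by actor id.
# This illustrates the idea only; it is not Flipper's API or implementation.
def enabled_for?(feature_name, actor_id, percentage)
  # The same feature/actor pair always hashes to the same bucket in 0..99.
  bucket = Zlib.crc32("#{feature_name}:#{actor_id}") % 100
  bucket < percentage
end

actors = (1..1000).map { |n| "user-#{n}" }
at_10 = actors.select { |id| enabled_for?("new_checkout", id, 10) }
at_50 = actors.select { |id| enabled_for?("new_checkout", id, 50) }

puts "10% rollout: #{at_10.size} of #{actors.size} actors"
puts "50% rollout: #{at_50.size} of #{actors.size} actors"
```

Including the feature name in the hash input means different features pick different cohorts, so the same small group of users is not always the first to see every experiment.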

Zero-downtime migrations require a specific three-phase sequence: first deploy the code that handles both the current and new schema states, then run the migration, then deploy the code that removes the old compatibility handling. This pattern, sometimes called expand and contract, is well documented but requires consistent discipline across a team. The strong_migrations gem enforces the migration side; the code compatibility work is harder to lint.
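As a rough illustration of the code side of the three phases, consider a column rename from name to full_name. The sketch below uses plain hashes in place of database rows, and the method names are hypothetical. A real Rails application also has to account for ActiveRecord’s cached column information, which this sketch ignores.

```ruby
# Hypothetical sketch of the compatibility code deployed in phase one of a
# column rename from `name` to `full_name`. Rows are plain hashes here.
def customer_name(row)
  # Works against either schema state: prefer the new column when present.
  row.key?(:full_name) ? row[:full_name] : row[:name]
end

before_migration = { id: 1, name: "Ada Lovelace" }
after_migration  = { id: 1, full_name: "Ada Lovelace" }

puts customer_name(before_migration)
puts customer_name(after_migration)

# Phase three, deployed only after the migration has run everywhere,
# drops the fallback and reads the new column directly.
def customer_name_final(row)
  row.fetch(:full_name)
end

puts customer_name_final(after_migration)
```

The phase-one reader is safe to deploy before the migration and keeps working after it. That property lets the migration run while old and new application processes are both serving traffic.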

The Organizational Mirror

Conway’s Law observes that systems tend to mirror the communication structures of the organizations that build them. At a million lines, the relationship is bidirectional: the existing code structure influences which teams form and which teams need to coordinate, and team boundaries harden into module boundaries regardless of whether anyone intended them to.

The lessons in the semicolonandsons piece that touch on communication and team structure are not conceptually separate from the technical lessons. A module boundary that’s hard to cross in the code creates friction that teams learn to route around by aligning their work to it, which reinforces the boundary. This is how large codebases develop their characteristic sectors that newer engineers navigate more by social knowledge than by documentation.

The practical implication is that module restructuring is partly an organizational project. Packwerk boundaries that don’t match team responsibilities will generate a steady stream of violations, because teams will naturally reach into neighboring code to do their work. The tooling and the team structure have to evolve together, which is a point that purely technical architectural advice rarely acknowledges explicitly.

What This Kind of Document Is For

The semicolonandsons piece is valuable not because each lesson is secret knowledge, but because it maps the full surface area of running a large monolith with intention. Most of this knowledge exists in scattered form: in gem READMEs, conference talks, individual post-mortems. Seeing it assembled by someone who held responsibility for the same system at multiple levels of seniority gives it a coherence that the scattered form lacks.

Large monoliths are not accidents or failures of architectural ambition. Shopify, GitHub, and Basecamp have each made deliberate choices to keep their core systems in single deployable units at significant scale. The engineering required to make those choices work correctly is substantial, specific, and underrepresented in the literature relative to the volume of writing about service decomposition. Martin Fowler’s MonolithFirst pattern recommends starting with a monolith and extracting services only when specific pain points justify the cost. Practitioner accounts like this one are how that recommendation stays grounded in what the discipline of maintaining the monolith actually requires.