What It Takes to Break Paxos

Murat Demirbas published a post this month framed around the idea of breaking Paxos. It is worth engaging with seriously, because the framing exposes something important: the word “break” means several different things depending on which version of Paxos you are talking about, and conflating them produces a distorted picture of where consensus protocols are actually fragile.

What the Proof Covers

Lamport’s “Paxos Made Simple” (2001) is a deliberately spare document. It proves safety for a single-decree consensus protocol over two phases. Phase 1, Prepare/Promise: a proposer selects a ballot number n, broadcasts Prepare(n) to a quorum of acceptors, and each acceptor that has not already promised to a higher ballot replies with a promise not to accept anything below n, along with the highest-balloted value it has previously accepted. Phase 2, Accept/Accepted: the proposer, having collected a quorum of promises, sends Accept(n, v) where v is constrained to be the value carried by the highest-balloted promise received, or the proposer’s own value if no acceptor had accepted anything previously. A quorum of acceptors responding with Accepted constitutes a chosen value.

The invariant that holds everything together is quorum intersection. Any two majorities share at least one member, so a new proposer conducting Phase 1 will always hear from at least one acceptor that knows about any previously chosen value. This is the mechanism that prevents two different values from being chosen in the same or different ballots. The proof is correct, tight, and applies to a single instance of the protocol deciding a single value.

“The Part-Time Parliament” (1989, published 1998) covers the same ground in more generality, embedded in an allegory about a Greek parliament. The proof there is also correct. The 1998 publication delay is itself a historical curiosity: Lamport submitted the paper in 1989, reviewers found the presentation difficult, and the paper sat in a drawer for nearly a decade. He eventually published it alongside “Paxos Made Simple” as a more accessible companion. The difficulty reviewers experienced was not with the proof but with the presentation, which points at a recurring theme: correctness of the core protocol and clarity of the specification are separate properties.

Where Underspecification Lives

Single-decree Paxos is not what anyone builds. Real systems need to agree on a sequence of values, which means running many instances of the protocol, managing a log of committed entries, and handling the operational concerns that arise when nodes crash, recover, and rejoin. Multi-Paxos is the name given to this generalization, and it is where the story changes sharply.

The 2001 paper describes Multi-Paxos in roughly two paragraphs. It gestures at the optimization of skipping Phase 1 once a stable leader exists, notes that the leader can be viewed as holding a long-running lease covering all future slots, and then stops. Leader election, recovery of uncommitted slots, log gap filling, snapshot and compaction, membership changes: none of these are specified. The paper acknowledges this and treats it as outside scope.

Google’s “Paxos Made Live” (2007) documents what filling that scope actually required. The paper describes Chubby, Google’s distributed lock service built on Paxos, and catalogs the engineering work that was not covered by any published description of the algorithm. Disk corruption. Snapshot and restore under concurrent log writes. Membership change semantics. The paper’s tone is useful: it does not present these as failures of Paxos but as the expected cost of specifying a complete system from an incomplete algorithmic description. The gap between the proof and the implementation is not a crack in the foundation; it is the foundation being left for the implementor to pour.

This is where “breaking” Paxos becomes meaningful. The protocol cannot be broken in the sense that its safety proof can be invalidated; the proof is correct. It can be broken in the sense that a real implementation of Multi-Paxos, filling in the underspecified parts incorrectly, will produce a system that violates the invariants the protocol was supposed to guarantee.

Formal Verification and the Bugs It Found

TLA+ is the specification language most commonly used to model consensus protocols. It lets you describe a system’s state transitions and check properties over all reachable states, up to a bound. AWS began publishing accounts of using TLA+ to find bugs in their distributed systems around 2014, and the results were instructive. Protocol-level properties that engineers believed to hold, based on informal reasoning, turned out to have subtle violations under specific interleavings. The bugs were not in the core Paxos protocol; they were in the operational layers built on top of it, the parts “Paxos Made Simple” does not specify.

TLA+ model checking operates most naturally in the space of safety properties: it can verify that a bad state is never reached, for all executions up to a given depth. Liveness properties, which assert that something good eventually happens, are harder to check and require separate machinery. The distinction matters because Paxos provides safety by construction and liveness only conditionally. Safety says nothing bad happens. Liveness says something good eventually happens. Under an unstable leader or an adversarial network, neither of which violates the protocol’s stated assumptions, the system can stall indefinitely while remaining entirely safe.

The classic liveness failure mode is dueling proposers. Two proposers each discover the other’s ballot mid-flight and issue a new prepare with a higher number, preempting the other’s accept phase before it can complete. This cycle can continue indefinitely. The protocol is doing exactly what it should, but no value ever gets chosen. The fix, a stable leader with randomized backoff, is not part of the protocol, and implementing it correctly in a production system is not trivial.

Flexible Paxos and the Quorum Assumption

Flexible Paxos, introduced by Howard, Malkhi, and Spiegelman, identifies that the quorum intersection requirement is more general than the majority-quorum framing implies. The requirement is that Phase 1 quorums and Phase 2 quorums must intersect, but the quorums for each phase do not need to be the same size.

A system can use a large Phase 1 quorum and a small Phase 2 quorum, or vice versa, depending on the performance trade-offs it wants to make. In a five-node cluster, Phase 1 could require four nodes while Phase 2 requires only two. As long as any Phase 1 quorum intersects any Phase 2 quorum, the safety invariant holds. This opens up configurations that are useful in geo-distributed deployments, where the cost of a full majority quorum across regions is prohibitive, but small Phase 2 quorums touching local replicas are cheap. Flexible Paxos does not break the protocol; it demonstrates that the quorum structure most textbooks present as fundamental is a conservative instantiation of a more general constraint, and that constraint can be satisfied in more ways than the standard majority rule.

Jepsen and the Space Between Spec and System

Kyle Kingsbury’s Jepsen testing framework applies adversarial network conditions, partition scenarios, and clock manipulation to running distributed systems, then checks the histories of operations against consistency models. The results have repeatedly found systems that claim Paxos-based or Raft-based consensus producing histories that violate the consistency guarantees those algorithms are supposed to provide.

The violations are not proofs that Paxos is wrong. They document the gap between the algorithm and the implementation. A system that uses Raft for log replication but handles clock synchronization for lease-based reads in a subtly incorrect way will produce linearizability violations under Jepsen that have nothing to do with Raft’s correctness. The same pattern applies to Multi-Paxos implementations: bugs in snapshot restore, in leader lease expiration, or in the handling of stale reads can produce consistency violations in a system that otherwise runs correct Paxos at the protocol layer.

Raft was designed to be more understandable than Multi-Paxos, and the user-study data from the original 2014 paper supports that claim in a pedagogical setting. Understandability does not guarantee implementation correctness, though, and Jepsen has found bugs in Raft-based systems alongside Paxos-based ones. The algorithm being clearly specified does not mean that the implementation satisfies the specification in all operating conditions. It means the specification is a better starting point for verifying that the implementation does.

What Breaking Actually Requires

To be precise about where the fragility lives: the safety proof of Basic Paxos has not been broken and the conditions for breaking it are not available to an adversary operating within the protocol’s stated network model. The liveness of any Paxos-based system under an unstable leader can be broken, deliberately or accidentally, by the protocol’s own design. The correctness of Multi-Paxos implementations is routinely undermined by filling in underspecified parts incorrectly, as Jepsen documents and as “Paxos Made Live” acknowledges candidly.

Demirbas’s post, as a provocation, is useful because it forces these three distinct failure modes into view. The distributed systems literature has spent twenty-five years on the gap between single-decree proofs and production implementations. TLA+ has helped close some of that gap at the specification level. Jepsen has helped close it at the testing level. Flexible Paxos has helped clarify which assumptions are structural and which are conventions. None of these developments invalidate the protocol. They are the ongoing work of understanding what Paxos’s correctness guarantees actually cover, and what they have always required you to build yourself.