What Breaking Paxos Reveals About Distributed Consensus

When Murat Demirbas sets out to break a consensus algorithm, the exercise is never purely destructive. The point is to find the boundary of the guarantee, to understand precisely what the algorithm promises and, more usefully, what it does not. Paxos is the protocol that distributed systems researchers invoke to prove other things correct, which makes breaking it an unusually clarifying activity.

The short version of what Paxos guarantees: safety, always; liveness, never. Paxos Made Simple, Lamport’s 2001 re-exposition of his original algorithm, is explicit about this. The protocol ensures that once a value is chosen, no different value can ever be chosen. It does not ensure that any value is ever chosen at all. The asymmetry follows directly from the FLP impossibility result, which proved in 1985 that no deterministic algorithm can guarantee consensus in an asynchronous system where even a single process can fail.

Understanding what “breaking” means in this context requires distinguishing between two different kinds of failure: violating the safety invariant (two different values get chosen) and violating liveness (no value ever gets chosen). The two are not symmetric. Safety violations are catastrophic and must never occur. Liveness failures are expected under adversarial conditions and are managed, not prevented.

Constructing the Dueling Proposers Scenario

The canonical liveness attack on Basic Paxos requires only two proposers and an adversary that controls message delivery timing. Two proposers, A and B, are both trying to commit a value to a set of acceptors.

A sends Prepare(ballot=1) → quorum promises, no prior values seen
B sends Prepare(ballot=2) → quorum overrides A's ballot
A sends Accept(ballot=1, value=vA) → rejected (quorum promised ballot ≥ 2)
A increments ballot: sends Prepare(ballot=3) → quorum overrides B
B sends Accept(ballot=2, value=vB) → rejected (quorum promised ballot ≥ 3)
B increments ballot: sends Prepare(ballot=4) → quorum overrides A
... indefinitely

This scenario is fully within the specification of Basic Paxos. No safety invariant is violated. Each prepare phase correctly invalidates previous accepts. The system simply never makes progress. The adversary does not need to drop messages or cause process failures; controlling message delivery timing is sufficient.

The standard response is Multi-Paxos: elect a distinguished leader, allow only one proposer at a time, and skip Phase 1 for subsequent proposals once the leader is established. This works in practice, but it moves the problem rather than solving it. Multi-Paxos is not a complete algorithm. It is Basic Paxos with the hard part left unspecified: how do you elect and maintain a leader, and what invariants hold during transitions?

The Underspecification Is a Design Choice

Paxos’s tendency to leave critical mechanisms unspecified is a deliberate property of the protocol’s architecture. Lamport intended it as a proof of possibility, not an implementation guide. Every production implementation of Paxos is a combination of Paxos and some additional protocol, usually for leader election, lease management, and cluster reconfiguration. Google’s Chubby uses Paxos with a lease-based leadership mechanism. Apache ZooKeeper uses ZAB (ZooKeeper Atomic Broadcast), which shares Paxos’s structure but extends it with a separate recovery protocol. Each of these extensions is where the real complexity lives.

The safety properties of these systems depend entirely on the correctness of the pieces that Paxos leaves unspecified. Leader leases illustrate the risk precisely. The optimization works like this: a leader acquires a lease lasting duration d. While the lease holds, the leader can serve reads locally without running Phase 1, because no other node can have committed new values without the current leader’s participation. This is a significant performance win, eliminating a round trip for every read.

Clock drift breaks it. If the leader’s clock runs slow relative to other nodes, the leader may serve reads after a new leader has been elected and has already started committing writes. The reads return stale data that is inconsistent with the current committed state. This is a safety violation of exactly the kind the base protocol was designed to prevent, introduced by an optimization layered on top. The canonical response, which Google’s Spanner uses, is TrueTime: a hardware-backed API that bounds clock uncertainty explicitly, allowing lease durations to account for worst-case skew rather than assuming synchronized clocks.

Reconfiguration and the Quorum Invariant

Adding or removing nodes from a running Paxos cluster is the protocol’s other major underspecified problem. The safety invariant for Paxos requires that any two quorums overlap. In a 3-node cluster, a quorum is 2 nodes, and any two quorums share at least one node. This shared node is the mechanism that prevents two proposers from independently reaching a majority and choosing different values.

Naive reconfiguration destroys this invariant. Consider adding two nodes to a 3-node cluster. During the transition, old nodes may be computing quorums as 2-of-3 while new nodes compute quorums as 3-of-5. Both definitions are simultaneously active. You can now have two separate majorities, one from each quorum definition, that do not intersect. Two proposers reaching their respective majorities can choose different values, and the protocol’s core guarantee fails silently.

Raft’s single-server changes (Ongaro, 2014) solve this conservatively: only one node may be added or removed at a time, so old and new quorums always overlap by construction. The more general solution, joint consensus, runs two overlapping quorum configurations simultaneously during the transition and requires agreement from both before committing the membership change. Both approaches are explicit acknowledgments that the transition window is dangerous and requires its own careful specification, not just Paxos applied with common sense.

Flexible Paxos (Howard, Malkhi, Spiegelman 2016) generalizes the quorum argument in a different direction, proving that the quorum requirements for Phase 1 and Phase 2 do not need to be identical. You can use a Phase 1 quorum of n and a Phase 2 quorum of 1, or any combination where the two quorum sizes sum to more than the total number of nodes. This allows interesting optimizations in geo-distributed deployments where you want to minimize cross-datacenter coordination in the common case, at the cost of a heavier leader election.

What Formal Verification Surfaces

Demirbas’s approach to consensus analysis is informed by years of working with TLA+ and model checking, including his earlier collaboration producing Paxos Made Moderately Complex, one of the more complete formal specifications of Multi-Paxos available. Model checking explores all possible interleavings of a protocol’s execution, including adversarial orderings that almost never occur in practice but are possible in principle.

Many of the reconfiguration bugs and lease-based safety failures documented in the literature were found through model checking rather than production incidents, though some were found the harder way. The Ivy verification framework and TLC, the TLA+ model checker, have been used to formally verify Paxos variants and, in doing so, have exposed subtly incorrect invariants in published algorithms. Algorithms cited in thousands of papers have had their specifications found incorrect under exhaustive formal analysis. That outcome is uncomfortable for the field; it is also the reliable output of the break-the-protocol methodology.

The value of the exercise is not to conclude that Paxos is flawed. It is that constructing adversarial scenarios forces you to enumerate the assumptions your system actually relies on. Leader election is an assumption. Clock synchronization within a bound is an assumption. Single-server reconfiguration is a constraint that enforces a stronger quorum-overlap assumption. When any of these fail, the question is not whether Paxos is correct but whether you correctly implemented all the pieces Paxos leaves unspecified.

Production distributed systems are largely a record of teams discovering, sometimes through outages, which implicit assumptions their implementation had made. The break-the-protocol lens is how you find those assumptions before the outage does.