How to Construct a Paxos Failure, and What It Teaches You

Murat Demirbas’s “Break Paxos” post frames the consensus algorithm as something to be attacked rather than explained. That framing is pedagogically useful in a way that a straight walkthrough is not. The goal of breaking a protocol forces you to specify exactly what breaking it would mean, and that question reveals which guarantees are actually unconditional.

Paxos provides two kinds of guarantees. Safety is unconditional: once a value is chosen, no other value can ever be chosen, regardless of failures, message delays, or concurrent proposals. Liveness is conditional: given a stable leader and a network that eventually delivers messages, the protocol will reach a decision. These two properties have different statuses. You cannot construct an execution that breaks safety without first violating an assumption the correctness proof explicitly depends on. You can break liveness by doing nothing more exotic than running two proposers simultaneously.

The gap between those two statuses is the most important thing to understand before building systems on top of Paxos.

Breaking Liveness

The dueling proposers scenario requires no special conditions beyond what the protocol itself permits: any node may initiate a proposal at any time.

Given three acceptors and two proposers, A and B:

A sends Prepare(1) to all three acceptors. All three promise ballot 1 and report no prior accepted values. A is ready for Phase 2.
B sends Prepare(2) before A sends Accept(1). All three acceptors have now promised ballot 2; they will reject A’s upcoming Accept(1).
A detects the failure and escalates: Prepare(3). All three acceptors promise ballot 3, invalidating B’s pending Phase 2.
B escalates to Prepare(4), invalidating A again.
The sequence continues indefinitely.

No value is ever chosen, and neither proposer violates a protocol rule at any step; this is a structural property of allowing multiple concurrent proposers, not a flaw in either implementation. The algorithm is working exactly as specified.

The standard response is to elect a single distinguished proposer. Paxos says nothing about how to do this. Leader election is its own protocol, and the specifics of when authority transfers and how long it lasts interact with the Multi-Paxos log recovery process in ways that have caused correctness incidents in production. The liveness fix has a surface area that connects directly back to the safety properties.

Breaking Safety Through Implementation

Basic Paxos safety rests on one assumption the algorithm cannot enforce for itself: acceptors must write their state to durable storage before responding to any message. The three fields are promised_n (the highest ballot promised), accepted_n (the ballot of the last accepted value), and accepted_v (that value).

The failure scenario runs like this. Acceptor X receives Prepare(5) and sends its Promise(5) reply before completing an fsync. X crashes. On restart, X has no record of its promise at ballot 5. When a new Prepare(3) arrives, X responds with no prior history, treating ballot 3 as competing on equal terms with the already-in-flight ballot 5 round.

If the ballot 5 proposer has not yet achieved a majority, a new proposer at ballot 3 can collect a different set of promises, potentially from nodes that never saw ballot 5, and commit a different value to the same slot. The safety guarantee that once a value is chosen no other value can be chosen holds within the formal model. The formal model assumes persistence. Remove that assumption from the implementation and the guarantee disappears from the deployed system, regardless of what the documentation says.

This failure mode has caused production incidents. The acceptor persistence requirement is stated in Lamport’s papers; it is easy to rationalize away in implementations that prioritize write latency, treat the failure window as negligible, or use storage layers that do not provide crash-consistent writes without explicit fsyncing.

The Leader Transition Window in Multi-Paxos

Multi-Paxos introduces a second channel for correctness problems: the log recovery phase during leader transition.

When a leader fails while processing a batch of log slots, a new leader must determine the state of every in-progress slot before accepting new writes. It does this by running Phase 1 on each uncertain slot, collecting responses from a majority, and determining whether any value was already committed under the previous leader. The correctness argument for this recovery is sound. The implementation surface is not small: the new leader must identify which slots are uncertain, handle the case where the old leader was partitioned rather than crashed and may still be running Phase 2 on old slots, and reconcile the Phase 1 sweep against concurrent client requests arriving at the new leader.

This is not a correctness hole in Paxos; it is a large, failure-sensitive implementation requirement that Lamport’s papers leave almost entirely unspecified. “Paxos Made Live”, the 2007 Google paper that documented the engineering required to ship Chubby, spends the majority of its pages on exactly this transition window, along with leader leases, epoch numbers to identify stale messages, and disk corruption handling, none of which appears in “Paxos Made Simple.”

What TLA+ Confirms

Lamport’s TLA+ specification of Paxos encodes the protocol as a state machine and the safety invariant as a formula the model checker verifies over all reachable states. For small configurations, typically three acceptors and two proposers, the checker exhaustively confirms that no two distinct values are ever chosen. The safety invariant holds in every reachable state.

Liveness requires a separate temporal property:

Liveness == <>(\E v \in Values : Chosen(v))

The model checker will not confirm this without adding a fairness assumption to the environment, typically weak fairness on all enabled actions. Without that assumption, it generates the dueling proposers trace as a counterexample. The formal model agrees precisely with the adversarial intuition: safety is exhaustively verified for the model’s assumptions; liveness requires an external promise about the environment that the protocol cannot make for itself.

The persistence assumption enters the TLA+ model as an initial condition on acceptor state, not as a step in the protocol logic. This is structurally honest about where the burden lies. An implementation that reads the specification and asks what would happen if that initial condition were violated gets a clear answer: the safety invariant is no longer meaningful over the execution, because the model requires it to be true before the first message is sent.

Outperforming Paxos

Egalitarian Paxos (EPaxos), from Moraru, Ongaro, and Aguilera’s 2013 SOSP paper, allows any replica to commit non-conflicting commands in a single round trip without a designated leader. In geo-distributed deployments, where leader-based protocols require a cross-region round trip for every write, this difference is measurable. EPaxos achieves near-optimal throughput and latency by exploiting the fact that Paxos does not prescribe a single leader, only that quorum membership constraints are respected.

This is the direction in which you can meaningfully improve on Paxos: not by providing a safer algorithm, since the safety guarantees are correct within their model, but by using the flexibility that Paxos’s intentional underspecification preserves. Raft removes that flexibility deliberately, trading it for a complete specification that leaves fewer implementation-defined decisions. EPaxos preserves it, with corresponding implementation complexity and limited production adoption as a result.

Neither direction is about finding a flaw in the algorithm. They are about navigating the constraints Paxos imposes and the constraints it does not.

The Underlying Lesson

Breaking a protocol is a useful exercise because it tells you which assumptions are doing the work. For Paxos, the load-bearing assumptions are: majority quorums, durable acceptor state, and a single proposer for liveness. The first two underpin safety. The third underpins liveness.

Every documented correctness failure in a production Paxos deployment traces back to one of these assumptions being violated by an implementation decision made outside the formal model. The break conditions are not defects in the algorithm; they are gaps between what the algorithm specifies and what the implementation commits to maintaining. Reading the safety proof confirms the guarantee. Trying to construct a breaking execution forces you to find the assumptions the proof depends on. For an implementer, the second exercise is the more useful one.