Paxos has a reputation for difficulty that sits oddly against what the algorithm actually says. Engineers who work in distributed systems treat it as famously subtle, something that rewards years of study. The algorithm Lamport described, though, is a two-phase protocol that fits comfortably on a single page. This explanation from mydistributed.systems makes a fair case that Basic Paxos, stated in plain English, is not complicated.
That clarity is genuine, but it stops at a precise boundary. Understanding where that boundary falls matters as much as the clarity itself.
The Two-Phase Protocol
Basic Paxos reaches agreement on a single value among a set of nodes. The protocol assigns three roles: proposers that initiate decisions, acceptors that vote on proposals, and learners that observe the final result. The core logic lives in the acceptors.
Phase 1 begins when a proposer selects a proposal number n and sends Prepare(n) to a majority of acceptors. Each acceptor that receives this message makes a promise: it will not accept any proposal numbered less than n, and it reports back the highest-numbered proposal it has already accepted. This response is called a Promise.
Phase 2 uses those promises. The proposer sends Accept(n, v) to a majority of acceptors. The value v is constrained: if any acceptor reported an already-accepted value in Phase 1, the proposer must use the value from the highest-numbered such report. Only when no acceptor has accepted anything can the proposer choose v freely. An acceptor that receives Accept(n, v) accepts it unless it has since promised a higher proposal number.
When a majority of acceptors have accepted the same proposal, the value is chosen. Any proposer running Phase 1 afterward will hear about that value from at least one acceptor in the responding majority, because any two majorities over the same set of nodes share at least one member. This overlap is what makes the safety invariant hold: a chosen value cannot be overwritten.
A minimal sketch of the mechanism:
# Proposer
def propose(value):
n = next_proposal_number()
promises = send_prepare_to_majority(n)
if len(promises) < majority_size:
return FAILURE
# Use the value from the highest-numbered already-accepted proposal
highest = max(promises, key=lambda p: p.accepted_n, default=None)
v = highest.accepted_v if highest else value
accepted = send_accept_to_majority(n, v)
return CHOSEN if len(accepted) >= majority_size else FAILURE
# Acceptor (all three fields must survive crashes)
def on_prepare(n):
if n > promised_n:
promised_n = n
return Promise(n, accepted_n, accepted_v)
return NACK
def on_accept(n, v):
if n >= promised_n:
promised_n, accepted_n, accepted_v = n, n, v
return Accepted(n, v)
return NACK
The acceptor’s three state fields must be written to durable storage before sending responses. Everything else can be reconstructed from that state. The proof of correctness follows from the majority overlap: any two quorums share a member, so any proposer that completes Phase 1 after a value is chosen will learn about it.
How This Algorithm Got Its Reputation
Lamport wrote the first version of Paxos in 1989 as “The Part-Time Parliament,” framing the protocol as the voting procedure of a mythical Greek legislature. He submitted it to ACM Transactions on Computer Systems. The reviewers found the framing unconventional and requested a rewrite in standard notation. Lamport declined, and the paper sat unpublished for nine years before TOCS finally ran it in 1998.
In 2001, he wrote “Paxos Made Simple”, a fourteen-page informal description without the Greek framing. The paper ends with a brief sketch of how to extend Basic Paxos to agree on a sequence of values, which is what state machine replication requires. The sketch notes that a stable leader can skip Phase 1 for subsequent commands after winning it once, that pipelining is possible, and that various optimizations are available. None of the engineering details, the failure handling, the log gap recovery, the leader election protocol, the state transfer, are specified.
That gap is where most of the difficulty people associate with Paxos lives.
What the Algorithm Leaves Open
Basic Paxos decides one value. A replicated state machine needs to decide an ordered sequence of commands, and the extensions required to do that are significant.
Leader election. Allowing any node to propose causes livelock: Proposer A prepares with n=1, Proposer B prepares with n=2 before A can finish, then A prepares with n=3, and the cycle continues without either proposer making progress. The standard fix is to elect a distinguished leader and suppress other proposers, but leader election is its own protocol with its own correctness requirements.
Log gaps. A leader that fails mid-operation can leave log slots in an uncertain state. Acceptors may have accepted a value for slot 5 while no learner confirmed it chosen. The new leader running Phase 1 for slot 5 will discover the pending value and must determine whether to commit or abandon it, while simultaneously processing new client requests for subsequent slots. Handling this correctly without losing entries or violating consistency is where implementation complexity concentrates.
Reconfiguration. Adding or removing nodes means changing what constitutes a majority. Doing this while decisions are in progress requires careful sequencing to avoid split-brain scenarios. Lamport addressed this in a separate later paper, “Reconfiguring a State Machine” (2010), which runs to over thirty pages.
Log compaction. Logs cannot grow without bound. Replacing old entries with a state snapshot requires coordinated truncation across replicas and a mechanism for lagging nodes to recover from a snapshot rather than from individual log entries.
Every production system built on Paxos has solved these problems. Basic Paxos addresses none of them directly.
Raft as a Response
Ongaro and Ousterhout published “In Search of an Understandable Consensus Algorithm” in 2014, and the paper is direct about its motivation: “Paxos Made Simple” leaving Multi-Paxos underspecified contributed to the difficulty people experienced implementing consensus in practice. Their stated goal was an algorithm that could be understood completely, not just explained at a high level.
Raft decomposes the problem into leader election, log replication, and safety, and specifies all three fully. User studies in the paper showed that students given equal study time performed significantly better on Raft comprehension questions than on Paxos ones. The algorithm includes explicit answers to reconfiguration, log compaction, and client interaction.
Raft is not simpler in mechanism count. It makes opinionated choices, including a strong leader requirement and sequential log commitment, that Basic Paxos does not impose. Those choices are what enable a complete specification. Systems like Google’s Spanner use Paxos variants precisely because the flexibility that Raft trades away for clarity matters to them; Spanner’s need for fine-grained control over leader placement and lease management fits awkwardly with Raft’s assumptions.
The current deployment landscape reflects this split. etcd, CockroachDB, and TiKV use Raft. Google Spanner and Chubby use Paxos. Apache Cassandra uses Basic Paxos for its lightweight transactions, the conditional write operations (INSERT IF NOT EXISTS, UPDATE IF column = value) that require exactly the single-value agreement guarantee. Each lightweight transaction runs a full four-round-trip Paxos protocol, which is why the documentation recommends using them sparingly. This is Basic Paxos doing precisely what it promises; the performance cost comes with the territory.
Where the Simplicity Claim Stands
The source article’s claim is correct. Basic Paxos, stated plainly, is not complex. The safety invariant is clear, the proof of correctness is short, and an implementation requires very little code.
Understanding Basic Paxos gives you the core insight behind distributed agreement: the majority overlap guarantee is what allows nodes to reach consensus without centralized coordination. It does not give you a blueprint for a replicated log, a reconfiguration protocol, or a snapshot transfer mechanism.
Lamport’s framing in “Paxos Made Simple” captured the situation accurately: the algorithm is not complex, but the system you build with it must address problems the algorithm does not solve. The difficulty people attribute to Paxos attached to the wrong layer. Basic Paxos is clean and well-understood. The production engineering that goes on top of it, all the parts that “Paxos Made Simple” sketches and leaves unspecified, is what deserves the reputation. Reading a clear explanation of the two-phase protocol is a good place to start. Treating that clarity as the whole story is where the trouble begins.