Figure 2: API for fault-tolerant log.
between replicas. Replicas have access to persistent storage that survives crashes. Some replica s may submit
values for consens us. If eventually a majority of the replicas run for long e nough without crashing and there
are no failures, all running replicas are guaranteed to agree on one of the values that was submitted. In our
system, the value to be agreed upo n is the next entry in a (replicated) log as described in the introduction.
The algorithm consists of three phases, which may be repeated (because of failures ):
1. Elect a replica to be the coordinator.
2. The coordinator selects a value a nd broadcasts it to all replicas in a message called the accept message.
Other replicas either acknowledge this message or reject it.
3. Once a majority of the replicas acknowledge the coordinator, consensus has been rea ched, and the
coordinator broadcasts a commit message to notify replicas.
To provide some intuition about how the algorithm works, consider first the case in which there is only
a single coordinator and no failures. Consensus is reached once a majority of replicas receive the accept
message from the coordinator and acknowledge it. Subsequently, if any minority of the replicas fail, we are
still guaranteed that at least one replica will be alive that rece ived the consensus value.
In reality the coordinator may fail. Paxos does not require that only one replica act as coordinator at a
time. Multiple replica s may decide to become coordinators and e xecute the algorithm at any time. Typically
the system is engineered to limit coordinato r turnover, as it can delay reaching consensus .
This flexible election policy means there may be multiple replicas who simultaneously believe they are
the coor dinator. Further, these coordinato rs may select different values. Paxos ensures consensus can be
reached o n a single va lue (it can be from any coo rdinator) by introducing two extra mechanisms: 1) as signing
an ordering to the successive coordinators; and 2) restricting each coordinator’s choice in selecting a value.
Ordering the coordinators allows each replica to distinguish between the current coordinator and previous
coordinators. In this way, replicas can re ject messages from old coor dinators and prevent them from disrupt-
ing consensus once it is reached. Paxos orders the coordinators by ass igning them an increasing sequence
number as follows. Each replica keeps track of the most recent sequence number it has seen so far. When a
replica wants to become coordinator, it gener ates a unique
1
sequence number higher than any it has seen,
1
For example, in a system with n replicas, assign each replica r a unique id i
r
between 0 and n − 1. Replica r picks the
smallest sequence number s larger than any it has seen such that s mod n = i
r
.
4