4
doesn’t falsify Q#G. If G
1
⇒ G
2
then Q#G
1
⇒ Q#G
2
. It’s natural to
define
Q~
F
= {q | q ∉ Z
F
}, and similarly for Q~
S
.
Quorum sets
Q and Q are (mutually) exclusive if we can’t have
both a
Q quorum for G and a Q quorum for its negation:
(∀G | Q#G ⇒ ~Q #~G). This holds if every Q quorum intersects
every
Q quorum in a set of processes that can’t all be faulty:
∀q ∈ Q, q ∈ Q | q ∩ q ∈ Q~
F
This is how we lift local exclusion G
1
⇒ ~G
2
to global exclusion
Q#G
1
⇒ ~Q #G
2
. Exclusion is what we need for safety.
For liveness we need to relate various quorums to the sets of
possibly faulty or stopped processes.
To ensure
G holds at some non-faulty process, we need to
hear it from a good quorum, one that can’t all be faulty, that
is, one in
Q~
F
. If g = G
m
is independent of m, then Q
~F
#G ⇒ g;
this is how we establish g by hearing from some processes.
To ensure that henceforth there’s a visible
Q quorum satisfy-
ing a predicate G, we need a quorum Q
+
satisfying G that still
leaves a
Q quorum after losing any set that can fail:
Q
+
= {q | (∀z ∈ Z
FS
| q – z ∈ Q}.
If Q
+
{} then Q is live: there’s always some quorum in Q
that isn’t failed.
The most popular quorum sets are based only on the size of the
quorums:
Q
i
= {q | |q|
i}. If there are n processes, then for Q
i
and
Q
M
to be exclusive, we need i + j > n + f. If Z
F
= Z
I
then
Q~
F
= Q
I
+1
. If Z
FS
= Z
V
then Q
L
+
= Q
s+i
and Q
i
live requires
i
n – s, since Q
>n
= {}. So we get n + f < i + j
2(n – s), or
n > f + 2s. Also i > n + f – j
n + f – (n – s), or i > f + s. With the
minimum n = f + 2s + 1, f + s < i
f + s + 1, so we must have i =
f + s + 1. If s = f, we get n = 3f + 1 and i = 2f + 1.
With
f = 0 there are exclusive ‘grid’ quorum sets: arrange the
processes in a rectangular grid and take Q to be the rows and Q
the columns. If Q must exclude itself, take a quorum to be a row
and a column, minus the intersection if both have more than two
processes. The advantage: a quorum is only
¥
n or 2(
¥
n – 1) proc-
esses, not n/2. This generalizes to f > 0 because quorums of i rows
and
j columns intersect in 2ij processes [15].
For the Intel-Microsoft example, an exclusive quorum must be
the union of an exclusive quorum on each of the two sides.
3 The specification for consensus
The external actions are Input, which provides an input value from
the client, and
Decision, which returns the decision, waiting until
there is one.
4
Consensus collects the inputs in the input set, and the
internal Decide action picks one from the set.
type X =… values to agree on
var d : (X ∪ {nil}) := nil Decision; x/nil, not out
input : set X := {}
Name Guard State change
Input(x) input := input ∪ {x}
Decision: X d
nil ret d
Decide d = nil ∧ x ∈ input
d := x
For replicated state machines, the inputs are requests from the
clients. Typically there is more than one at a time; those that don’t
win are carried over to
input for the next step.
4
A different spec would allow it to return nil if there’s no decision, but
then it must be able to return nil even if there has already been a decision,
since a client may do the Decision action at a process that hasn’t yet heard
about the decision. For this paper it makes no difference.
It’s interesting to observe that there is a simpler spec with iden-
tical behavior.
5
It has the same d and Decision, but drops input and
Decide, doing the work in Input.
var d : (X ∪ {nil}) := nil Decision; x/nil, not out
Input(x) if d = nil then optionally d := x
Decision: X d
nil ret d
A simulation proof that the first spec implements the second,
however, requires a prophecy variable or backward simulation.
This spec says nothing about liveness, because there is no live
algorithm for asynchronous consensus [4].
4 Abstract Paxos
As we said in section 1.2, the idea of Paxos is to have a sequence
of views until one of them forms a quorum that is noticed. So
each view has three stages:
Choose an input value that is anchored: guaranteed to be the
same as any previous decision.
Try to get a decision quorum of agents to accept the value.
If successful, finish the algorithm by recording the decision
at the agents.
This section describes AP, an abstract version of Paxos. AP
can’t run on your computers because some of the actions refer to
non-local state (marked like this so you can easily see where the
implementation must differ). In particular,
Choose and c
v
are com-
pletely non-local in AP. In later sections we will see different
ways to implement AP with actions that are entirely local. The
key problem is implementing
Choose.
AP has external actions with the same names as the spec, of
course. They are almost identical to the actions of the spec.
Name Guard State change
Input(x) input := input ∪ {x}
Decision
a
: X d
a
nil ret d
a
4.1 State variables
type V = ... View; totally ordered
Y = X ∪ {out, nil}
A ⊆ M = … Agent
Q = set A Quorum
const Q
dec
: set Q := ... decision Quorum set
Q
out
: set Q := ... out Quorum set
v
0
: V := ... smallest V
The views must be totally ordered, with a first view v
0
. Q
dec
and
Q
out
must be exclusive.
var r
v
a
: Y := nil, but r
v
0
a
:= out Result
d
a
: Y := nil Decision; x/nil, not out
c
v
: Y := nil Choice; x/nil, not out
input : set X := {}
active
v
: Bool := false
Each agent has a decision d
a
, and a result r
v
a
for each view; we
take
r
v
0
a
= out for every a. AP doesn’t say where the other vari-
ables live.
4.2 State functions and invariants
We define a state function r
v
that is a summary of the r
v
a
: the
view’s choice if there’s a decision quorum for that among the
agents, or
out if there’s an out quorum for that, or nil otherwise.
5
I am indebted to Michael Jackson for a remark that led to this idea.