CHAPTER 1. INTRODUCTION 2
dominated the discussion of consensus algorithms over the last two decades: most implementations
of consensus were based on Paxos or influenced by it, and Paxos had become the primary vehicle
used to teach students about consensus.
Unfortunately, Paxos is quite difficult to understand, in spite of numerous attempts to make
it more approachable. Furthermore, its architecture requires complex changes to support practical
systems, and building a complete system based on Paxos requires developing several extensions for
which the details have not been published or agreed upon. As a result, both system builders and
students struggle with Paxos.
The two other well-known consensus algorithms are Viewstamped Replication [83, 82, 66] and
Zab [42], the algorithm used in ZooKeeper. Although we believe both of these algorithms are in-
cidentally better in structure that Paxos for building systems, neither has explicitly made this argu-
ment; they were not designed with simplicity or understandability as a primary goal. The burden of
understanding and implementing these algorithms is still too high.
Each of these consensus options was difficult to understand and difficult to implement. Unfor-
tunately, when the cost of implementing consensus with proven algorithms was too high, systems
builders were left with a tough decision. They could avoid consensus altogether, sacrificing the fault
tolerance or consistency of their systems, or they could develop their own ad hoc algorithm, often
leading to unsafe behavior. Moreover, when the cost of explaining and understanding consensus
was too high, not all instructors attempted to teach it, and not all students succeeded in learning it.
Consensus is as fundamental as two-phase commit; ideally, as many students should learn it (even
though consensus is fundamentally more difficult).
After struggling with Paxos ourselves, we set out to find a new consensus algorithm that could
provide a better foundation for system building and education. Our approach was unusual in that our
primary goal was understandability: could we define a consensus algorithm for practical systems
and describe it in a way that is significantly easier to learn than Paxos? Furthermore, we wanted
the algorithm to facilitate the development of intuitions that are essential for system builders. It was
important not just for the algorithm to work, but for it to be obvious why it works.
This algorithm also had to be complete enough to address all aspects of building a practical
system, and it had to perform well enough for practical deployments. The core algorithm not only
had to specify the effects of receiving a message but also describe what should happen and when;
these are equally important for systems builders. Similarly, it had to guarantee consistency, and it
also had to provide availability whenever possible. It also had to address the many aspects of a
system that go beyond reaching consensus, such as changing the members of the consensus group.