The ZooKeeper Mission
Trying to explain what ZooKeeper does for us is like trying to explain what a screwdriver
can do for us. In very basic terms, a screwdriver allows us to turn or drive screws, but
putting it this way does not really express the power of the tool. It enables us to assemble
pieces of furniture and electronic devices, and in some cases hang pictures on the wall.
By giving some examples like this, we can give a sense of what can be done, but it is
certainly not exhaustive.
The argument for what a system like ZooKeeper can do for us is along the same lines:
it enables coordination tasks for distributed systems. A coordination task is a task
involving multiple processes. Such a task can be for the purposes of cooperation or to
regulate contention. Cooperation means that processes need to do something together,
and processes take action to enable other processes to make progress. For example, in
typical master-worker architectures, the worker informs the master that it is available
to do work. The master consequently assigns tasks to the worker. Contention is different:
it refers to situations in which two processes cannot make progress concurrently, so one
must wait for the other. Using the same master-worker example, we really want to have
a single master, but multiple processes may try to become the master. The multiple
processes consequently need to implement mutual exclusion. We can actually think of
the task of acquiring mastership as that of acquiring a lock: the process that acquires
the mastership lock exercises the role of master.
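The idea of mastership as a lock can be sketched with threads standing in for processes. This is only an in-process analogy, assuming a shared lock object that real distributed processes would not have. The names MastershipLock and run_election are hypothetical and are not part of any ZooKeeper API.

```python
import threading


class MastershipLock:
    """In-process stand-in for a mastership lock (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.master = None  # name of the candidate that holds mastership

    def try_acquire(self, candidate):
        # Non-blocking attempt: exactly one candidate can win the lock.
        # The lock is never released here, so mastership is held for good.
        if self._lock.acquire(blocking=False):
            self.master = candidate
            return True
        return False


def run_election(names):
    """Let every named candidate contend once; return (master, roles)."""
    lock = MastershipLock()
    roles = {}
    roles_guard = threading.Lock()  # protects the roles dict

    def contend(name):
        won = lock.try_acquire(name)
        with roles_guard:
            roles[name] = "master" if won else "backup"

    threads = [threading.Thread(target=contend, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock.master, roles
```

Calling run_election(["p1", "p2", "p3"]) yields one master and two backups; which candidate wins depends on thread scheduling.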
If you have any experience with multithreaded programs, you will recognize that there
are a lot of similar problems. In fact, having a number of processes running on the same
computer or across computers is conceptually no different. Synchronization
primitives that are useful in the context of multiple threads are also useful in the context
of distributed systems. One important difference, however, stems from the fact that
different computers do not share anything other than the network in a typical
shared-nothing architecture. While there are a number of message-passing algorithms to
implement synchronization primitives, it is typically much easier to rely upon a component
that provides a shared store with some special ordering properties, like ZooKeeper does.
Coordination does not always take the form of synchronization primitives like leader
election or locks. Configuration metadata is often used as a way for a process to convey
what others should be doing. For example, in a master-worker system, workers need to
know the tasks that have been assigned to them, and this information must be available
even if the master crashes.
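The requirement that assignments outlive the master can be illustrated with a small sketch. It assumes an in-memory SharedStore class as a stand-in for a coordination service. SharedStore, Master, tasks_for, and the "/assign/..." key layout are all hypothetical names chosen for illustration.

```python
class SharedStore:
    """Hypothetical stand-in for a store that outlives any single process."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class Master:
    """Records assignments in the shared store, not in its own memory."""

    def __init__(self, store):
        self.store = store

    def assign(self, worker, task):
        key = f"/assign/{worker}"
        tasks = self.store.get(key, [])
        # Write a new list so the stored value is replaced as a whole.
        self.store.put(key, tasks + [task])


def tasks_for(store, worker):
    """What a worker reads to learn its tasks; needs no live master."""
    return store.get(f"/assign/{worker}", [])
```

Because the assignments live in the store, a worker can still call tasks_for after the Master object is gone, which mirrors the case where the master crashes.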
Let’s look at some examples where ZooKeeper has been useful to get a better sense of
where it is applicable: