management becomes easily contained; runtime is standardized; and the approach is
developer-friendly enough that development and operations can use the same tools
and, byte for byte, the same container. Thus, “It works for me, but not in production”
is uttered far fewer times. CoreOS is the operationalization of this computing model in
a way that uses the advantages of containerization in a generic, distributed system
model.
Throughout this book, you’ll learn how to take advantage of this computing model. You’ll learn how to deploy and manage CoreOS both in a prototype environment and in production in the cloud. You’ll also learn how to design and adapt your application stacks to operate well in this context. In addition to the OS, I’ll cover each of CoreOS’s components in detail, along with their application: etcd for configuration and discovery, rkt for a different approach to the container runtime, fleet for distributed service scheduling, and flannel for network abstraction.
Distributed computing is nothing new; many models and software packages for distributed systems have been around since the dawn of computing. But most of these systems have been historically obscure, highly proprietary, or cloistered in particular industries like scientific computing. Some of the oldest designs exist today only to support legacy systems from the 1970s that powered distributed computing for mainframes and minicomputers.
History and motivations behind CoreOS
The concept of single system image (SSI) computing is an OS architecture that hasn’t seen much activity since the 1990s, except for a few cases that have longstanding support to run legacy systems. SSI is an architecture that presents many computers in a cluster as a single system. There is a single filesystem, shared interprocess communication (IPC) via shared runtime space, and process checkpointing/migration. MOSIX/openMosix, Kerrighed, VMScluster, and Plan 9 (natively supported) are all SSI systems. Plan 9 has probably received the most current development activity, which should tell you something about the popularity of this computing model.
The main drawbacks of SSI are, first, that the systems are often extremely difficult to configure and maintain and aren’t geared toward generic use. Second, the field has stagnated significantly: there’s nothing new in SSI, and it has failed to catch on as a popular model. I think this is because scientific and other Big Data computing have embraced grid-compute, batch operating models like Condor, BOINC, and Slurm. These tools are designed to run compute jobs in a cluster and deliver a result; SSI’s shared IPC provides little benefit for these applications, because the cost (in time) of data transmission is eclipsed by the cost of the blocking batch process. In the world of application server stacks, abstractions by protocols like HTTP and distributed queues have also made shared IPC not worth investing in.
The problem space now for distributed computing is how to effectively manage large-scale systems. Whether you’re working on a web stack or distributed batch processing, you may not need shared IPC, but the other things that came with SSI have more apparent value: a shared filesystem means you configure only one system, and