which it is changing (as opposed to compute-intensive, where CPU cycles are the
bottleneck).
The tools and technologies that help data-intensive applications store and process
data have been rapidly adapting to these changes. New types of database systems
(“NoSQL”) have been getting lots of attention, but message queues, caches, search
indexes, frameworks for batch and stream processing, and related technologies are
very important too. Many applications use some combination of these.
The buzzwords that fill this space are a sign of enthusiasm for the new possibilities,
which is a great thing. However, as software engineers and architects, we also need to
have a technically accurate and precise understanding of the various technologies and
their trade-offs if we want to build good applications. For that understanding, we
have to dig deeper than buzzwords.
Fortunately, behind the rapid changes in technology, there are enduring principles
that remain true, no matter which version of a particular tool you are using. If you
understand those principles, you’re in a position to see where each tool fits in, how to
make good use of it, and how to avoid its pitfalls. That’s where this book comes in.
The goal of this book is to help you navigate the diverse and fast-changing landscape
of technologies for processing and storing data. This book is not a tutorial for one
particular tool, nor is it a textbook full of dry theory. Instead, we will look at examples
of successful data systems: technologies that form the foundation of many popular
applications, and that have to meet scalability, performance and reliability
requirements in production every day.
We will dig into the internals of those systems, tease apart their key algorithms, and
discuss their principles and the trade-offs they have to make. On this journey, we will try
to find useful ways of thinking about data systems: not just how they work, but also
why they work that way, and what questions we need to ask.
After reading this book, you will be in a great position to decide which kind of
technology is appropriate for which purpose, and understand how tools can be combined
to form the foundation of a good application architecture. You won’t be ready to
build your own database storage engine from scratch, but fortunately that is rarely
necessary. You will, however, develop a good intuition for what your systems are
doing under the hood, so that you can reason about their behavior, make good design
decisions, and track down any problems that may arise.
Who Should Read this Book?
If you develop applications that have some kind of server/backend for storing or
processing data, and your applications use the internet (e.g. web applications, mobile
apps, or internet-connected sensors), then this book is for you.