PREFACE
If you’re reading this, you’re presumably interested in knowing how I got involved
with HBase. Let me start by saying thank you for choosing this book as your means to
learn about HBase and how to build applications that use HBase as their underlying
storage system. I hope you’ll find the text useful and learn some neat tricks that will
help you build better applications and enable you to succeed.
I was pursuing graduate studies in computer science at UC Santa Cruz, specializing
in distributed systems, when I started working at Cisco as a part-time researcher. The
team I was working with was trying to build a data-integration framework that could
integrate, index, and allow exploration of data residing in hundreds of heterogeneous
data stores, including but not limited to large RDBMS systems. We started looking for
systems and solutions that would help us solve the problems at hand. We evaluated
many different systems, from object databases to graph databases, and we considered
building a custom distributed data-storage layer backed by Berkeley DB. It was clear
that one of the key requirements was scalability, and we didn’t want to build a
full-fledged distributed system. If you’re in a situation where you think you need to build
out a custom distributed database or file system, think again—try to see if an existing
solution can solve part of your problem.
Following that principle, we decided that building out a new system wasn’t the best
approach and chose to use an existing technology instead. That was when I started playing
with the Hadoop ecosystem, getting my hands dirty with the different components in
the stack and going on to build a proof-of-concept for the data-integration system on
top of HBase. It actually worked and scaled well! HBase was well suited to the problem,
but these were young projects at the time—and one of the things that ensured our
success was the community. HBase has one of the most welcoming and vibrant open
source communities; it was much smaller at the time, but the key principles were the
same then as now.
The data-integration project later became my master’s thesis. The project used
HBase at its core, and I became more involved with the community as I built it out. I
asked questions and, with time, answered questions others asked, on both the mailing
lists and the IRC channel. That was when I met Nick and got to know what he was
working on. With each day that I worked on this project, my interest in and love for the
technology and the open source community grew, and I wanted to stay involved.
After finishing grad school, I joined Amazon in Seattle to work on back-end
distributed systems projects. Much of my time was spent with the Elastic MapReduce team,
building the first versions of their hosted HBase offering. Nick also lived in Seattle,
and we met often and talked about the projects we were working on. Toward the end
of 2010, the idea of writing HBase in Action for Manning came up. We initially scoffed
at the thought of writing a book on HBase, and I remember saying to Nick, “It’s gets,
puts, and scans—there’s not a lot more to HBase from the client side. Do you want to
write a book about three API calls?”
But the more we thought about this, the more we realized that building applications
with HBase was challenging and there wasn’t enough material to help people get off the