Not long after, Lars, the author of the book you are now reading, showed up on the
#hbase IRC channel. He had a big-data problem of his own, and was game to try HBase.
After some back and forth, Lars became one of the first users to run HBase in production
outside of the Powerset home base. Through many ups and downs, Lars stuck around.
I distinctly remember a directory listing Lars made for me a while back on his production
cluster at WorldLingo, where he was employed as CTO, sysadmin, and grunt. The
listing showed ten or so HBase releases from Hadoop 0.15.1 (November 2007) on up
through HBase 0.20, each of which he’d run on his 40-node cluster at one time or
another during production.
Of all those who have contributed to HBase over the years, it is poetic justice that Lars
is the one to write this book. Lars was always dogging HBase contributors about how the
documentation needed to be better if we hoped to gain broader adoption. Everyone
agreed, nodded their heads in assent, amen’d, and went back to coding. So Lars started
writing critical how-tos and architectural descriptions in between jobs and his
intra-European travels as unofficial HBase European ambassador. His Lineland blogs on
HBase gave the best description, outside of the source, of how HBase worked, and at
a few critical junctures, carried the community across awkward transitions (e.g., an
important blog explained the labyrinthian HBase build during the brief period we
thought an Ivy-based build to be a “good idea”). His luscious diagrams were poached
by one and all wherever an HBase presentation was given.
HBase has seen some interesting times, including a period of sponsorship by Microsoft,
of all things. Powerset was acquired in July 2008, and after a couple of months during
which Powerset employees were disallowed from contributing while Microsoft’s legal
department vetted the HBase codebase to see if it impinged on SQL Server patents, we
were allowed to resume contributing (I was a Microsoft employee working near full
time on an Apache open source project). The times ahead look promising, too, whether
it’s the variety of contortions HBase is being put through at Facebook (as the
underpinnings for their massive Facebook mail app, or fielding millions of hits a second on
their analytics clusters) or more deploys along the lines of Yahoo!’s 1k node HBase
cluster used to host their snapshot of Microsoft’s Bing crawl. Other developments
include HBase running on filesystems other than Apache HDFS, such as MapR.
What is plain to me, though, is that none of these developments would have been possible
were it not for the hard work put in by our awesome HBase community, driven by a
core of HBase committers. Some members of the core have only been around a year or
so—Todd Lipcon, Gary Helmling, and Nicolas Spiegelberg—and we would be lost
without them, but a good portion have been there from close to project inception and
have shaped HBase into the (scalable) general datastore that it is today. These include
Jonathan Gray, who gambled his startup streamy.com on HBase; Andrew Purtell, who
built an HBase team at Trend Micro long before such a thing was fashionable; Ryan
Rawson, who got StumbleUpon—which became the main sponsor after HBase moved
on from Powerset/Microsoft—on board, and who had the sense to hire John-Daniel
Cryans, now a power contributor but just a bushy-tailed student at the time. And then