Apache Hadoop Goes Realtime at Facebook
Dhruba Borthakur, Jonathan Gray, Joydeep Sen Sarma, Kannan Muthukkaruppan,
Nicolas Spiegelberg, Hairong Kuang, Karthik Ranganathan, Dmytro Molkov,
Aravind Menon, Samuel Rash, Rodrigo Schmidt, Amitanand Aiyer
Facebook
{dhruba, jssarma, jgray, kannan, nicolas, hairong, kranganathan, dms,
aravind.menon, rash, rodrigo, amitanand.s}@fb.com
ABSTRACT
Facebook recently deployed Facebook Messages, its first ever
user-facing application built on the Apache Hadoop platform.
Apache HBase is a database-like layer built on Hadoop designed
to support billions of messages per day. This paper describes the
reasons why Facebook chose Hadoop and HBase over other
systems such as Apache Cassandra and Voldemort and discusses
the application’s requirements for consistency, availability,
partition tolerance, data model and scalability. We explore the
enhancements made to Hadoop to make it a more effective
realtime system, the tradeoffs we made while configuring the
system, and how this solution has significant advantages over the
sharded MySQL database scheme used in other applications at
Facebook and many other web-scale companies. We discuss the
motivations behind our design choices, the challenges that we
face in day-to-day operations, and future capabilities and
improvements still under development. We offer these
observations on our deployment as a model for other companies
that are contemplating Hadoop-based solutions over traditional
sharded RDBMS deployments.
Categories and Subject Descriptors
H.m [Information Systems]: Miscellaneous.
General Terms
Management, Measurement, Performance, Distributed Systems,
Design, Reliability, Languages.
Keywords
Data, scalability, resource sharing, distributed file system,
Hadoop, Hive, HBase, Facebook, Scribe, log aggregation,
distributed systems.
1. INTRODUCTION
Apache Hadoop [1] is a top-level Apache project that includes
open source implementations of a distributed file system [2] and
MapReduce that were inspired by Google’s GFS [5] and
MapReduce [6] projects. The Hadoop ecosystem also includes
projects like Apache HBase [4], which is inspired by Google's
BigTable; Apache Hive [3], a data warehouse built on top of
Hadoop; and Apache ZooKeeper [8], a coordination service for
distributed systems.
At Facebook, Hadoop has traditionally been used in conjunction
with Hive for storage and analysis of large data sets. Most of this
analysis occurs in offline batch jobs and the emphasis has been on
maximizing throughput and efficiency. These workloads typically
read and write large amounts of data from disk sequentially. As
such, there has been less emphasis on making Hadoop performant
for random access workloads by providing low latency access to
HDFS. Instead, we have used a combination of large clusters of
MySQL databases and caching tiers built using memcached [9]. In
many cases, results from Hadoop are uploaded into MySQL or
memcached for consumption by the web tier.
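As a concrete illustration of this hand-off, the sketch below loads the
tab-separated output of a batch job from HDFS into memcached using the
open source spymemcached client. The output path, cache host, record
format, and TTL are assumptions made for the example, not details of our
production pipeline.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import net.spy.memcached.MemcachedClient;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: load tab-separated batch-job output from HDFS into memcached
// for consumption by the web tier. Path, host, and TTL are illustrative.
public class ResultLoader {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path results = new Path("/user/hive/output/part-00000"); // hypothetical
    MemcachedClient mc =
        new MemcachedClient(new InetSocketAddress("cache-host", 11211));
    BufferedReader r = new BufferedReader(
        new InputStreamReader(fs.open(results), StandardCharsets.UTF_8));
    try {
      String line;
      while ((line = r.readLine()) != null) {
        String[] kv = line.split("\t", 2); // each record: key<TAB>value
        if (kv.length == 2) {
          mc.set(kv[0], 86400, kv[1]); // cache for one day (illustrative)
        }
      }
    } finally {
      r.close();
      mc.shutdown();
    }
  }
}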
Recently, a new generation of applications has arisen at Facebook
that require very high write throughput and cheap and elastic
storage, while simultaneously requiring low latency and disk
efficient sequential and random read performance. MySQL
storage engines are proven and have very good random read
performance, but typically suffer from low random write
throughput. It is difficult to scale up our MySQL clusters rapidly
while maintaining good load balancing and high uptime.
Administration of MySQL clusters carries a relatively high
management overhead, and the clusters typically use more expensive
hardware. Given our high confidence in the reliability and
scalability of HDFS, we began to explore Hadoop and HBase for
such applications.
The first set of applications requires realtime concurrent, but
sequential, read access to a very large stream of realtime data
being stored in HDFS. An example system generating and storing
such data is Scribe [10], an open source distributed log
aggregation service created by and used extensively at Facebook.
Previously, data generated by Scribe was stored in expensive and
hard-to-manage NFS servers. Two main applications that fall into
this category are Realtime Analytics [11] and MySQL backups.
We have enhanced HDFS to become a high performance low
latency file system and have been able to reduce our use of
expensive file servers.
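To make this access pattern concrete, the sketch below shows a minimal
log "tailer" built on the standard org.apache.hadoop.fs API: it
sequentially consumes an HDFS file that is still being written, picking
up new bytes as they become visible. The path, poll interval, and
process() hook are illustrative assumptions, not part of the system
described in this paper.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch of a tailer that sequentially consumes a growing HDFS
// file, such as a Scribe log stream.
public class HdfsLogTailer {
  public static void main(String[] args)
      throws IOException, InterruptedException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path log = new Path("/scribe/example-category/current.log"); // hypothetical
    long offset = 0;
    byte[] buf = new byte[64 * 1024];

    while (true) {
      // Re-check the visible length: as the writer syncs new data,
      // the readable length of the file advances.
      long len = fs.getFileStatus(log).getLen();
      if (len > offset) {
        FSDataInputStream in = fs.open(log);
        try {
          in.seek(offset); // resume where the previous pass stopped
          int n;
          while (offset < len && (n = in.read(buf)) > 0) {
            process(buf, n);
            offset += n;
          }
        } finally {
          in.close();
        }
      }
      Thread.sleep(1000); // poll interval, illustrative
    }
  }

  private static void process(byte[] b, int n) {
    System.out.print(new String(b, 0, n, StandardCharsets.UTF_8));
  }
}

Following a file under construction in this way depends on HDFS making
newly synced bytes visible to concurrent readers, one of the realtime
enhancements discussed later in the paper.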