Optimizing Space Amplification in RocksDB
Siying Dong¹, Mark Callaghan¹, Leonidas Galanis¹, Dhruba Borthakur¹, Tony Savor¹ and Michael Stumm²
¹ Facebook, 1 Hacker Way, Menlo Park, CA USA 94025
{siying.d, mcallaghan, lgalanis, dhruba, tsavor}@fb.com
² Dept. Electrical and Computer Engineering, University of Toronto, Canada M8X 2A6
stumm@eecg.toronto.edu
ABSTRACT
RocksDB is an embedded, high-performance, persistent key-value
storage engine developed at Facebook. Much of our current focus
in developing and configuring RocksDB is to prioritize resource
efficiency over the more standard performance metrics, such as
response latency and throughput, as long as the latter remain
acceptable. In particular, we optimize space efficiency while
ensuring read and write latencies meet service-level require-
ments for the intended workloads. This choice is motivated
by the fact that storage space is most often the primary
bottleneck when using Flash SSDs under typical production
workloads at Facebook. RocksDB uses log-structured merge
trees to obtain significant space efficiency and better write
throughput while achieving acceptable read performance.
This paper describes methods we used to reduce storage
usage in RocksDB. We discuss how we are able to trade
off storage efficiency and CPU overhead, as well as read
and write amplification. Based on experimental evaluations
of MySQL with RocksDB as the embedded storage engine
(using TPC-C and LinkBench benchmarks) and based on
measurements taken from production databases, we show
that RocksDB uses less than half the storage that InnoDB
uses, yet performs well and in many cases even better than
the B-tree-based InnoDB storage engine. To the best of our
knowledge, this is the first time a log-structured merge tree-based
storage engine has shown competitive performance
when running OLTP workloads at large scale.
1. INTRODUCTION
Resource efficiency is the primary objective in our storage
systems strategy at Facebook. Performance must be suffi-
cient to meet the needs of Facebook’s services, but efficiency
should be as good as possible to allow for scale.
This article is published under a Creative Commons Attribution License
(http://creativecommons.org/licenses/by/3.0/), which permits distribution
and reproduction in any medium as well as allowing derivative works,
provided that you attribute the original work to the author(s) and CIDR 2017.
8th Biennial Conference on Innovative Data Systems Research (CIDR ’17)
January 8-11, 2017, Chaminade, California, USA.
Facebook has one of the largest MySQL installations in
the world, storing many tens of petabytes of online data. The
underlying storage engine for Facebook’s MySQL instances
is increasingly being switched over from InnoDB to My-
Rocks, which in turn is based on Facebook’s RocksDB. The
switchover is primarily motivated by the fact that MyRocks
uses half the storage InnoDB needs, and has higher average
transaction throughput, yet has only marginally worse read
latencies.
RocksDB is an embedded, high-performance, persistent
key-value storage system [1] that was developed by Facebook
after forking the code from Google’s LevelDB [2, 3].¹
RocksDB was open-sourced in 2013 [5]. MyRocks is RocksDB
integrated as a MySQL storage engine. With MyRocks,
we can use RocksDB as backend storage and still benefit
from all the features of MySQL.
RocksDB is used in many applications beyond just MySQL,
both within and outside of Facebook. Within Facebook,
RocksDB is used as a storage engine for Laser, a
high-query-throughput, low-latency key-value storage
service [6]; ZippyDB, a distributed key-value store with
Paxos-style replication [6]; Dragon, a system to store indices
of the Social Graph [7]; and Stylus, a stream processing
engine [6], to name a few. Outside of Facebook, both
MongoDB [8] and Sherpa, Yahoo’s largest distributed data
store [9], use RocksDB as one of their storage engines.
Further, RocksDB is used by LinkedIn for storing user
activity [10] and by Netflix to cache application data [11].
Our primary goal with RocksDB at Facebook is to make
the most efficient use of hardware resources while ensuring
all important service-level requirements can be met, including
target transaction latencies. Our focus on efficiency
instead of performance is arguably unique in the database
community in that database systems are typically compared
using performance metrics such as transactions per minute
(e.g., tpmC) or response-time latencies. Our focus on effi-
ciency does not imply that we treat performance as unim-
portant, but rather that once our performance objectives are
achieved, we optimize for efficiency. Our approach is driven
in part by the data storage needs at Facebook (which may
well differ from those of other organizations):
1. SSDs are increasingly being used to store persistent
data and are the primary target for RocksDB;
2. Facebook relies primarily on shared nothing configura-
¹ A Facebook blog post lists many of the key differences between RocksDB and LevelDB [4].