Optimizing Space Amplification in RocksDB: Balancing Resource Efficiency and Performance


"本文主要探讨了在RocksDB中优化空间放大问题，重点关注如何在确保读写延迟满足服务级别要求的同时，提高存储效率，特别是针对Facebook典型生产工作负载下，存储空间经常成为瓶颈的情况。RocksDB利用日志结构合并树（Log-Structured Merge Tree, LSM-Tree）来实现显著的空间效率和更好的写入吞吐量。" 在RocksDB中，空间放大是一个关键的优化目标，因为这直接影响到存储资源的使用效率。在传统的性能指标如响应时间延迟和吞吐量达到可接受水平的前提下，RocksDB的开发和配置策略更倾向于优先考虑资源效率。特别是在Facebook的生产环境中，使用闪存固态硬盘（Flash SSDs）时，存储空间往往是性能瓶颈。 RocksDB是一个嵌入式、高性能、持久化的键值存储引擎，其核心特性之一是采用LSM-Tree数据结构。LSM-Tree的设计原理是将数据分批写入到内存中的日志缓冲区，然后定期将这些数据合并并写入到磁盘上的有序文件中。这种设计使得RocksDB在写入操作上具有高效率，因为它避免了对磁盘的随机写入，转而进行顺序写入，后者在固态硬盘上通常更快。然而，LSM-Tree在提供高效写入的同时，可能会导致空间放大问题，因为需要维护多个级别的数据副本以保持数据的有序性。为了优化空间放大，文章可能涉及以下策略： 1. **压缩**：RocksDB支持多种数据压缩算法，如Snappy、LZ4和ZSTD，通过压缩数据可以减少存储需求，降低空间放大。 2. ** Memtable管理**：调整内存中的Memtable大小和数量，以及何时将数据从Memtable刷新到磁盘，可以在保证读写延迟的同时，减少磁盘占用。 3. **Compaction策略**：优化Compaction过程，例如设置合适的Compaction阈值，避免过多的小文件产生，减少合并的开销。 4. **Block缓存**：有效利用Block缓存可以提高读取性能，同时减少对磁盘的访问，从而降低空间需求。 5. ** Bloom Filter**：使用Bloom Filter可以减少不必要的磁盘I/O，尤其是在查询不存在的键时，节省空间并提高效率。 6. **分级存储**：根据数据的访问频率和年龄，采用不同的存储级别，将热数据保留在高速缓存中，冷数据移至低速但容量大的存储介质，以平衡空间效率和访问速度。 7. ** Tombstone管理**：有效地处理删除操作产生的Tombstone，避免它们占用过多空间。通过上述方法，RocksDB能够在保证服务质量的同时，最大限度地降低空间放大，从而提高存储资源的利用率。这对于Facebook这样的大规模数据中心至关重要，因为存储成本是运营成本的主要组成部分之一。通过持续优化，RocksDB能够更好地适应各种工作负载，提供高效且经济的数据存储解决方案。

Optimizing Space Amplification in RocksDB

Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor and Michael Stumm

Facebook, 1 Hacker Way, Menlo Park, CA USA 94025
{siying.d, mcallaghan, lgalanis, dhruba, tsavor}@fb.com

Dept. Electrical and Computer Engineering, University of Toronto, Canada M8X 2A6
stumm@eecg.toronto.edu

ABSTRACT

RocksDB is an embedded, high-performance, persistent key-value storage engine developed at Facebook. Much of our current focus in developing and configuring RocksDB is to give priority to resource efficiency instead of giving priority to the more standard performance metrics, such as response time latency and throughput, as long as the latter remain acceptable. In particular, we optimize space efficiency while ensuring read and write latencies meet service-level requirements for the intended workloads. This choice is motivated by the fact that storage space is most often the primary bottleneck when using Flash SSDs under typical production workloads at Facebook. RocksDB uses log-structured merge trees to obtain significant space efficiency and better write throughput while achieving acceptable read performance.

This paper describes methods we used to reduce storage usage in RocksDB. We discuss how we are able to trade off storage efficiency and CPU overhead, as well as read and write amplification. Based on experimental evaluations of MySQL with RocksDB as the embedded storage engine (using TPC-C and LinkBench benchmarks) and based on measurements taken from production databases, we show that RocksDB uses less than half the storage that InnoDB uses, yet performs well and in many cases even better than the B-tree-based InnoDB storage engine. To the best of our knowledge, this is the first time a log-structured merge tree-based storage engine has shown competitive performance when running OLTP workloads at large scale.

1. INTRODUCTION

Resource efficiency is the primary objective in our storage systems strategy at Facebook. Performance must be sufficient to meet the needs of Facebook's services, but efficiency should be as good as possible to allow for scale.

This article is published under a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits distribution and reproduction in any medium as well as allowing derivative works, provided that you attribute the original work to the author(s) and CIDR 2017. 8th Biennial Conference on Innovative Data Systems Research (CIDR '17), January 8-11, 2017, Chaminade, California, USA.

Facebook has one of the largest MySQL installations in the world, storing many tens of petabytes of online data. The underlying storage engine for Facebook's MySQL instances is increasingly being switched over from InnoDB to MyRocks, which in turn is based on Facebook's RocksDB. The switchover is primarily motivated by the fact that MyRocks uses half the storage InnoDB needs, and has higher average transaction throughput, yet has only marginally worse read latencies.

RocksDB is an embedded, high-performance, persistent key-value storage system [1] that was developed by Facebook after forking the code from Google's LevelDB [2, 3]. RocksDB was open-sourced in 2013 [5]. MyRocks is RocksDB integrated as a MySQL storage engine. With MyRocks, we can use RocksDB as backend storage and still benefit from all the features of MySQL.

RocksDB is used in many applications beyond just MySQL, both within and outside of Facebook. Within Facebook, RocksDB is used as a storage engine for Laser, a high query throughput, low latency key-value storage service [6], ZippyDB, a distributed key-value store with Paxos-style replication [6], Dragon, a system to store indices of the Social Graph [7], and Stylus, a stream processing engine [6], to name a few. Outside of Facebook, both MongoDB [8] and Sherpa, Yahoo's largest distributed data store [9], use RocksDB as one of their storage engines. Further, RocksDB is used by LinkedIn for storing user activity [10] and by Netflix to cache application data [11], to list a few examples.

Our primary goal with RocksDB at Facebook is to make the most efficient use of hardware resources while ensuring all important service level requirements can be met, including target transaction latencies. Our focus on efficiency instead of performance is arguably unique in the database community in that database systems are typically compared using performance metrics such as transactions per minute (e.g., tpmC) or response-time latencies. Our focus on efficiency does not imply that we treat performance as unimportant, but rather that once our performance objectives are achieved, we optimize for efficiency. Our approach is driven in part by the data storage needs at Facebook (that may well differ from those of other organizations):

1. SSDs are increasingly being used to store persistent data and are the primary target for RocksDB;

2. Facebook relies primarily on shared nothing configura-

A Facebook blog post lists many of the key differences between RocksDB and LevelDB [4].
