RAMCloud: A High-Performance In-Memory Storage System


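RAMCloud is a high-performance key-value storage system designed to provide low-latency access to large-scale datasets. Its core idea is to keep all data in DRAM at all times, so that reads and writes are as fast as possible. To support very large capacities (1 PB or more), RAMCloud aggregates the memory of thousands of servers into a single, unified, distributed key-value store. Because DRAM is volatile, RAMCloud keeps backup copies on secondary storage (such as disk) to protect against the loss of main-memory contents. It uses a uniform log-structured mechanism to manage both the live data in DRAM and the backup copies on secondary storage, which keeps data operations efficient and robust. For communication, RAMCloud uses a polling-based approach that bypasses the operating system kernel and talks directly to the network interface controllers (NICs). As a result, a client application can read a small object from any RAMCloud storage server in less than 5 microseconds, and a durable write takes about 15 microseconds. Unlike traditional storage systems, RAMCloud does not keep multiple replicas of data online; instead it relies on fast crash recovery to provide high availability. When a failure occurs, the system quickly reconstructs the lost data from backups so that service can continue.

RAMCloud's design balances performance and efficiency, and its advantages are most pronounced when serving large numbers of small-object requests. Its architecture and optimizations make it a good fit for real-time data processing, database applications, and other scenarios that need low-latency responses. However, because of its dependence on hardware resources and the possible risk of single points of failure, RAMCloud may be a poor fit for scenarios with especially high data-redundancy requirements or tight cost constraints. Overall, RAMCloud represents a novel and efficient approach to data storage.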

1:10 J. Ousterhout et al.

[Figure 4 diagram. Labels: hash(tableId, key); hash table buckets; master's DRAM; log segments 123, 124, 125; head segment; segment replicas on backups (B3 B24 B19, B45 B7 B11, B12 B3 B28).]

Fig. 4. Each master organizes its main memory as a log, which is divided into 8 MB segments. Each segment is replicated on the secondary storage of several backups (for example, segment 124 is replicated on backups 45, 7, and 11). The master maintains a hash table to locate live objects quickly. To look up an object, a master selects a hash table bucket using a hash of the object’s table identifier and key. A bucket occupies one cache line (64 bytes) and contains 8 entries, each holding a pointer to an object in the log and 16 bits of the object’s key hash. For each bucket entry that matches the desired key hash, the full key must be compared against the key stored in the log entry. Small objects can typically be retrieved with two last-level cache misses: one for the hash table bucket and one for the object in the log. If a hash bucket fills, its last entry is used as a pointer to an overflow bucket.

- Crash recovery: if a master crashes, its log can be replayed to reconstruct the information that was in the master’s DRAM.

- Efficient memory utilization: the log serves as the storage allocator for most of a master’s DRAM, and it does this more efficiently than a traditional malloc-style allocator or garbage collector.

- Consistency: the log provides a simple way of serializing operations. We have made only limited use of this feature so far, but expect it to become more important as we implement higher-level features such as multi-object transactions.

We will discuss these properties in more detail over the rest of the paper.

4.1. Log basics

The log for each master is divided into 8 MB segments as shown in Figure 4; log segments occupy almost all of the master’s memory. New information is appended to the head segment; segments other than the head are immutable. Figure 5 summarizes the types of entries that are stored in the log.

In addition to the log, the only other major data structure on a master is a hash table, which contains one entry for each live object stored on the master. During read requests, the hash table allows the master to determine quickly whether there exists an object corresponding to a particular table identifier and key and, if so, find its entry in the log (see Figure 4).
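The lookup path described in Figure 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not RAMCloud's implementation: the class names (`LogEntry`, `Bucket`, `HashTable`), the use of BLAKE2 as the hash function, and the list-backed log are all hypothetical. Only the bucket size of 8 entries, the 16-bit partial hash, the full-key comparison, and the overflow chaining come from the text.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

ENTRIES_PER_BUCKET = 8  # Figure 4: one 64-byte cache line holds 8 entries


@dataclass
class LogEntry:
    table_id: int
    key: bytes
    value: bytes


@dataclass
class Bucket:
    # Each slot is (16-bit partial key hash, index of the entry in the log).
    slots: list = field(default_factory=list)
    overflow: Optional["Bucket"] = None


def key_hash(table_id: int, key: bytes) -> int:
    """64-bit hash of (table_id, key). The choice of BLAKE2 is an assumption."""
    data = table_id.to_bytes(8, "little") + key
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class HashTable:
    def __init__(self, log: list, num_buckets: int = 1024):
        self.log = log  # stand-in for the master's in-memory log
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _locate(self, table_id: int, key: bytes):
        h = key_hash(table_id, key)
        return self.buckets[h % len(self.buckets)], h >> 48  # top 16 bits

    def insert(self, table_id: int, key: bytes, log_index: int) -> None:
        """Add an entry; assumes the key is not already present."""
        bucket, partial = self._locate(table_id, key)
        while bucket.overflow is not None:  # walk to the end of the chain
            bucket = bucket.overflow
        if len(bucket.slots) == ENTRIES_PER_BUCKET:
            # Full bucket: its last slot becomes the overflow pointer, so the
            # displaced entry moves into the new overflow bucket.
            bucket.overflow = Bucket(slots=[bucket.slots.pop()])
            bucket = bucket.overflow
        bucket.slots.append((partial, log_index))

    def lookup(self, table_id: int, key: bytes) -> Optional[LogEntry]:
        bucket, partial = self._locate(table_id, key)
        while bucket is not None:
            for slot_partial, log_index in bucket.slots:
                if slot_partial != partial:
                    continue  # cheap filter: no access to the log needed
                entry = self.log[log_index]
                if entry.table_id == table_id and entry.key == key:
                    return entry  # full key comparison succeeded
            bucket = bucket.overflow
        return None
```

The partial hash exists so that most non-matching slots are rejected without touching the log, which is what keeps a typical lookup to two cache misses.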

Each log segment is replicated in secondary storage on a configurable number of backups (typically three). The master chooses a different set of backups at random for each segment; over time, its replicas tend to spread across all of the backups in the cluster. Segment replicas are never read during normal operation; they are only read if the master that wrote them crashes, at which time they are read in their entirety as described in Section 7. RAMCloud never makes random accesses to individual objects on secondary storage.
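The effect of choosing backups at random for each segment can be illustrated with a short simulation. This sketch assumes plain uniform sampling without replacement; the function names and the cluster size are hypothetical, and the text does not specify the exact selection policy beyond "at random".

```python
import random
from collections import Counter


def choose_backups(backups, replicas=3, rng=random):
    """Pick `replicas` distinct backups uniformly at random for one segment."""
    return rng.sample(backups, replicas)


def place_segments(num_segments, backups, replicas=3, seed=0):
    """Return {segment id: [backup ids]} with an independent choice per segment."""
    rng = random.Random(seed)
    return {seg: choose_backups(backups, replicas, rng)
            for seg in range(num_segments)}


def replicas_per_backup(placement):
    """Count how many segment replicas each backup ends up holding."""
    return Counter(b for chosen in placement.values() for b in chosen)


if __name__ == "__main__":
    placement = place_segments(num_segments=1000, backups=list(range(50)))
    counts = replicas_per_backup(placement)
    print(f"backups used: {len(counts)} of 50")
    print(f"replicas per backup: min={min(counts.values())}, "
          f"max={max(counts.values())}")
```

Because each segment makes an independent choice, a master with many segments ends up with replicas on essentially every backup, which lets many backups read replicas in parallel when that master's log has to be recovered.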

ACM Transactions on Computer Systems, Vol. ??, No. ??, Article 1, Publication date: March ??.

The RAMCloud Storage System 1:11

Object: Describes a single object, including table identifier, key, value, version number, and coarse-grain timestamp for last modification (for cleaning). §4.2

Tombstone: Indicates that an object has been deleted or overwritten. Contains the table identifier, key, and version number of the deleted object, as well as the identifier of the segment containing the object. §4.4

Segment header: This is the first entry in each segment; it contains an identifier for the log’s master and the identifier of this segment within the master’s log. §4.5

Log digest: Contains the identifiers of all the segments that were part of the log when this entry was written. §4.3, §4.5, §7.4

Safe version: Contains a version number larger than the version of any object ever managed by this master; ensures monotonicity of version numbers across deletes when a master’s tablets are transferred to other masters during crash recovery.

Tablet statistics: Compressed representation of the number of log entries and total log bytes consumed by each tablet stored on this master. §7.4

Fig. 5. The different types of entries stored in the RAMCloud log. Each entry also contains a checksum used to detect corruption. Log digests, safe versions, and tablet statistics are present only in segments containing newly written data, and they follow immediately after the segment header; they are not present in other segments, such as those generated by the cleaner or during recovery. The section numbers indicate where each entry type is discussed.
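To make the roles of object and tombstone entries concrete, the following sketch replays a sequence of log entries to rebuild the set of live objects. It is an illustration under assumptions, not the recovery algorithm of Section 7: the class names are hypothetical, only the fields needed here are modeled, and the rule "an object is dead if a tombstone with an equal or higher version exists for the same key" is an assumed simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectEntry:
    table_id: int
    key: bytes
    value: bytes
    version: int


@dataclass(frozen=True)
class Tombstone:
    table_id: int
    key: bytes
    version: int  # version of the object that was deleted or overwritten


def replay(entries):
    """Rebuild {(table_id, key): ObjectEntry} from log entries in any order."""
    live = {}     # newest surviving object seen so far, per key
    deleted = {}  # highest tombstoned version seen so far, per key
    for entry in entries:
        k = (entry.table_id, entry.key)
        if isinstance(entry, Tombstone):
            deleted[k] = max(deleted.get(k, -1), entry.version)
            if k in live and live[k].version <= entry.version:
                del live[k]  # the object we kept is covered by this tombstone
        else:
            if entry.version <= deleted.get(k, -1):
                continue     # already known to be deleted or overwritten
            if k not in live or live[k].version < entry.version:
                live[k] = entry
    return live
```

Tracking tombstoned versions separately makes the result independent of the order in which entries are replayed, so segments could in principle be processed in whatever order they become available.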

The segment size was chosen to make disk I/O efficient: with an 8 MB segment size, disk latency accounts for only about 10% of the time to read or write a full segment. Flash memory could support smaller segments efficiently, but RAMCloud requires each object to be stored in a single segment, so the segment size must be at least as large as the largest possible object (1 MB).
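The 10% figure can be checked with a back-of-the-envelope calculation. The disk parameters below (10 ms of positioning latency and 100 MB/s of sequential bandwidth) are assumptions chosen for illustration; the text does not state which values the authors used.

```python
def latency_fraction(segment_bytes, access_latency_s, bandwidth_bytes_per_s):
    """Fraction of total I/O time spent on positioning rather than transfer."""
    transfer_s = segment_bytes / bandwidth_bytes_per_s
    return access_latency_s / (access_latency_s + transfer_s)


MB = 1 << 20
ASSUMED_LATENCY_S = 0.010        # assumed seek + rotational delay
ASSUMED_BANDWIDTH = 100_000_000  # assumed sequential bandwidth, bytes/s

if __name__ == "__main__":
    for size_mb in (1, 8, 64):
        frac = latency_fraction(size_mb * MB, ASSUMED_LATENCY_S, ASSUMED_BANDWIDTH)
        print(f"{size_mb:>2} MB segment: latency is {frac:.0%} of I/O time")
```

Under these assumptions an 8 MB segment spends roughly a tenth of its I/O time on latency, while a 1 MB segment would spend close to half, which is consistent with the reasoning in the paragraph above.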

4.2. Durable writes

When a master receives a write request from a client, it appends a new entry for the object to its head log segment, creates a hash table entry for the object (or updates an existing entry), and then replicates the log entry synchronously in parallel to the backups storing the head segment. During replication, each backup appends the entry to a replica of the head segment buffered in its memory and responds to the master without waiting for I/O to secondary storage. When the master has received replies from all the backups, it responds to the client. The backups write the buffered segments to secondary storage asynchronously. The buffer space is freed once the segment has been closed (meaning a new head segment has been chosen and this segment is now immutable) and the buffer contents have been written to secondary storage.
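The write path just described can be sketched as a small single-process model. Everything here is a simplification under stated assumptions: the `Backup` and `Master` classes and their methods are hypothetical, segments are capped by entry count rather than 8 MB, the same backups are used for every segment, replication is a sequential loop rather than parallel RPCs, and the flush to secondary storage happens at close time instead of asynchronously.

```python
class Backup:
    """Buffers open segment replicas in memory; closed ones go to 'storage'."""

    def __init__(self):
        self.buffers = {}  # segment id -> entries buffered in memory
        self.storage = {}  # segment id -> entries on secondary storage

    def append(self, segment_id, entry):
        # Buffer the entry and acknowledge without any storage I/O.
        self.buffers.setdefault(segment_id, []).append(entry)
        return True

    def close(self, segment_id):
        # Simplification: write the closed segment out, then free its buffer.
        self.storage[segment_id] = self.buffers.pop(segment_id)


class Master:
    def __init__(self, backups, entries_per_segment=4):
        self.backups = backups
        self.entries_per_segment = entries_per_segment  # stand-in for 8 MB
        self.head_id = 0
        self.segments = {0: []}
        self.hash_table = {}  # (table id, key) -> (segment id, offset)

    def write(self, table_id, key, value):
        if len(self.segments[self.head_id]) == self.entries_per_segment:
            self._open_new_head()
        head = self.segments[self.head_id]
        entry = (table_id, key, value)
        head.append(entry)  # 1. append to the head segment
        # 2. create or update the hash table entry
        self.hash_table[(table_id, key)] = (self.head_id, len(head) - 1)
        # 3. replicate to every backup of the head segment and collect acks
        acks = [b.append(self.head_id, entry) for b in self.backups]
        if not all(acks):
            raise RuntimeError("replication failed")
        return True  # 4. only now reply to the client

    def read(self, table_id, key):
        segment_id, offset = self.hash_table[(table_id, key)]
        return self.segments[segment_id][offset][2]

    def _open_new_head(self):
        closed = self.head_id
        self.head_id += 1
        self.segments[self.head_id] = []
        for backup in self.backups:
            backup.close(closed)  # the old head is now immutable
```

Note that an overwrite appends a new entry and repoints the hash table; the old entry stays in its immutable segment until it is reclaimed.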

This approach has two attractive properties: writes complete without waiting for I/O to secondary storage, and backups use secondary storage bandwidth efficiently by performing I/O in large blocks, even if objects are small.

However, the buffers create potential durability problems. RAMCloud promises clients that objects are durable at the time a write returns. In order to honor this promise, the data buffered in backups’ main memories must survive power failures; otherwise a datacenter power failure could destroy all copies of a newly written object. RAMCloud currently assumes that servers can continue operating for a short period after an impending power failure is detected, so that buffered data can be flushed to secondary storage. The amount of data buffered on each backup is small (not more than a few tens of megabytes), so only a few hundred milliseconds are needed to write it safely to secondary storage. An alternative approach is for backups to store buffered

