Me-CLOCK: A Memory-Efficient Framework to Implement Replacement Policies for Large Caches
Zhiguang Chen, Nong Xiao, Member, IEEE,
Yutong Lu, Fang Liu, and Yang Ou
Abstract—Solid State Drives (SSDs) have been extensively deployed as the
cache of hard disk-based storage systems. An SSD-based cache generally
supplies ultra-large capacity, but managing so large a cache introduces
excessive memory overhead, which in turn makes the SSD-based cache neither
cost-effective nor energy-efficient. This work aims to reduce the memory
overhead introduced by the replacement policy of an SSD-based cache. Traditionally,
the data structures involved in a cache replacement policy reside in main memory.
However, these in-memory data structures are no longer suitable for SSD-based caches,
since the cache is much larger than ever. We propose a memory-efficient
framework that keeps most data structures in the SSD while leaving only a memory-
efficient data structure (i.e., a new bloom filter proposed in this work) in main memory.
Our framework can be used to implement any LRU-based replacement policy
with negligible memory overhead. We evaluate our proposals via theoretical
analysis and prototype implementation. Experimental results demonstrate that
our framework is practical for implementing most replacement policies for large caches,
and reduces the memory overhead by a factor of about 10.
Index Terms—Cache, SSD-based cache, SSD, cache replacement policy, storage
1 INTRODUCTION
Flash memory is well-known for its low latency, high parallelism,
and energy efficiency. Solid State Drives (SSDs) based on flash
memory inherit all these advantages, and are thus widely adopted by
large-scale storage systems. However, the price of SSDs is high
compared with hard disks, so storage systems built entirely of SSDs
are not cost-effective. Accordingly, a large number of storage
systems use SSDs as a cache for hard disks. One outstanding
feature of SSD-based caches is the large capacity compared with
DRAM-based caches. Managing so large a cache requires a large
volume of main memory, which makes the SSD-based cache neither
cost-effective nor energy-efficient. In this work, we aim to
reduce the memory overhead introduced by the cache replacement
policy. Existing cache replacement policies such as ARC [1], LIRS
[2], 2Q [3], and SAC [4] are mostly based on Least Recently Used
(LRU) queue. The LRU queue is traditionally implemented via
doubly linked list. Each entry of the list is a tuple containing at least three
fields, i.e., ⟨Page number, Left pointer, Right pointer⟩. The
Page number denotes the page associated with the entry. The left
and right pointers link adjacent entries in the list. If
each field is 4 bytes, the entire tuple takes up 12 bytes. Such a low
space overhead (i.e., 12 bytes for each 4 KB page) is negligible for
DRAM-based caches. However, for SSD-based caches that supply
much larger capacity than ever, keeping even such a low space
overhead in main memory is unacceptable. Take a 1 TB SSD-based
cache containing 256 M (1 TB / 4 KB) pages as an example: the
LRU policy used to manage the cache consumes as much as 3 GB
(256 M × 12 bytes) of main memory. Such a large DRAM requirement
significantly increases both the cost and the power consumption of
the SSD-based cache.
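The arithmetic above can be checked with a short script; the sizes are the running example from this section, and the per-entry layout is the 12-byte tuple described above:

```python
# Per-entry cost of a doubly linked LRU list: page number + two pointers.
FIELD_BYTES = 4
ENTRY_BYTES = 3 * FIELD_BYTES           # 12 bytes per cached page

PAGE_BYTES = 4 * 1024                   # 4 KB pages
CACHE_BYTES = 1 * 1024**4               # 1 TB SSD-based cache

pages = CACHE_BYTES // PAGE_BYTES       # number of cached pages
lru_overhead = pages * ENTRY_BYTES      # DRAM consumed by the LRU queue

print(pages)                     # 268435456  (i.e., 256 M pages)
print(lru_overhead // 1024**3)   # 3  (GB of main memory)
```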
In this work, we propose a memory-efficient framework to
implement the LRU policy. As the LRU policy is the foundation of most
complex cache replacement policies, our framework can be further
applied to those policies, enabling SSD-based caches to adopt
them without introducing excessive memory overhead.
The new framework is an improvement over CLOCK [5]. CLOCK
maintains two data structures, a First In First Out (FIFO) queue and
some reused flags. Each entry of the FIFO queue corresponds to a
page of the cache, where the reused flag appended to the entry is used
to indicate whether the corresponding page has been frequently
accessed or not. Frequently accessed pages are protected from being
evicted from the cache. CLOCK closely approximates the LRU policy,
and has thus been extensively adopted to implement LRU-based cache
replacement policies. However, as the data structures maintained by
CLOCK must reside in main memory, CLOCK is not suitable for
SSD-based caches due to the unacceptable memory overhead.
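For reference, the in-memory CLOCK baseline that Me-CLOCK improves upon can be sketched as follows; the class and method names are illustrative, not from the paper:

```python
class Clock:
    """Classic CLOCK: a FIFO queue of cached pages, each with a reused flag."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []          # FIFO order: index 0 is the oldest page
        self.reused = {}         # page -> reused flag (the per-entry bit)

    def access(self, page):
        """Record an access; return the evicted page, or None."""
        if page in self.reused:              # cache hit: just set the flag
            self.reused[page] = True
            return None
        victim = None
        if len(self.queue) == self.capacity:
            # Sweep: flagged (frequently accessed) pages get a second chance.
            while self.reused[self.queue[0]]:
                old = self.queue.pop(0)
                self.reused[old] = False
                self.queue.append(old)
            victim = self.queue.pop(0)       # first unflagged page is evicted
            del self.reused[victim]
        self.queue.append(page)
        self.reused[page] = False
        return victim
```

Note how the reused flags are consulted at arbitrary positions during the sweep; this random access is precisely what forces the whole structure to stay in main memory.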
Our framework improves CLOCK by keeping the FIFO queue
in SSD, rather than in main memory. The memory overhead of
CLOCK is mostly introduced by the FIFO queue. Keeping FIFO
queue in SSD is an intuitive optimization to reduce the memory
overhead. However, CLOCK cannot keep the FIFO queue in SSD, because
each entry of the FIFO queue is appended with a randomly
accessed reused flag. As a result, the whole FIFO queue is randomly
accessed, and thus must be kept in main memory. We propose a new
bloom filter [6] to replace the reused flags, enabling the FIFO queue
to be kept in SSD. Specifically, our framework consists of two
components: a FIFO queue and a bloom filter. As the entries of the FIFO
queue are no longer appended with randomly accessed reused flags,
operations on the FIFO queue occur only at the head or
tail. Thus, we maintain only the head and tail of the FIFO queue in main
memory, while keeping most entries of the FIFO queue in SSD. The bloom
filter takes over the responsibility of the reused flags, indicating whether
a page has been frequently accessed. It is kept in main memory but
introduces negligible memory overhead.
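This division of labor can be sketched as follows. The skeleton below uses our own naming: the SSD-resident body of the FIFO queue is mocked by a plain list, and the memory-efficient deletable bloom filter (the contribution of Section 4) is mocked by a set, since only the access pattern matters here:

```python
class MeClockSketch:
    """Illustrative Me-CLOCK skeleton: only the head/tail buffers and the
    'reused' filter live in RAM; the FIFO queue body resides on SSD."""

    def __init__(self):
        self.head_buf = []    # in-RAM buffer for newly inserted pages
        self.tail_buf = []    # in-RAM buffer for pages about to be examined
        self.ssd_queue = []   # stands in for the FIFO body stored on SSD
        self.reused = set()   # stands in for the in-RAM bloom filter

    def on_access(self, page, is_hit):
        if is_hit:
            self.reused.add(page)        # mark as reused; no queue update
        else:
            self.head_buf.append(page)   # sequential append at the head only

    def evict_one(self):
        """Return a victim page, giving reused pages a second chance."""
        # Refill the tail buffer with a sequential read from SSD if needed.
        if not self.tail_buf and self.ssd_queue:
            self.tail_buf.append(self.ssd_queue.pop(0))
        while self.tail_buf or self.head_buf:
            page = (self.tail_buf or self.head_buf).pop(0)
            if page in self.reused:      # second chance, as in CLOCK
                self.reused.discard(page)
                self.head_buf.append(page)
            else:
                return page
        return None
```

The key property is that the queue is only ever appended at the head and consumed at the tail, so the body between them can be flushed to, and fetched from, the SSD sequentially.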
Our memory-efficient CLOCK (Me-CLOCK) requires the bloom
filter to support element deletion while remaining memory-
efficient. However, no existing bloom filter meets both
demands. Accordingly, we propose such a new bloom filter, which
is another contribution of this work. We evaluate our proposals via
theoretical analysis and prototype implementation. Evaluation
results demonstrate that Me-CLOCK can be used to implement
complex LRU-based cache replacement policies, and that its memory
overhead is about 10 times lower than that of traditional
approaches such as the LRU queue and CLOCK. Note that Me-CLOCK is only
used to implement replacement policies; it cannot index pages
in the cache. The cache generally adopts a separate data
structure, such as a red-black tree, to index its contents.
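For background, the standard way to add deletion to a bloom filter is the counting bloom filter, which replaces each bit with a small counter; the extra bits per counter are exactly the memory cost that the new filter proposed in Section 4 is designed to avoid. A textbook sketch (our own naming):

```python
import hashlib

class CountingBloomFilter:
    """Textbook counting bloom filter: supports deletion, but each cell
    holds a counter rather than a bit, multiplying the memory cost."""

    def __init__(self, num_cells, num_hashes):
        self.cells = [0] * num_cells
        self.num_hashes = num_hashes

    def _positions(self, item):
        # Derive k independent cell indices from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.cells)

    def add(self, item):
        for pos in self._positions(item):
            self.cells[pos] += 1

    def remove(self, item):
        for pos in self._positions(item):
            if self.cells[pos] > 0:
                self.cells[pos] -= 1

    def __contains__(self, item):
        return all(self.cells[pos] > 0 for pos in self._positions(item))
```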
The rest of this paper is organized as follows. Section 2 presents
the background and motivations. Section 3 elaborates the principle
of Me-CLOCK. Section 4 proposes a bloom filter supporting dele-
tion. Sections 5 and 6 evaluate the new bloom filter and Me-CLOCK,
respectively. Section 7 summarizes related work. The last
section concludes this work.
2 BACKGROUND AND MOTIVATIONS
This section argues that the existing LRU and CLOCK cannot be
used to implement the replacement policies for large caches due to
their high memory overhead.
2.1 The LRU Queue
The LRU queue is the foundation of most replacement policies
such as ARC [1] and 2Q [3]. Operations on the LRU queue are
explained as follows.
ADD. When a page is accessed for the first time, the LRU policy
generates a new entry for it, and then adds the entry to the Most
Recently Used (MRU) end of LRU queue.
HIT. When a page hits in the cache, the related entry is moved to the
MRU end of the LRU queue.
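These two operations can be sketched with an ordered map standing in for the doubly linked list (Python's `OrderedDict` gives O(1) moves to the MRU end; the names are ours):

```python
from collections import OrderedDict

class LRUQueue:
    """LRU queue sketch; OrderedDict plays the doubly linked list."""

    def __init__(self):
        self.entries = OrderedDict()   # LRU end first, MRU end last

    def add(self, page):
        # ADD: a first access creates a new entry at the MRU end.
        self.entries[page] = True

    def hit(self, page):
        # HIT: move the entry of a cached page to the MRU end.
        self.entries.move_to_end(page)

    def evict(self):
        # Evict the entry at the LRU end (the least recently used page).
        page, _ = self.entries.popitem(last=False)
        return page
```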
The authors are with the State Key Laboratory of High Performance Computing,
National University of Defense Technology, Changsha, China.
E-mail: {chenzhiguang, nongxiao, ytlu, liufang, yangou}@nudt.edu.cn.
Manuscript received 4 June 2014; revised 8 Sept. 2015; accepted 11 Oct. 2015. Date of
publication 25 Oct. 2015; date of current version 15 July 2016.
Recommended for acceptance by J. D. Bruguera.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TC.2015.2495182
IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 8, AUGUST 2016 2665