CircularCache: Dynamic Scaling and Adaptive Management for Massive Storage


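This document covers "CircularCache: Scalable and Adaptive Cache Management for Massive Storage Systems", which addresses storage-system optimization under today's explosive data growth. To improve performance, traditional hard-disk (HDD) storage systems often deploy low-latency, high-IOPS solid-state drives (SSDs) as a cache that reduces accesses to the underlying HDDs. As data volumes surge, however, a large-scale SSD cache usually has to be partitioned so that cached data is evenly distributed across multiple nodes. The authors propose CircularCache, an adaptive and scalable SSD cache. Its core idea is to allocate SSD space according to data hotness: hotter virtual disks receive more SSD cache space. This helps raise the cache hit ratio and lower the access latency of hot data. To achieve this, the paper introduces a dynamic replacement algorithm called VBQ (VDI-Based Queues). VBQ dynamically adjusts the upper and lower bounds of multiple queues according to the total access counts of the virtual disks, which it tracks in real time. The aim is to improve resource utilization by keeping hot data in the SSD cache while preventing cold data from occupying too much space. CircularCache is also flexible and scalable. It can grow the cache capacity smoothly as the storage system grows, without frequent reconfiguration. This matters for modern storage environments facing large data volumes and complex workloads, where it helps maintain system stability and responsiveness. The paper examines how adaptive and scalable caching strategies can address data management and performance optimization in massive storage systems. It is relevant to understanding and optimizing the storage architecture of modern data centers.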
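As a rough illustration of hotness-proportional allocation, the sketch below divides a cache of `capacity` blocks among VDIs in proportion to their access counts. It then widens each share into a lower and an upper bound. The function name `vdi_cache_bounds` and the `slack_pct` tolerance band are hypothetical. The excerpt does not give the paper's actual bound formulas.

```python
def vdi_cache_bounds(access_counts, capacity, slack_pct=20):
    """Split `capacity` cache blocks among VDIs in proportion to access counts.

    Returns {vdi: (lower, upper)}: the proportional share shrunk/grown by
    `slack_pct` percent (a hypothetical tolerance band). Integer arithmetic
    keeps the bounds exact.
    """
    if not access_counts:
        return {}
    counts = dict(access_counts)
    total = sum(counts.values())
    if total == 0:
        # No accesses observed yet: fall back to an equal split.
        counts = {vdi: 1 for vdi in counts}
        total = len(counts)
    bounds = {}
    for vdi, count in counts.items():
        num = capacity * count  # proportional share = num / total blocks
        lower = num * (100 - slack_pct) // (total * 100)
        upper = -(-num * (100 + slack_pct) // (total * 100))  # ceiling division
        bounds[vdi] = (lower, min(capacity, upper))
    return bounds


print(vdi_cache_bounds({"vdi-a": 600, "vdi-b": 300, "vdi-c": 100}, capacity=1000))
# → {'vdi-a': (480, 720), 'vdi-b': (240, 360), 'vdi-c': (80, 120)}
```

Recomputing these bounds periodically from fresh access counts lets each VDI's share follow its hotness. Integer arithmetic avoids floating-point rounding surprises at the bounds.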

CircularCache: Scalable and Adaptive Cache Management for Massive Storage Systems

Liqiong Liu, Xiaoyang Qu, Yubiao Zhang, Xiaodong Yi, Siwang Zeng, Jiguang Wan*, Changsheng Xie

National Laboratory for Optoelectronics

Huazhong University of Science and Technology

Wuhan, Hubei, P.R. China.

Email: {liulqiong, quxiaoy}@gmail.com, {jgwan, csxie}@mail.hust.edu.cn

Abstract—In order to enhance the performance of HDD-based storage systems, low-latency and high-IOPS SSDs are usually deployed as a cache above HDDs. With explosive data growth, a large-scale SSD-based cache tends to adopt partition management to distribute cached data across multiple cache nodes. We propose an adaptive and scalable SSD-based cache called CircularCache, which distributes hot data across multiple cache nodes. Hotter virtual disks deserve more allocated space in the SSD-cache. This paper employs a dynamic replacement algorithm called VBQ (VDI-Based Queues) to manage the SSD-cache. The VBQ scheme manages the SSD-cache by dynamically manipulating the upper bounds and lower bounds of multiple queues based on the total access counts of virtual disks. To mitigate the negative impact of destaging on overall storage performance, dirty data in the cache is written back to the data nodes during idle time. At the same time, we use the redundant storage space in the data nodes as a logging area to retain the reliability of the dirty data on the SSD-cache. We implemented a prototype of CircularCache on Sheepdog. Experimental results show that CircularCache improves performance by up to 270% compared with the standard distributed storage system without an SSD-based cache.
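The abstract does not spell out how VBQ chooses a victim, so the following is only a hypothetical sketch of a queue-per-VDI cache driven by per-VDI bounds. Each VDI has its own LRU queue. On a miss with a full cache, the victim comes from the queue that most exceeds its upper bound. If no queue exceeds its upper bound, the victim comes from the queue furthest above its lower bound. The class name `VBQSketch` and this victim rule are assumptions for illustration, not the paper's algorithm.

```python
from collections import OrderedDict


class VBQSketch:
    """Hypothetical queue-per-VDI cache whose eviction is steered by bounds.

    bounds: {vdi: (lower, upper)} in blocks, e.g. recomputed periodically
    from per-VDI access counts. VDIs without bounds are treated as (0, 0).
    """

    def __init__(self, capacity, bounds):
        if capacity < 1:
            raise ValueError("capacity must be at least 1 block")
        self.capacity = capacity
        self.bounds = dict(bounds)
        self.queues = {}  # vdi -> OrderedDict of cached blocks, LRU first
        self.size = 0

    def set_bounds(self, bounds):
        """Install freshly recomputed bounds; they apply to later evictions."""
        self.bounds = dict(bounds)

    def _excess(self, vdi, which):
        # which = 0: blocks above the lower bound; which = 1: above the upper bound
        return len(self.queues[vdi]) - self.bounds.get(vdi, (0, 0))[which]

    def _victim_vdi(self):
        candidates = [v for v, q in self.queues.items() if q]
        worst = max(candidates, key=lambda v: self._excess(v, 1))
        if self._excess(worst, 1) > 0:
            return worst  # this queue has outgrown its upper bound
        return max(candidates, key=lambda v: self._excess(v, 0))

    def access(self, vdi, block):
        """Return True on a hit, False on a miss (the block is then cached)."""
        queue = self.queues.setdefault(vdi, OrderedDict())
        if block in queue:
            queue.move_to_end(block)  # refresh recency within this VDI's queue
            return True
        if self.size >= self.capacity:
            victim = self._victim_vdi()
            self.queues[victim].popitem(last=False)  # drop that VDI's LRU block
            self.size -= 1
        queue[block] = None
        self.size += 1
        return False
```

Consider `VBQSketch(4, {"a": (2, 3), "b": (1, 1)})` after VDI `a` has cached three blocks and `b` one. The next miss from `b` takes a block from `a`, which sits above its lower bound. Once `b` exceeds its one-block upper bound, further misses from `b` evict `b`'s own LRU blocks.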

I. INTRODUCTION

Infrastructure-as-a-Service (IaaS) offers virtualization platform interfaces. With IaaS, clients can share the available hardware (physical) resources: storage resources, computing resources, and network infrastructure. The infrastructure resources are owned and managed by providers and can be purchased by customers on demand. Several service providers offer IaaS, such as Amazon Web Services (AWS) and Microsoft Azure. With the development of IaaS, block-level distributed storage systems have become increasingly important. For distributed storage systems, the block-level interface can be adapted to heterogeneous client applications.

For block-level distributed storage systems, virtual disks [22] have been proposed as the interface for accessing storage resources. By virtualizing storage, virtual disks present clients with a view of storage that hides the underlying physical storage resources. Consumers can purchase physical resources on demand, which makes resource allocation more flexible.

Solid State Drives (SSDs) have attracted a great deal of attention due to their high performance and low energy consumption. Because SSDs are expensive and have limited capacity, hybrid storage systems commonly combine the high performance of SSDs with the high capacity of HDDs. SSD-based arrays have been proposed to construct high-performance storage systems. However, SSDs are still far more expensive than HDDs and are unlikely to replace them within the next few years. In addition, cold data accounts for a large fraction of the total data, and keeping it on SSDs would reduce the cost-performance of the overall storage system. SSDs are therefore widely used in hybrid storage systems, where they cache hot data. In this paper, we use several SSD nodes as a global cache above HDD nodes.

* Corresponding author

With the explosive growth in Internet use, the amount of newly generated data grows exponentially over time. Conventional cache data partitions fail to adapt to such rapid growth. In other words, the cache size should be scalable and adaptive. For a large-scale cache consisting of multiple nodes, it is challenging to distribute data efficiently across the cache nodes. In addition, the associated replacement policy should consider not only the recency and frequency of accesses but also the characteristics of access patterns in virtual environments.
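Consistent hashing, which CircularCache adopts as its cache partition policy, is a standard way to spread data over several cache nodes. The sketch below is a generic consistent-hash ring with virtual nodes. The node names, the `vnodes=100` replica count and the SHA-256-based position function are illustrative choices, not details taken from the paper.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys (e.g. cached object IDs) to cache nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node smooth the load
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(label):
        digest = hashlib.sha256(label.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = self._position(f"{node}#{i}")
            if pos in self._owner:
                continue  # skip an (astronomically unlikely) position collision
            self._owner[pos] = node
            bisect.insort(self._points, pos)

    def remove_node(self, node):
        for i in range(self.vnodes):
            pos = self._position(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                del self._points[bisect.bisect_left(self._points, pos)]

    def lookup(self, key):
        """Return the first node clockwise from the key's ring position."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._points, self._position(key))
        return self._owner[self._points[idx % len(self._points)]]
```

Adding a cache node only moves keys onto the new node, and removing it moves exactly those keys back. This property lets such a scheme scale the cache out without reshuffling all cached data.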

As the amount of data increases sharply, the global SSD-cache should be scalable and adaptive. We propose a scalable SSD-based cache called CircularCache, which supports distributing hot data across multiple cache nodes. Distributing hot data evenly and balancing the workload across cache nodes is challenging, so this paper employs consistent hashing as the cache partition policy. In addition, this paper employs a dynamic replacement algorithm called VBQ (VDI-Based Queues) to manage the SSD-cache based on the hotness of VDIs (Virtual Disk Images): hotter virtual disks deserve more allocated space in the SSD-cache. This replacement policy manages the SSD-cache by dynamically manipulating the upper and lower bounds of its queues. A system that serves thousands of clients hosts thousands of virtual disks. The policy therefore adjusts the available storage space of every queue dynamically, based on the real-time access counts of the virtual disks. To reduce the impact of destaging on overall performance, dirty data in the cache is written back to the data nodes during idle time. At the same time, we use the redundant storage in the data nodes as logging areas to retain the original data reliability of the system.
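The idle-time destaging and logging described above can be sketched as a small write-back buffer. Each write is appended to a log before it is cached and marked dirty. Dirty data is written back only when the reported load is at or below an idle threshold, and log records for written-back keys are then discarded. This is a hypothetical illustration. The class `WriteBackSketch`, the `load` figure (a caller-supplied utilization between 0 and 1), and the `idle_threshold` and `batch` values are assumptions. In-memory containers stand in for the SSD cache, the logging area on the data nodes, and the HDD data nodes.

```python
from collections import OrderedDict


class WriteBackSketch:
    """Hypothetical write-back cache: log dirty writes, destage only when idle."""

    def __init__(self, idle_threshold=0.1):
        self.idle_threshold = idle_threshold  # load at or below this counts as idle
        self.cache = {}             # stand-in for the SSD cache
        self.dirty = OrderedDict()  # dirty keys, least recently written first
        self.log = []               # stand-in for the logging area on data nodes
        self.backend = {}           # stand-in for the HDD data nodes

    def write(self, key, data):
        # Log first so the dirty data survives an SSD failure, then cache it.
        self.log.append((key, data))
        self.cache[key] = data
        self.dirty[key] = None
        self.dirty.move_to_end(key)

    def maybe_destage(self, load, batch=64):
        """Write back up to `batch` dirty keys if `load` indicates idle time."""
        if load > self.idle_threshold:
            return 0  # foreground I/O is busy; defer write-back
        flushed = set()
        while self.dirty and len(flushed) < batch:
            key, _ = self.dirty.popitem(last=False)
            self.backend[key] = self.cache[key]
            flushed.add(key)
        # Log records of written-back keys are no longer needed.
        self.log = [(k, d) for k, d in self.log if k not in flushed]
        return len(flushed)

    def replay_log(self):
        """Latest logged value per key, e.g. to rebuild dirty data after SSD loss."""
        latest = {}
        for key, data in self.log:
            latest[key] = data
        return latest
```

Bounding each destaging pass with `batch` keeps any single write-back burst short. A burst of new foreground requests then waits for at most one batch.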


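ExoPlayer: An extensible media player for Android

SnapKitExtensionDemo: SnapKit extensions for arrays: adaptive-width, vertical, horizontal, and nine-grid layouts

adaptivecache: Adaptive caching for ASP.NET Core

An adaptive cache management algorithm based on queue prediction for wireless video communication

adaptive-cache: Bayesian adaptive caching for expensive models

arc: Package arc implements the Adaptive Replacement Cache

An adaptive cache management strategy based on node forwarding capability

Multicast video-on-demand systems: adaptive caching and stream merging strategies

The VXMLR system: adaptive storage-schema optimization based on historical queries

Adaptive caching strategies in opportunistic networks: optimization based on replica count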
