CircularCache: Scalable and Adaptive Cache
Management for Massive Storage Systems
Liqiong Liu, Xiaoyang Qu, Yubiao Zhang, Xiaodong Yi, Siwang Zeng, Jiguang Wan*, Changsheng Xie
National Laboratory for Optoelectronics
Huazhong University of Science and Technology
Wuhan, Hubei, P.R. China.
Email:{liulqiong, quxiaoy}@gmail.com, {jgwan,cs
xie}@mail.hust.edu.cn
Abstract—In order to enhance the performance of HDD-
based storage systems, low-latency and high-IOPS SSDs are
usually deployed as a cache above HDDs. With explosive data
growth, a large-scale SSD-based cache tends to adopt partition
management to distribute cached data across multiple
cache nodes. We propose an adaptive and scalable SSD-based
cache called CircularCache, which distributes hot data across
multiple cache nodes. Hotter virtual disks are allocated more
space in the SSD-cache. This paper employs a
dynamic replacement algorithm called VBQ (VDI-Based Queues)
to manage the SSD-cache. The VBQ scheme manages the
SSD-cache by dynamically adjusting the upper and lower
bounds of multiple queues based on the total number of
accesses to the virtual disks. To mitigate the negative impact
of destaging on overall storage performance, dirty data in the
cache is written back to data nodes during idle time. At the
same time, we use the redundant storage space in the data nodes
as a logging area to preserve the reliability of the dirty data
in the SSD-cache. The prototype of CircularCache is implemented
on top of Sheepdog. Experimental results show that CircularCache
improves performance by up to 270% compared with the
standard distributed storage system without an SSD-based cache.
I. INTRODUCTION
Infrastructure-as-a-Service (IaaS) offers virtualization
platform interfaces. With IaaS, clients can share available
hardware (physical) resources: storage resources, computing
resources, and network infrastructure. The infrastructure
resources are owned and managed by providers and can be
purchased by customers on demand. Several companies
provide this service, such as Amazon Web Services (AWS)
and Microsoft Azure. With the development of IaaS,
block-level distributed storage systems are becoming
increasingly important. For distributed storage systems, the
block-level interface can be adapted to heterogeneous client
applications.
For block-level distributed storage systems, virtual disks
[22] are proposed as interfaces to access storage resources.
For storage resource virtualization, virtual disks hide the
underlying physical storage resources from the client's view
of storage. Consumers can purchase physical resources on demand,
which makes resource allocation more flexible.
Solid State Drives (SSDs) have attracted a large amount of
attention due to their high performance and low energy
consumption. Because of the high price and low capacity of SSDs,
it is popular to combine the high performance of SSDs with the
high capacity of HDDs to build hybrid storage systems. SSD-based
arrays have been proposed to construct high-performance
storage systems. However, SSDs are still too expensive compared
to HDDs, so they cannot replace HDDs within the next few years.
In addition, cold data accounts for a large fraction of total
data, and keeping it on SSDs may reduce the cost-performance of
the overall storage system. Therefore, SSDs are widely used in
hybrid storage systems where the SSDs cache hot data. In this
paper, we use several SSD nodes as a global cache above HDD nodes.
* Corresponding author
With the explosive growth in Internet use, the amount of newly
generated data grows exponentially over time. Conventional
cache data partitioning fails to adapt to such rapid data
growth. In other words, the cache size should
be scalable and adaptive. For a large-scale cache consisting of
multiple nodes, it is challenging to distribute the
data efficiently across the cache nodes. In addition, the
associated replacement policy should consider not only recency
and frequency but also the characteristics of access patterns in
virtualized environments.
With the sharply increasing amount of data, the global
SSD-cache should be scalable and adaptive. We design a scalable
SSD-based cache called CircularCache, which supports distributing
hot data across multiple cache nodes. For CircularCache,
it is challenging to distribute hot data evenly and to
balance the workload; therefore, this paper employs consistent
hashing as the cache partition policy. In addition, this paper
employs a dynamic replacement algorithm called VBQ (VDI-Based
Queues) to manage the SSD-cache based on the hotness
of VDIs (Virtual Disk Images). Hotter virtual disks are
allocated more space in the SSD-cache. We also propose a
novel replacement policy that manages the SSD-cache by
dynamically adjusting the upper and lower bounds of multiple
queues. Serving thousands of clients requires thousands
of virtual disks. The policy adjusts the available storage space
of every queue dynamically based on the real-time access
count of the virtual disks. To reduce the impact of destaging
on overall performance, dirty data in the cache is
written back to data nodes during idle time. At the same time,
we take advantage of the redundant storage in the data nodes
as logging areas to retain the original data reliability of the
system.
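To illustrate why consistent hashing suits a cache whose node count
changes over time, the following is a minimal Python sketch of a hash
ring with virtual nodes. It is not the CircularCache or Sheepdog
implementation: the class name HashRing, the node names ("ssd-0", ...),
the object key format, the use of MD5, and the number of virtual nodes
per cache node are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; each cache node owns many points."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes  # virtual nodes per cache node (assumed value)
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> cache node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node's virtual nodes on the ring."""
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # keep the existing owner on a (very unlikely) collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        """Drop every ring position owned by the node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring position clockwise of the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: count how many cached objects change owner when a node joins.
ring = HashRing(["ssd-0", "ssd-1", "ssd-2"])
keys = [f"vdi-{v}/obj-{o}" for v in range(4) for o in range(250)]
before = {k: ring.lookup(k) for k in keys}
ring.add_node("ssd-3")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved)} of {len(keys)} objects remapped after adding ssd-3")
```

In this sketch, adding a cache node remaps only the objects that now
fall to the new node; every other object keeps its owner, which is the
property that lets the cache grow without reshuffling all cached data.
Raising the number of virtual nodes generally spreads objects more
evenly at the cost of a larger ring.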
978-1-5090-3315-7/16/$31.00 ©2016 IEEE