The ARC Cache Eviction Algorithm Used by ZFS: Analysis of the Original FAST '03 Paper

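
FAST_ARC_Paper.pdf is the paper that introduced the Adaptive Replacement Cache (ARC), the cache eviction algorithm on which the ZFS file system bases its in-memory cache. It was presented by Nimrod Megiddo and Dharmendra S. Modha at the 2003 USENIX Conference on File and Storage Technologies (FAST). The paper studies cache management for demand paging with uniform page sizes and proposes ARC as a new adaptive replacement policy with several advantages. ("FAST" is the name of the conference, not part of the algorithm's name.)

Cache management is a key contributor to performance in modern computer systems, and in file systems in particular. ZFS uses a variant of ARC for its memory cache to reduce disk I/O latency and speed up data access. ARC is designed to tune itself at low overhead so that it tracks a changing workload. Traditional policies such as LRU (least recently used) and LFU (least frequently used) can fail on some workloads because each captures only one aspect of the reference pattern: LRU captures recency and LFU captures frequency.

ARC addresses this by maintaining two LRU lists. L1 holds pages that have been seen only once recently (recency), and L2 holds pages that have been seen at least twice recently (frequency). Each list is split into a top part whose pages are resident in the cache (T1 and T2) and a bottom part that records only the identities of recently evicted pages (B1 and B2). For a cache of c pages, the two lists together remember at most 2c pages.

The dynamic balance between the two lists is the core of ARC. A parameter p is the target size of T1. A hit in the history list B1 suggests that the recency list is too small, so p is increased; a hit in B2 suggests that the frequency list is too small, so p is decreased. On a miss with a full cache, ARC evicts the LRU page of T1 or of T2 depending on how the current size of T1 compares with p. The history lists therefore act as a record of pages that were evicted recently and may become important again, and the policy adapts to different workload patterns instead of relying on a single fixed replacement rule.

The paper reports that ARC outperforms LRU across a range of workloads and cache sizes while keeping a comparable per-request overhead. Its self-tuning and low overhead are the reasons it became an important component of ZFS.

The paper is a useful reference for understanding cache design in modern file systems and for optimizing storage performance. It provides the theoretical background and also offers ideas that carry over to cache design in other areas.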

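The four-list structure used by ARC can be sketched in a few lines. The following is a minimal Python sketch that follows the case analysis of the pseudocode in the Megiddo and Modha paper; it is not the ZFS implementation, which differs in many details. The names `ARCCache`, `access`, and `_replace`, and the use of `OrderedDict` for the lists T1, T2, B1, and B2, are assumptions made for illustration.

```python
from collections import OrderedDict


class ARCCache:
    """Illustrative ARC for c pages; each list is ordered LRU first, MRU last."""

    def __init__(self, c):
        self.c = c
        self.p = 0.0  # adaptive target size of T1
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # resident pages
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # ghost (history) entries

    def _replace(self, in_b2):
        """Evict one resident page into the matching ghost list."""
        if self.t1 and (len(self.t1) > self.p
                        or (in_b2 and len(self.t1) == self.p)):
            page, _ = self.t1.popitem(last=False)
            self.b1[page] = None
        else:
            page, _ = self.t2.popitem(last=False)
            self.b2[page] = None

    def access(self, x):
        """Return True on a cache hit, False on a miss."""
        c = self.c
        if x in self.t1 or x in self.t2:  # Case I: hit, promote to MRU of T2
            self.t1.pop(x, None)
            self.t2.pop(x, None)
            self.t2[x] = None
            return True
        if x in self.b1:  # Case II: ghost hit in B1, grow the target for T1
            self.p = min(c, self.p + max(1.0, len(self.b2) / len(self.b1)))
            self._replace(False)
            del self.b1[x]
            self.t2[x] = None
            return False
        if x in self.b2:  # Case III: ghost hit in B2, shrink the target for T1
            self.p = max(0.0, self.p - max(1.0, len(self.b1) / len(self.b2)))
            self._replace(True)
            del self.b2[x]
            self.t2[x] = None
            return False
        # Case IV: x is in none of the four lists
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == c:
            if len(self.t1) < c:
                self.b1.popitem(last=False)  # drop oldest B1 ghost
                self._replace(False)
            else:
                self.t1.popitem(last=False)  # B1 is empty: drop LRU of T1
        elif total >= c:
            if total == 2 * c:
                self.b2.popitem(last=False)  # directory full: drop oldest B2 ghost
            self._replace(False)
        self.t1[x] = None
        return False
```

With `c = 2` and the request sequence 1, 2, 1, 3, 2, the final request hits the ghost list B1, which raises `p` from 0 to 1 and moves page 2 directly into T2.
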
2nd USENIX Conference on File and Storage Technologies

USENIX Association

the workload or the request stream is drawn from a LRU Stack Depth Distribution (SDD), then LRU is the optimal policy [16]. LRU has several advantages, for example, it is simple to implement and responds well to changes in the underlying SDD model. However, while the SDD model captures “recency”, it does not capture “frequency”. To quote from [16, p. 282]: “The significance of this is, in the long run, that each page is equally likely to be referenced and that therefore the model is useful for treating the clustering effect of locality but not the nonuniform page referencing.”
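
The recency-only behavior described above can be illustrated with a minimal LRU sketch in Python. The class name `LRUCache` and the method `access` are hypothetical names chosen for this example; the paper does not prescribe an implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # least recent first, most recent last

    def access(self, page):
        """Return True on a hit, False on a miss."""
        if page in self.pages:
            self.pages.move_to_end(page)  # refresh recency in O(1)
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recent page
        self.pages[page] = None
        return False


# A page referenced many times is still evicted by a short one-time scan,
# because LRU keeps no record of how often a page was used.
cache = LRUCache(2)
for _ in range(5):
    cache.access("hot")
cache.access("a")
cache.access("b")
hot_survived = "hot" in cache.pages
```

Here `hot_survived` is `False`: two one-time references are enough to push the frequently used page out of a two-page cache.
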

C. Frequency

The Independent Reference Model (IRM) provides a workload characterization that captures the notion of frequency. Specifically, IRM assumes that each page reference is drawn in an independent fashion from a fixed distribution over the set of all pages in the auxiliary memory. Under the IRM model, policy LFU that replaces the least frequently used page is known to be optimal [16], [17]. The LFU policy has several drawbacks: it requires logarithmic implementation complexity in cache size, pays almost no attention to recent history, and does not adapt well to changing access patterns since it accumulates stale pages with high frequency counts that may no longer be useful.
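
The drawbacks listed above are visible in even a small LFU sketch. The Python code below is an illustrative version that keeps counts only for resident pages and uses a binary heap with lazy deletion, so each request costs logarithmic time in the cache size. The names `LFUCache`, `access`, and `_evict` are hypothetical, and the tie-breaking rule (oldest heap entry first) is an assumption.

```python
import heapq
import itertools


class LFUCache:
    """Evicts the resident page with the smallest reference count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.count = {}  # page -> reference count while resident
        self.heap = []   # (count, sequence number, page); may hold stale entries
        self.tick = itertools.count()

    def access(self, page):
        """Return True on a hit, False on a miss."""
        hit = page in self.count
        if not hit and len(self.count) >= self.capacity:
            self._evict()
        self.count[page] = self.count.get(page, 0) + 1
        # O(log n) push: the source of the logarithmic per-request cost
        heapq.heappush(self.heap, (self.count[page], next(self.tick), page))
        return hit

    def _evict(self):
        while True:
            cnt, _, page = heapq.heappop(self.heap)
            if self.count.get(page) == cnt:  # skip outdated heap entries
                del self.count[page]
                return


# Page 1 builds up a high count early and then goes cold, yet it
# outlives every page that arrives later.
cache = LFUCache(2)
for page in [1, 1, 1, 2, 3, 4, 5]:
    cache.access(page)
resident = set(cache.count)
```

After this sequence `resident` is `{1, 5}`: the stale page 1 stays cached on the strength of its old count, while pages 2, 3, and 4 each displace one another.
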

A relatively recent algorithm LRU-2 [18], [19] approximates LFU while eliminating its lack of adaptivity to the evolving distribution of page reference frequencies. This was a significant practical step forward. The basic idea is to remember, for each page, the last two times when it was requested, and to replace the page with the least recent penultimate reference. Under the IRM assumption, it is known that LRU-2 has the largest expected hit ratio of any on-line algorithm that knows at most two most recent references to each page [19]. The algorithm has been shown to work well on several traces [18], [20]. Nonetheless, LRU-2 still has two practical limitations [20]: (i) it needs to maintain a priority queue, and, hence, it requires logarithmic implementation complexity and (ii) it contains one crucial tunable parameter, namely, Correlated Information Period (CIP), that roughly captures the amount of time a page that has only been seen once recently should be kept in the cache.
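
The replacement rule of LRU-2 can be sketched as follows. This Python example is a deliberate simplification: it omits the CIP parameter and the retained history for evicted pages, and it finds the victim with a linear scan instead of the priority queue that gives the real algorithm its logarithmic cost. The names `LRU2Cache` and `access` are hypothetical. A page with only one reference is treated as having a penultimate reference infinitely far in the past.

```python
import math


class LRU2Cache:
    """Evicts the page whose second-to-last reference is the oldest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.clock = 0
        self.history = {}  # page -> (last reference time, penultimate time)

    def access(self, page):
        """Return True on a hit, False on a miss."""
        self.clock += 1
        if page in self.history:
            last, _ = self.history[page]
            self.history[page] = (self.clock, last)
            return True
        if len(self.history) >= self.capacity:
            # Oldest penultimate reference loses; ties fall back to plain LRU.
            victim = min(
                self.history,
                key=lambda p: (self.history[p][1], self.history[p][0]),
            )
            del self.history[victim]
        self.history[page] = (self.clock, -math.inf)
        return False


cache = LRU2Cache(2)
for page in [1, 1, 2, 3]:
    cache.access(page)
resident = set(cache.history)
```

Here `resident` is `{1, 3}`. Plain LRU would have evicted page 1, the least recently used page; LRU-2 evicts page 2 instead because it has been referenced only once.
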

In practice, logarithmic implementation complexity is a severe overhead (see Table I). This limitation was mitigated in 2Q [20], which reduces the implementation complexity to constant per request. The algorithm 2Q uses a simple LRU list instead of the priority queue used in LRU-2; otherwise, it is similar to LRU-2. Policy ARC has a computational overhead similar to 2Q, and both are better than LRU-2 (see Table I).
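
The constant per-request cost of 2Q comes from replacing the priority queue with simple queues. The Python sketch below follows the general shape of the full 2Q algorithm from [20] under stated assumptions: the names `TwoQCache`, `access`, and `_reclaim` are hypothetical, the parameters `kin` and `kout` stand for the two tunable thresholds of 2Q, and the guard for an empty main queue is an addition for safety.

```python
from collections import OrderedDict


class TwoQCache:
    """Illustrative 2Q: a FIFO for new pages, a ghost FIFO, and a main LRU."""

    def __init__(self, capacity, kin, kout):
        self.capacity, self.kin, self.kout = capacity, kin, kout
        self.a1in = OrderedDict()   # resident pages seen once, FIFO order
        self.a1out = OrderedDict()  # identities of pages evicted from a1in
        self.am = OrderedDict()     # resident pages that proved reusable, LRU order

    def _reclaim(self):
        """Free one slot if the cache is full; every step is O(1)."""
        if len(self.a1in) + len(self.am) < self.capacity:
            return
        if len(self.a1in) > self.kin or not self.am:
            victim, _ = self.a1in.popitem(last=False)
            self.a1out[victim] = None  # remember only the identity
            if len(self.a1out) > self.kout:
                self.a1out.popitem(last=False)
        else:
            self.am.popitem(last=False)

    def access(self, page):
        """Return True on a hit, False on a miss."""
        if page in self.am:
            self.am.move_to_end(page)
            return True
        if page in self.a1in:
            return True  # correlated reference: position is left unchanged
        seen_before = page in self.a1out
        if seen_before:
            del self.a1out[page]
        self._reclaim()
        (self.am if seen_before else self.a1in)[page] = None
        return False


cache = TwoQCache(capacity=3, kin=1, kout=2)
results = [cache.access(p) for p in [1, 2, 3, 4, 1, 1]]
```

In this run page 1 is evicted from the FIFO, remembered in the ghost queue, and promoted to the main LRU queue when it is requested again, so only the final request is a hit.
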

                                          LRFU     LRFU     LRFU
     c     LRU    ARC     2Q   LRU-2     10^-7    10^-3     0.99
  1024      17     14     17      33       554      408       28
  2048      12     14     17      27       599      451       28
  4096      12     15     17      27       649      494       29
  8192      12     16     18      28       694      537       29
 16384      13     16     19      30       734      418       30
 32768      14     17     18      31       716      420       31
 65536      14     16     18      32       648      424       34
131072      14     15     16      32       533      432       39
262144      13     13     14      30       427      435       42
524288      12     13     13      27       263      443       45

TABLE I. A comparison of computational overhead of various cache algorithms on a trace P9 that was collected from a workstation running Windows NT by using Vtrace which captures disk requests. For more details of the trace, see Section V-A. The cache size c represents number of 512 byte pages. To obtain the numbers reported above, we assumed that a miss costs nothing more than a hit. This focuses the attention entirely on the “book-keeping” overhead of the cache algorithms. All timing numbers are in seconds, and were obtained by using the “clock()” subroutine in “time.h” of the GNU C compiler. It can be seen that the computational overhead of ARC and 2Q is essentially the same as that of LRU. It can also be seen that LRU-2 has roughly double the overhead of LRU, and that LRFU can have very large overhead when compared to LRU. The same general results hold for all the traces that we examined.

Table II shows that the choice of the parameter CIP crucially affects performance of LRU-2. It can be seen that no single fixed a priori choice works uniformly well across various cache sizes, and, hence, judicious selection of this parameter is crucial to achieving good performance. Furthermore, we have found that no single a priori choice works uniformly well across the various workloads and cache sizes that we examined. For example, a very small value for the CIP parameter works well for stable workloads drawn according to the IRM, while a larger value works well for workloads drawn according to the SDD. Indeed, it has been previously noted [20] that “it was difficult to model the tunables of the algorithm exactly.” This underscores the need for on-line, on-the-fly adaptation.

Unfortunately, the second limitation of LRU-2 persists even in 2Q. The authors introduce two parameters (Kin and Kout) and note that “Fixing these parameters is potentially a tuning question ...” [20]. The parameter Kin is essentially the same as the parameter CIP in LRU-2. Once again, it has been noted [21] that “Kin and Kout are predetermined parameters in 2Q, which need to be carefully tuned, and are sensitive to types of workloads.” Due to space limitation, we have shown Table II only for LRU-2, however, we have observed similar dependence of 2Q on the workload
