
Highly Efficient LRU Implementations for High
Associativity Cache Memory
T.S.B. Sudarshan, Rahil Abbas Mir, S. Vijayalakshmi
Birla Institute of Technology and Science, Pilani, Rajasthan 333031 INDIA
tsbs@bits-pilani.ac.in, rahilabbasmir@rediffmail.com, viji@bits-pilani.ac.in
Abstract-High associativity with LRU as the replacement policy is an
optimal solution for cache design when the miss rate has to be
reduced. But as associativity increases, implementing the LRU
policy becomes complex. As many advanced and demanding
technologies such as multimedia, multithreading, databases and
low-power devices running on high-performance processors in
servers and workstations use higher associativity to enrich
performance, there is a need for highly efficient LRU hardware
implementations. This paper analyzes various implementations of
the LRU policy for a cache with high associativity. The
implementation problems are explored, design objectives are
identified, and various implementations, namely the Square Matrix,
Skewed Matrix, Counter, Linked List, Phase and Systolic Array
methods, are compared with each other on the basis of the
objectives outlined. These implementations are synthesized to
determine the constraints and the effect of increasing associativity
on performance. When the associativity is small, reduction of the
associated logic is important, while at higher associativity
conservation of space is more important. At higher associativity,
the Linked List, Systolic Array and Skewed Matrix designs are
found suitable for implementation.
I. INTRODUCTION
Modern processors, commercial systems, high-performance
servers and workstations have highly associative caches for
performance improvement [15,16,17]. The complexity of
implementing the LRU (Least Recently Used) policy for a
highly associative cache tends to increase as the associativity
increases [1,2,3,4,10]. The increase in complexity additionally
increases the delay incurred in detecting the line for
replacement. Cache performance is thus degraded, even though
a highly associative cache with the LRU policy is used, owing
to an inapt implementation. This paper analyzes and compares
various efficient LRU implementations for highly associative
caches. These designs are analyzed with respect to their
implementation complexity and how fast they can determine
the cache line to replace. The various implementations of LRU
are simulated and synthesized for comparison. The rest of the
paper is organized in the following manner. Section 2 identifies
higher associativity with LRU as the best configuration to
reduce the miss ratio. Section 3 discusses the implementation
complexity of LRU as associativity increases. Section 4
examines the various implementations, their working and their
characteristics. Section 5 explains the methodology followed to
test the functional correctness of the designs, the evaluation of
the performance metrics and the results obtained. Section 6
details the comparison of the various implementations based on
the results obtained, and the conclusions are presented in
Section 7.
II. HIGHER ASSOCIATIVITY WITH LRU POLICY
The classical approach to improving cache behavior is to
reduce the miss rate. Increasing associativity in the cache
reduces conflict misses, thereby reducing the miss rate and
improving performance. Studies have shown that conflict
misses reduce from 28% to 4% when the associativity changes
from 1-way to 8-way [2]. Another result showed the number of
cache misses reduced from 30,000 to as low as 5,000 when a
highly associative (512-way) cache is used instead of a
direct-mapped one [10]. Further, a highly associative cache is
more efficient when the miss penalty is large and the memory
interconnect contention delay is significant and sensitive to the
cache miss rate [6]. Increasing associativity with any
replacement policy often decreases the miss ratio, but the better
performance of higher associativity depends on an efficient
replacement algorithm [4]. The LRU replacement algorithm,
which replaces the least recently used line in the cache, has a
miss ratio and performance comparable to the optimal (OPT or
MIN) algorithm. In the LRU policy, the line not referenced for
the longest period of time is considered a dead line and is
removed from the cache.
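As a concrete illustration, the following minimal C sketch
models LRU bookkeeping for a single set in software using
per-way age counters, in the spirit of the counter method
examined later; the names (lru_set, touch, victim) and the
8-way geometry are illustrative, not taken from the hardware
designs. Ages form a permutation of 0..N-1, and the way whose
age has reached N-1 is the dead line.

#include <stdint.h>

#define N_WAYS 8                 /* illustrative associativity */

/* Software model of one N-way set: age[i] = 0 means way i was
 * referenced most recently; age[i] = N_WAYS-1 marks the LRU way.
 * Initialize the ages to the distinct values 0..N_WAYS-1. */
typedef struct {
    uint32_t tag[N_WAYS];
    uint8_t  age[N_WAYS];
} lru_set;

/* On a reference to way w, every way more recent than w ages by
 * one and w becomes most recent; ages stay within 0..N_WAYS-1. */
static void touch(lru_set *s, int w)
{
    uint8_t ref = s->age[w];
    for (int i = 0; i < N_WAYS; i++)
        if (s->age[i] < ref)
            s->age[i]++;
    s->age[w] = 0;
}

/* On a miss, the victim is the way not referenced for the longest
 * time, i.e. the one whose age has reached N_WAYS-1. */
static int victim(const lru_set *s)
{
    for (int i = 0; i < N_WAYS; i++)
        if (s->age[i] == N_WAYS - 1)
            return i;
    return 0;                    /* unreachable if ages stay distinct */
}

A hardware realization of this bookkeeping needs a log2(N)-bit
counter per line, updated in parallel on each reference, which
hints at why the update logic grows with associativity.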
LRU is currently the most common replacement strategy used
in caches and gives higher performance [8]. Results from [12]
have shown that for many workloads FIFO and random
replacement yield similar performance, but the miss ratio of
LRU is 12% lower on average, thus yielding better performance
than the other policies. Studies [11] have shown that in the case
of larger associativity LRU can be noticeably improved,
bringing it closer to the off-line MIN [7] or the equivalent OPT
[13] algorithm. A highly associative cache with LRU is
therefore a better solution for reducing the miss rate and
improving performance. This combination has the added
advantage of reducing thrashing, provided that the associativity
value N is greater than M, where M is the number of different
blocks that map to the same set [6].
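To make the condition on N and M concrete, the short C sketch
below shows how M distinct block addresses can collide in one
set; the block size and set count are illustrative values, not
parameters taken from the paper.

#include <stdio.h>

/* Illustrative geometry: 64-byte blocks, 128 sets, so addresses
 * that differ by a multiple of 128 * 64 = 8192 bytes share a set. */
enum { BLOCK_SIZE = 64, NUM_SETS = 128 };

static unsigned set_index(unsigned addr)
{
    return (addr / BLOCK_SIZE) % NUM_SETS;
}

int main(void)
{
    /* M = 4 blocks that all map to set 0: a loop cycling through
     * them thrashes unless the associativity N covers all four. */
    unsigned addrs[4] = { 0x0000, 0x2000, 0x4000, 0x6000 };
    for (int i = 0; i < 4; i++)
        printf("addr 0x%04x -> set %u\n", addrs[i], set_index(addrs[i]));
    return 0;
}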
Results from [23] reveal that cache design affects the behavior
of database applications and that higher associativity gives
better performance for database workloads. Increasing
associativity in a network processor cache removes the problem
of cache conflicts [24], enhancing performance. Y. Markoskiy
and Y. Patel [20] identify increasing associativity as one
technique for C-slowing a processor, because higher
associativity is useful in supporting a large number of threads
and limiting thrashing in multithreading. Higher associativity is
a reasonable way to increase the physically addressed cache
size, for it does not increase the translation hardware [21].
Higher associativity