
Highly Efficient LRU Implementations for High
Associativity Cache Memory
T.S.B. Sudarshan, Rahil Abbas Mir, S. Vijayalakshmi
Birla Institute of Technology and Science, Pilani, Rajasthan 333031 INDIA
tsbs@bits-pilani.ac.in, rahilabbasmir@rediffmail.com, viji@bits-pilani.ac.in
Abstract-High associativity with LRU as the replacement policy is an
optimal solution for cache design when the miss rate has to be
reduced. But as associativity increases, implementing the LRU
policy becomes complex. As many advanced and demanding
technologies such as multimedia, multithreading, databases and
low-power devices running on high-performance processors in
servers and workstations use higher associativity to enrich
performance, there is a need for highly efficient LRU hardware
implementations. This paper analyzes various implementations of
the LRU policy for a cache with high associativity. The
implementation problems are explored, design objectives are
identified, and various implementations, namely the Square Matrix,
Skewed Matrix, Counter, Linked List, Phase and Systolic Array
methods, are compared with each other on the basis of the
objectives outlined. These implementations are synthesized to
determine the constraints and the effect of increasing associativity
on performance. When the associativity is small, reduction of the
associated logic is important, while at higher associativity
conservation of space is more important. At higher associativity,
the Linked List, Systolic Array and Skewed Matrix designs are
found suitable for implementation.
I. INTRODUCTION
Modern processors, commercial systems, high-performance
servers and workstations have highly associative caches for
performance improvement [15,16,17]. The complexity of
implementing the LRU (Least Recently Used) policy for a
highly associative cache tends to increase as the associativity
increases [1,2,3,4,10]. The increase in complexity additionally
increases the delay incurred in detecting the line for
replacement. Cache performance is thus degraded, even though
a highly associative cache with the LRU policy is used, owing
to an inapt implementation. This paper analyzes and compares
various efficient LRU implementations for highly associative
caches. These designs are analyzed with respect to their
implementation complexity and how fast they can determine
the cache line to replace. The various implementations of LRU
are simulated and synthesized for comparison. The rest of the
paper is organized in the following manner. Section 2 identifies
higher associativity with LRU as the best configuration to
reduce the miss ratio. Section 3 discusses the implementation
complexity of LRU as associativity increases. Section 4
examines the various implementations, their working and their
characteristics. Section 5 explains the methodology followed to
test the functional correctness of the designs, the evaluation of
the performance metrics and the results obtained. Section 6
details the comparison of the various implementations based on
the results obtained, and the conclusions are presented in
Section 7.
II. HIGHER ASSOCIATIVITY WITH LRU POLICY
The classical approach to improving cache behavior is to
reduce the miss rate. Increasing associativity in the cache
reduces conflict misses, thereby reducing the miss rate and
improving performance. Studies have shown that conflict
misses reduce from 28% to 4% when the associativity changes
from 1-way to 8-way [2]. Another result showed the number of
cache misses reduced from 30,000 to as low as 5,000 when a
highly associative (512-way) cache is used instead of a
direct-mapped one [10]. Further, a highly associative cache is
more efficient when the miss penalty is large and the memory
interconnect contention delay is significant and sensitive to the
cache miss rate [6]. Increasing associativity with any
replacement policy often decreases the miss ratio, but the better
performance of higher associativity depends on an efficient
replacement algorithm [4]. The LRU replacement algorithm,
which replaces the least recently used line in the cache, has a
miss ratio and performance comparable to the optimal (OPT or
MIN) algorithm. In the LRU policy, the line not referenced for
the longest period of time is considered a dead line and is
removed from the cache.
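As a concrete illustration, the following minimal C sketch
models LRU bookkeeping for a single set in software using
per-way age counters, in the spirit of the counter method
examined later; the names (lru_set, touch, victim) and the
8-way geometry are illustrative, not taken from the hardware
designs. Ages form a permutation of 0..N-1, and the way whose
age has reached N-1 is the dead line.

#include <stdint.h>

#define N_WAYS 8                 /* illustrative associativity */

/* Software model of one N-way set: age[i] = 0 means way i was
 * referenced most recently; age[i] = N_WAYS-1 marks the LRU way.
 * Initialize the ages to the distinct values 0..N_WAYS-1. */
typedef struct {
    uint32_t tag[N_WAYS];
    uint8_t  age[N_WAYS];
} lru_set;

/* On a reference to way w, every way more recent than w ages by
 * one and w becomes most recent; ages stay within 0..N_WAYS-1. */
static void touch(lru_set *s, int w)
{
    uint8_t ref = s->age[w];
    for (int i = 0; i < N_WAYS; i++)
        if (s->age[i] < ref)
            s->age[i]++;
    s->age[w] = 0;
}

/* On a miss, the victim is the way not referenced for the longest
 * time, i.e. the one whose age has reached N_WAYS-1. */
static int victim(const lru_set *s)
{
    for (int i = 0; i < N_WAYS; i++)
        if (s->age[i] == N_WAYS - 1)
            return i;
    return 0;                    /* unreachable if ages stay distinct */
}

A hardware realization of this bookkeeping needs a log2(N)-bit
counter per line, updated in parallel on each reference, which
hints at why the update logic grows with associativity.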
LRU is currently the most common replacement strategy used
in caches and gives higher performance [8]. Results from [12]
have shown that for many workloads FIFO and random
replacement yield similar performance, but the miss ratio of
LRU is 12% lower on average, thus yielding better performance
than the other policies. Studies [11] have shown that in the case
of larger associativity LRU can be noticeably improved,
bringing it closer to the off-line MIN [7] or the equivalent OPT
[13] algorithm. A highly associative cache with LRU is
therefore a better solution for reducing the miss rate and
improving performance. This combination has the added
advantage of reducing thrashing, provided that the associativity
value N is greater than M, where M is the number of different
blocks that map to the same set [6].
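To make the condition on N and M concrete, the short C sketch
below shows how M distinct block addresses can collide in one
set; the block size and set count are illustrative values, not
parameters taken from the paper.

#include <stdio.h>

/* Illustrative geometry: 64-byte blocks, 128 sets, so addresses
 * that differ by a multiple of 128 * 64 = 8192 bytes share a set. */
enum { BLOCK_SIZE = 64, NUM_SETS = 128 };

static unsigned set_index(unsigned addr)
{
    return (addr / BLOCK_SIZE) % NUM_SETS;
}

int main(void)
{
    /* M = 4 blocks that all map to set 0: a loop cycling through
     * them thrashes unless the associativity N covers all four. */
    unsigned addrs[4] = { 0x0000, 0x2000, 0x4000, 0x6000 };
    for (int i = 0; i < 4; i++)
        printf("addr 0x%04x -> set %u\n", addrs[i], set_index(addrs[i]));
    return 0;
}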
Results from [23] reveal that cache design affects the behavior
of database applications and that higher associativity gives
better performance for database workloads. Increasing
associativity in a network processor cache removes the problem
of cache conflicts [24], enhancing performance. Y. Markoskiy
and Y. Patel [20] identify increasing associativity as one
technique for C-slowing a processor, because higher
associativity is useful in supporting a large number of threads
and limiting thrashing in multithreading. Higher associativity is
a reasonable way to increase the physically addressed cache
size, for it does not increase the translation hardware [21].
Higher associativity