Kernel-User Space Separation in DRAM Memory
Xi Li¹², Beilei Sun¹², Zongwei Zhu¹², Chao Wang¹², and Xuehai Zhou¹²
¹Suzhou Institute for Advanced Study, University of Science and Technology of China (USTC), Suzhou, China
²Department of Computer Science and Technology, USTC, Hefei, 230027, China
Email: {sasbl, zzw1988, saintwc}@mail.ustc.edu.cn, {llxx, xhzhou}@ustc.edu.cn
Abstract—Performance of software is increasingly restricted by the Memory Wall rather than by the CPU. Many studies focus on alleviating DRAM latency by improving the row-buffer hit rate, but most of them treat the Kernel¹ and User² equally. Data used by the Operating System and by user applications are spread across different rows of the same bank, leading to contention for the row-buffer when the two access the bank successively. We find that contention between Kernel and User accounts for a large proportion of all row-buffer misses. To alleviate this contention, we divide the unified DRAM memory space into Kernel-Space and User-Space. A new page-allocation system, the K/U-Aware page-allocation system, is proposed to manage Kernel-Space and User-Space under different address mapping schemes of the DRAM memory controller. In the new system, pages are allocated from different spaces according to the applicant (Kernel or User). The sizes of the two spaces grow and shrink dynamically as required. For benchmarks in the PARSEC suite, the proposed system effectively reduces contention between Kernel and User, producing significant improvements in row-buffer hit rate. Execution time is reduced by 9.45% (max. 20.45%) and 6.51% (max. 18.05%) in two typical address mapping schemes, respectively.
I. INTRODUCTION
The ever-increasing power of CPUs highlights the memory-wall problem [1]. Instead of CPU performance, DRAM memory latency has become the performance bottleneck of computer systems. A typical DRAM architecture is shown in Fig. 1. A bank is an independent memory array inside a DRAM device, and accesses to the array occur at the granularity of rows. To alleviate the memory-wall problem, a row-buffer is added to cache the data fetched from a row in a DRAM bank. If the same row is accessed in succession, DRAM latency is significantly reduced because of the row-buffer hit; but if different rows of the same bank are accessed successively, longer latency occurs because of the row-buffer miss. Improving the row-buffer hit rate (RBH) is therefore very effective in reducing DRAM latency.
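As a toy illustration of this hit/miss behavior (a sketch written for this text, not the paper's simulator; the latency values are arbitrary assumptions), a single bank with one row-buffer can be modeled as follows: an access hits if it targets the currently buffered row and misses otherwise.

```python
# Toy model of one DRAM bank with a single row-buffer. The latencies
# t_hit and t_miss are illustrative assumptions, not real DRAM timings.
def simulate_bank(row_accesses, t_hit=15, t_miss=45):
    """Count row-buffer hits/misses for a sequence of row IDs on one bank."""
    open_row = None          # row currently held in the row-buffer
    hits = misses = 0
    total_latency = 0
    for row in row_accesses:
        if row == open_row:
            hits += 1        # same row as the previous access: row-buffer hit
            total_latency += t_hit
        else:
            misses += 1      # different row: the buffer must be reloaded
            total_latency += t_miss
            open_row = row
    return hits, misses, total_latency

# Repeatedly accessing one row is far cheaper than alternating rows.
print(simulate_bank([7, 7, 7, 7]))   # → (3, 1, 90)
print(simulate_bank([7, 3, 7, 3]))   # → (0, 4, 180)
```

The second sequence shows exactly the interleaving pattern the paper attributes to Kernel/User contention: every access lands on a different row of the same bank, so every access misses.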
A variety of methods have been explored to improve the RBH. A permutation-based page interleaving scheme that reduces row-buffer conflicts and exploits data access locality in the row-buffer is proposed in [3]. In [4], [5], the row that will be accessed next is predicted and pre-fetched into the row-buffer. Page size is reduced to enhance memory access efficiency in [19]. Others try to alleviate contention by reordering the memory access sequence in the DRAM controller, improving the RBH at the same time [2].
All of this research on improving DRAM performance ignores the interference between Kernel and User when they access DRAM memory. As the administrator of the computer system, the Kernel is in charge of hardware and provides services to User through various system calls. It is also responsible for managing and scheduling user applications. Owing to this special role, the Kernel behaves very differently from User. [6], [10] make an in-depth analysis of the interference and differences between Kernel and User in the cache, branch predictor, and translation lookaside buffer (TLB). Because of its data size, the Kernel can easily overwhelm the cache and TLB; Kernel execution is typically brief and intermittent, so TLB and cache entries are replaced with little benefit. [8] reveals the close relationship between the cache and DRAM, so the Kernel's distinctive influence on the cache clearly affects DRAM memory as well. Since Kernel and User interact with each other more and more frequently [6], ignoring the interference between them when accessing DRAM will continue to reduce row-buffer efficiency.
¹The Operating System is referred to as Kernel in this paper.
²Applications running in user mode are referred to as User in this paper.
In this paper, we first analyze the interference between Kernel and User when they access the same bank. Without loss of generality, we studied different address mapping schemes of the DRAM memory controller (DMC). The address mapping scheme determines how a given physical address is resolved into the indices of a DRAM memory system: channel ID, rank ID, bank ID, row ID, and column ID. Two typical address mapping schemes, Bank:Row:Column (B:R:C) and Row:Bank:Column (R:B:C), are analyzed in this paper. A Kernel-to-User switch (K2U-Switch) occurs when User accesses DRAM after Kernel; a User-to-Kernel switch (U2K-Switch) is defined analogously, and together they are called K/U-Switches. We observe that K/U-Switches occur frequently, and that Kernel and User rarely share the same row on a K/U-Switch. To analyze how the interference between Kernel and User reduces row-buffer efficiency, we quantify the row-buffer misses caused by K/U-Switches. As our experiments show, the row-buffer misses caused by K/U-Switches contribute greatly to the overall row-buffer misses in both the B:R:C and R:B:C schemes. In particular, for some banks, K/U-Switches are the major cause of row-buffer misses. Thus, Kernel and User interfere with each other severely when accessing DRAM, leading to a considerable reduction of row-buffer efficiency.
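To make the two mapping schemes concrete, the following sketch decodes a physical address into bank, row, and column indices under B:R:C and R:B:C. The field widths (8 banks, 2^15 rows, 4 KiB per row) are assumptions chosen for illustration, not the configuration used in the paper, and channel/rank bits are omitted for brevity.

```python
# Illustrative address decoding for the two mapping schemes discussed above.
# Field widths are example assumptions: 3 bank bits, 15 row bits, 12 column bits.
BANK_BITS, ROW_BITS, COL_BITS = 3, 15, 12

def decode_brc(addr):
    """Bank:Row:Column - the bank ID occupies the highest-order bits."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col

def decode_rbc(addr):
    """Row:Bank:Column - the bank ID occupies the middle bits, so
    consecutive row-sized regions rotate across banks."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return bank, row, col

# Two addresses exactly one row apart fall in the same bank under B:R:C
# (different rows, hence a row-buffer conflict), but in different banks
# under R:B:C (no conflict).
a, b = 0x0000, 0x1000
print(decode_brc(a)[0] == decode_brc(b)[0])   # → True
print(decode_rbc(a)[0] == decode_rbc(b)[0])   # → False
```

This is why the choice of mapping scheme changes how often Kernel and User pages collide in the same bank, and why the paper evaluates both schemes separately.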
To reduce row-buffer contention between Kernel and User, we propose dividing the unified DRAM memory space into Kernel-Space and User-Space, used by Kernel and User respectively. A new page-allocation system, the K/U-Aware page-allocation system, is proposed to manage the separated DRAM spaces. In the new system, we reorganize the way the Kernel manages physical memory pages. Pages are
2014 IEEE International Symposium on Parallel and Distributed Processing with Applications
978-1-4799-4293-0/14 $31.00 © 2014 IEEE
DOI 10.1109/ISPA.2014.40