Kernel-User Space Separation in DRAM Memory
Xi Li¹², Beilei Sun¹², Zongwei Zhu¹², Chao Wang¹², and Xuehai Zhou¹²
¹Suzhou Institute for Advanced Study, University of Science and Technology of China (USTC), Suzhou, China
²Department of Computer Science and Technology, USTC, Hefei, 230027, China
Email: {sasbl, zzw1988, saintwc}@mail.ustc.edu.cn, {llxx, xhzhou}@ustc.edu.cn
Abstract—Performance of software is increasingly restricted by the Memory Wall rather than by the CPU. Many studies focus on alleviating DRAM latency by improving the row-buffer hit rate, but most of them treat the Kernel¹ and User² equally. Data used by the Operating System and by user applications are spread across different rows of the same bank, leading to contention for the row-buffer when the two access the bank successively. We find that contention between Kernel and User accounts for a large proportion of all row-buffer misses. To alleviate this contention, we divide the unified DRAM memory space into Kernel-Space and User-Space. A new page-allocation system, the K/U-Aware page-allocation system, is proposed to manage Kernel-Space and User-Space under different address mapping schemes of the DRAM memory controller. In the new system, pages are allocated from different spaces according to the applicant (Kernel or User). The sizes of the two spaces grow and shrink dynamically as required. For benchmarks in the PARSEC suite, the proposed system effectively reduces contention between Kernel and User, producing significant improvements in row-buffer hit rate. Execution time is reduced by 9.45% (max. 20.45%) and 6.51% (max. 18.05%) in two typical address mapping schemes, respectively.
I. INTRODUCTION
The ever-increasing power of CPUs highlights the memory-wall problem [1]. Instead of CPU performance, DRAM memory latency has become the performance bottleneck of computer systems. A typical DRAM architecture is shown in Fig. 1. A bank is an independent memory array inside a DRAM device, and accesses to the array occur at the granularity of rows. To alleviate the memory-wall problem, a row-buffer is added to cache the data fetched from a row in a DRAM bank. If the same row is accessed in succession, DRAM latency is significantly reduced because of the row-buffer hit; but if different rows of the same bank are accessed successively, longer latency occurs because of the row-buffer miss. Improving the row-buffer hit rate (RBH) is therefore very effective in reducing DRAM latency.
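As a toy illustration of this hit/miss behavior (a sketch written for this text, not the paper's simulator; the latency values are arbitrary assumptions), a single bank with one row-buffer can be modeled as follows: an access hits if it targets the currently buffered row and misses otherwise.

```python
# Toy model of one DRAM bank with a single row-buffer. The latencies
# t_hit and t_miss are illustrative assumptions, not real DRAM timings.
def simulate_bank(row_accesses, t_hit=15, t_miss=45):
    """Count row-buffer hits/misses for a sequence of row IDs on one bank."""
    open_row = None          # row currently held in the row-buffer
    hits = misses = 0
    total_latency = 0
    for row in row_accesses:
        if row == open_row:
            hits += 1        # same row as the previous access: row-buffer hit
            total_latency += t_hit
        else:
            misses += 1      # different row: the buffer must be reloaded
            total_latency += t_miss
            open_row = row
    return hits, misses, total_latency

# Repeatedly accessing one row is far cheaper than alternating rows.
print(simulate_bank([7, 7, 7, 7]))   # → (3, 1, 90)
print(simulate_bank([7, 3, 7, 3]))   # → (0, 4, 180)
```

The second sequence shows exactly the interleaving pattern the paper attributes to Kernel/User contention: every access lands on a different row of the same bank, so every access misses.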
A variety of methods have been explored to improve the RBH. A permutation-based page interleaving scheme that reduces row-buffer conflicts and exploits data access locality in the row-buffer is proposed in [3]. In [4], [5], the row that will be accessed next is predicted and pre-fetched into the row-buffer. Page size is reduced to enhance memory access efficiency in [19]. Others try to alleviate contention by reordering the memory access sequence in the DRAM controller, improving the RBH at the same time [2].
All of this research on improving DRAM performance ignores the interference between Kernel and User when they access DRAM memory. As the administrator of the computer system, the Kernel is in charge of hardware and provides services to User through various system calls. It is also responsible for managing and scheduling user applications. Owing to this special role, the Kernel behaves very differently from User. [6], [10] make an in-depth analysis of the interference and differences between Kernel and User in the cache, branch predictor, and translation lookaside buffer (TLB). Because of its data size, the Kernel can easily overwhelm the cache and TLB; Kernel execution is typically brief and intermittent, so TLB and cache entries are replaced with little benefit. [8] reveals the close relationship between the cache and DRAM, so the Kernel's distinctive influence on the cache clearly affects DRAM memory as well. Since Kernel and User interact with each other more and more frequently [6], ignoring the interference between them when accessing DRAM will continue to reduce row-buffer efficiency.
¹The Operating System is referred to as Kernel in this paper.
²Applications running in user mode are referred to as User in this paper.
In this paper, we first analyze the interference between Kernel and User when they access the same bank. Without loss of generality, we studied different address mapping schemes of the DRAM memory controller (DMC). The address mapping scheme determines how a given physical address is resolved into the indices of a DRAM memory system: channel ID, rank ID, bank ID, row ID, and column ID. Two typical address mapping schemes, Bank:Row:Column (B:R:C) and Row:Bank:Column (R:B:C), are analyzed in this paper. A Kernel-to-User switch (K2U-Switch) occurs when User accesses DRAM after Kernel; a User-to-Kernel switch (U2K-Switch) is defined analogously, and together they are called K/U-Switches. We observe that K/U-Switches occur frequently, and that Kernel and User rarely share the same row on a K/U-Switch. To analyze how the interference between Kernel and User reduces row-buffer efficiency, we quantify the row-buffer misses caused by K/U-Switches. As our experiments show, the row-buffer misses caused by K/U-Switches contribute greatly to the overall row-buffer misses in both the B:R:C and R:B:C schemes. In particular, for some banks, K/U-Switches are the major cause of row-buffer misses. Thus, Kernel and User interfere with each other severely when accessing DRAM, leading to a considerable reduction of row-buffer efficiency.
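To make the two mapping schemes concrete, the following sketch decodes a physical address into bank, row, and column indices under B:R:C and R:B:C. The field widths (8 banks, 2^15 rows, 4 KiB per row) are assumptions chosen for illustration, not the configuration used in the paper, and channel/rank bits are omitted for brevity.

```python
# Illustrative address decoding for the two mapping schemes discussed above.
# Field widths are example assumptions: 3 bank bits, 15 row bits, 12 column bits.
BANK_BITS, ROW_BITS, COL_BITS = 3, 15, 12

def decode_brc(addr):
    """Bank:Row:Column - the bank ID occupies the highest-order bits."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col

def decode_rbc(addr):
    """Row:Bank:Column - the bank ID occupies the middle bits, so
    consecutive row-sized regions rotate across banks."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return bank, row, col

# Two addresses exactly one row apart fall in the same bank under B:R:C
# (different rows, hence a row-buffer conflict), but in different banks
# under R:B:C (no conflict).
a, b = 0x0000, 0x1000
print(decode_brc(a)[0] == decode_brc(b)[0])   # → True
print(decode_rbc(a)[0] == decode_rbc(b)[0])   # → False
```

This is why the choice of mapping scheme changes how often Kernel and User pages collide in the same bank, and why the paper evaluates both schemes separately.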
To reduce row-buffer contention between Kernel and User, we propose dividing the unified DRAM memory space into Kernel-Space and User-Space, used by Kernel and User respectively. A new page-allocation system, the K/U-Aware page-allocation system, is proposed to manage the separated DRAM spaces. In the new system, we reorganize the way the Kernel manages physical memory pages. Pages are
2014 IEEE International Symposium on Parallel and Distributed Processing with Applications
978-1-4799-4293-0/14 $31.00 © 2014 IEEE
DOI 10.1109/ISPA.2014.40