Combine Thread with Memory Scheduling for
Maximizing Performance in Multi-core Systems
Gangyong Jia¹, Guangjie Han², Liang Shi³, Jian Wan¹, Dong Dai⁴
¹Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
²Department of Computer Science, Hohai University, Changzhou, 213022, China
³Department of Computer Science and Technology, Chongqing University, Chongqing, 400044, China
⁴Department of Computer Science, Texas Tech University, Lubbock, TX 79409, USA
gangyong@mail.ustc.edu.cn; hanguangjie@gmail.com; shiliang@cqu.edu.cn; dongdaily@gmail.com
Abstract
The growing gap between microprocessor speed and DRAM
speed is a major problem that computer designers are facing. In
order to narrow the gap, it is necessary to improve DRAM’s
speed and throughput. Moreover, on multi-core platforms,
DRAM memory shared by all cores usually suffers from the
memory contention and interference problem, which can cause
serious performance degradation and unfairness among concurrently
running threads. To address these problems, this paper proposes
techniques that exploit two complementary ideas: partitioning cores,
threads, and memory banks into groups to reduce interference between
groups, and grouping memory accesses to the same row together to
reduce the row-buffer miss rate. A memory optimization framework that
combines thread scheduling with memory scheduling (CTMS) is proposed,
which simultaneously minimizes memory access schedule length and
memory access time and reduces interference to maximize performance
on multi-core systems. Experimental results show that CTMS shortens
memory access time by 12.6% while improving throughput by 11.8% on
average. Moreover, CTMS also saves 5.8% of the energy consumption.
Keywords—Thread scheduling; memory scheduling; memory
interference; memory access time; performance; energy
1. INTRODUCTION
The growing gap between microprocessor speed and
DRAM speed is a major problem that computer designers are
facing [1-3]. More seriously, as multi-core becomes the dominant
platform, the DRAM memory shared by all cores usually suffers from
contention and interference, which can cause serious performance
degradation and unfairness among concurrently running threads.
Specifically,
modern multi-core machines consist of many components, such
as processing cores, prefetchers and DMA engines, which can
generate memory requests with different characteristics and
priorities. For example, different cores can generate memory-
intensive and non-intensive requests simultaneously;
prefetchers’ requests are of low priority and DMA engines’
requests are sequential. If memory controllers are unable to
distinguish these different requests, interference inevitably
occurs [4, 5].
A number of recently proposed memory resource partitioning [6-11, 32,
33] and memory access scheduling [12-15] algorithms leverage this
characteristic information, together with the three micro-operations
that accompany each data transfer: bank precharge, row activation,
and column access. These algorithms have been demonstrated to
effectively reduce memory contention and interference and to minimize
memory access schedule length and memory access time. For instance,
TCM [6], which classifies threads into a memory-intensive group and a
CPU-intensive group and applies a different policy to each, has been
shown to improve both performance and QoS for the overall system.
Rixner et al. [13], one of the earliest works on memory scheduling,
propose both a memory scheduler framework and six different
scheduling policies.
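To make the grouping step concrete, the following C sketch (our
illustration, not TCM's actual implementation; the MPKI threshold and
counter fields are assumed) classifies a thread as memory-intensive
or CPU-intensive by its misses per kilo-instruction, the kind of
memory-intensity metric such schedulers rely on:

    /* Illustrative TCM-style grouping; the threshold and the
       counter fields are hypothetical, not from the TCM paper. */
    #define MPKI_THRESHOLD 1.0

    struct thread_stat {
        int tid;
        unsigned long llc_misses;   /* last-level cache misses */
        unsigned long insts;        /* instructions retired    */
    };

    enum group { CPU_INTENSIVE, MEM_INTENSIVE };

    enum group classify(const struct thread_stat *t)
    {
        /* MPKI = misses per 1000 retired instructions */
        double mpki = 1000.0 * (double)t->llc_misses / (double)t->insts;
        return (mpki >= MPKI_THRESHOLD) ? MEM_INTENSIVE : CPU_INTENSIVE;
    }

A TCM-like scheduler then applies a different memory scheduling
policy to each of the two groups.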
Although some memory scheduling algorithms are claimed
to be easily integrated into memory controllers [6-8, 11, 16],
they usually introduce complex hardware logic and require
extra storage in memory controllers to store per core/thread
information, which can be an obstacle to scaling the on-chip core
count. Therefore, industry vendors seem hesitant to adopt aggressive
memory scheduling algorithms [4]. In this paper, we propose an
approach to
effectively eliminate the memory contention and interference
problem without any hardware modification to memory
controllers through operating system page allocation, which
aggregates physical memory pages for each thread into specific
memory banks.
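As a rough illustration of this idea (a minimal sketch; the
physical-address bit layout and the per-thread bank-group assignment
below are our assumptions, not the paper's exact scheme), the
allocator can inspect the bank-index bits of each free page frame and
hand a thread only frames from its own bank group:

    /* Sketch of bank-aware OS page allocation. The bank index is
       assumed to sit in PFN bits 3..5 (8 banks); real DRAM address
       mappings are controller-specific. */
    #define BANK_SHIFT 3
    #define BANK_MASK  0x7u

    static unsigned bank_of(unsigned long pfn)
    {
        return (unsigned)(pfn >> BANK_SHIFT) & BANK_MASK;
    }

    /* Return the first free frame whose bank lies in the group
       [lo, hi] assigned to the requesting thread; 0 if none.
       A real allocator would also unlink the frame from the
       free list. */
    unsigned long alloc_in_group(const unsigned long *free_pfns,
                                 int n, unsigned lo, unsigned hi)
    {
        for (int i = 0; i < n; i++) {
            unsigned b = bank_of(free_pfns[i]);
            if (b >= lo && b <= hi)
                return free_pfns[i];
        }
        return 0;
    }

Because each thread's pages then map to disjoint banks, requests from
different threads no longer compete for the same row buffers.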
Although bubble filling scheduling (BFS) can minimize memory access
schedule length and memory access time, it hardly mitigates memory
contention and interference. In this paper, we combine
page-allocation-based thread scheduling with BFS-based memory access
scheduling (CTMS) to simultaneously minimize memory access schedule
length and memory access time and to reduce memory contention and
interference in multi-core systems.
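The details of BFS are beyond this introduction, but the row-grouping
idea the framework builds on can be sketched as follows (our
illustration with assumed request fields, not the BFS algorithm
itself): within a bank's queue, requests to the same DRAM row are
served back-to-back, so each activated row is drained before the next
precharge:

    #include <stdlib.h>

    /* Sketch: cluster a bank's pending requests by row so that
       same-row accesses hit the open row buffer. Field names are
       illustrative. */
    struct mem_req {
        unsigned row;
        unsigned col;
    };

    static int by_row(const void *a, const void *b)
    {
        const struct mem_req *x = a, *y = b;
        return (x->row > y->row) - (x->row < y->row);
    }

    void group_by_row(struct mem_req *queue, int n)
    {
        qsort(queue, n, sizeof *queue, by_row);
        /* A real scheduler bounds such reordering (e.g., with an
           age cap) so old requests are not starved. */
    }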
We implement CTMS in both single and dual memory controller
architectures and evaluate it on 4-core and 8-core platforms.
Experimental results show that CTMS shortens memory access time by
12.1% and 13.2% with single and dual memory controllers,
respectively, while improving throughput by 11.8% on average.
Moreover, CTMS also saves 5.8% of the memory system's energy
consumption.
In summary, we make the following contributions:
(1) Based on operating system page allocation, we aggregate
physical memory pages into one memory bank group for one