Combine Thread with Memory Scheduling for
Maximizing Performance in Multi-core Systems
Gangyong Jia¹, Guangjie Han², Liang Shi³, Jian Wan¹, Dong Dai⁴
¹Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
²Department of Computer Science, Hohai University, Changzhou, 213022, China
³Department of Computer Science and Technology, Chongqing University, Chongqing, 400044, China
⁴Department of Computer Science, Texas Tech University, Lubbock, TX 79409, USA
gangyong@mail.ustc.edu.cn; hanguangjie@gmail.com; shiliang@cqu.edu.cn; dongdaily@gmail.com
Abstract
The growing gap between microprocessor speed and DRAM
speed is a major problem that computer designers are facing. In
order to narrow the gap, it is necessary to improve DRAM’s
speed and throughput. Moreover, on multi-core platforms,
DRAM memory shared by all cores usually suffers from the
memory contention and interference problem, which can cause
serious performance degradation and unfairness among concurrently
running threads. To address these problems, this paper proposes
techniques that exploit two complementary ideas: partitioning cores,
threads, and memory banks into groups to reduce interference between
groups, and grouping memory accesses to the same row together to
reduce the row-buffer miss rate. A memory optimization framework that
combines thread scheduling with memory scheduling (CTMS) is proposed,
which simultaneously minimizes memory access schedule length and
memory access time and reduces interference to maximize performance
on multi-core systems. Experimental results show that CTMS shortens
memory access time by 12.6% while improving throughput by 11.8% on
average. Moreover, CTMS also saves 5.8% of the energy consumption.
Keywords—Thread scheduling; memory scheduling; memory
interference; memory access time; performance; energy
1. INTRODUCTION
The growing gap between microprocessor speed and
DRAM speed is a major problem that computer designers are
facing [1-3]. More seriously, as multi-core becomes the dominant
platform, the DRAM memory shared by all cores usually suffers from
contention and interference, which can cause serious performance
degradation and unfairness among concurrently running threads.
Specifically,
modern multi-core machines consist of many components, such
as processing cores, prefetchers and DMA engines, which can
generate memory requests with different characteristics and
priorities. For example, different cores can generate memory-
intensive and non-intensive requests simultaneously;
prefetchers’ requests are of low priority and DMA engines’
requests are sequential. If memory controllers are unable to
distinguish these different requests, interference inevitably
occurs [4, 5].
A number of recently proposed memory resource partitioning [6-11, 32,
33] and memory access scheduling [12-15] algorithms leverage this
characteristic information, together with the three micro-operations
that accompany each data transfer: bank precharge, row activation,
and column access. These algorithms have been demonstrated to
effectively reduce memory contention and interference and to minimize
memory access schedule length and memory access time. For instance,
TCM [6], which classifies threads into a memory-intensive group and a
CPU-intensive group and applies a different policy to each, has been
shown to improve both performance and QoS for the overall system.
Rixner et al. [13], one of the earliest works on memory scheduling,
propose both a memory scheduler framework and six different
scheduling policies.
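To make the grouping step concrete, the following C sketch (our
illustration, not TCM's actual implementation; the MPKI threshold and
counter fields are assumed) classifies a thread as memory-intensive
or CPU-intensive by its misses per kilo-instruction, the kind of
memory-intensity metric such schedulers rely on:

    /* Illustrative TCM-style grouping; the threshold and the
       counter fields are hypothetical, not from the TCM paper. */
    #define MPKI_THRESHOLD 1.0

    struct thread_stat {
        int tid;
        unsigned long llc_misses;   /* last-level cache misses */
        unsigned long insts;        /* instructions retired    */
    };

    enum group { CPU_INTENSIVE, MEM_INTENSIVE };

    enum group classify(const struct thread_stat *t)
    {
        /* MPKI = misses per 1000 retired instructions */
        double mpki = 1000.0 * (double)t->llc_misses / (double)t->insts;
        return (mpki >= MPKI_THRESHOLD) ? MEM_INTENSIVE : CPU_INTENSIVE;
    }

A TCM-like scheduler then applies a different memory scheduling
policy to each of the two groups.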
Although some memory scheduling algorithms are claimed
to be easily integrated into memory controllers [6-8, 11, 16],
they usually introduce complex hardware logic and require
extra storage in memory controllers to store per core/thread
information, which can be an obstacle to scaling the on-chip core
count. Therefore, industry vendors seem hesitant to adopt aggressive
memory scheduling algorithms [4]. In this paper, we propose an
approach to
effectively eliminate the memory contention and interference
problem without any hardware modification to memory
controllers through operating system page allocation, which
aggregates physical memory pages for each thread into specific
memory banks.
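As a rough illustration of this idea (a minimal sketch; the
physical-address bit layout and the per-thread bank-group assignment
below are our assumptions, not the paper's exact scheme), the
allocator can inspect the bank-index bits of each free page frame and
hand a thread only frames from its own bank group:

    /* Sketch of bank-aware OS page allocation. The bank index is
       assumed to sit in PFN bits 3..5 (8 banks); real DRAM address
       mappings are controller-specific. */
    #define BANK_SHIFT 3
    #define BANK_MASK  0x7u

    static unsigned bank_of(unsigned long pfn)
    {
        return (unsigned)(pfn >> BANK_SHIFT) & BANK_MASK;
    }

    /* Return the first free frame whose bank lies in the group
       [lo, hi] assigned to the requesting thread; 0 if none.
       A real allocator would also unlink the frame from the
       free list. */
    unsigned long alloc_in_group(const unsigned long *free_pfns,
                                 int n, unsigned lo, unsigned hi)
    {
        for (int i = 0; i < n; i++) {
            unsigned b = bank_of(free_pfns[i]);
            if (b >= lo && b <= hi)
                return free_pfns[i];
        }
        return 0;
    }

Because each thread's pages then map to disjoint banks, requests from
different threads no longer compete for the same row buffers.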
Although bubble filling scheduling (BFS) can minimize memory access
schedule length and memory access time, it hardly mitigates memory
contention and interference. In this paper, we combine
page-allocation-based thread scheduling with BFS-based memory access
scheduling (CTMS) to simultaneously minimize memory access schedule
length and memory access time and to reduce memory contention and
interference in multi-core systems.
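The details of BFS are beyond this introduction, but the row-grouping
idea the framework builds on can be sketched as follows (our
illustration with assumed request fields, not the BFS algorithm
itself): within a bank's queue, requests to the same DRAM row are
served back-to-back, so each activated row is drained before the next
precharge:

    #include <stdlib.h>

    /* Sketch: cluster a bank's pending requests by row so that
       same-row accesses hit the open row buffer. Field names are
       illustrative. */
    struct mem_req {
        unsigned row;
        unsigned col;
    };

    static int by_row(const void *a, const void *b)
    {
        const struct mem_req *x = a, *y = b;
        return (x->row > y->row) - (x->row < y->row);
    }

    void group_by_row(struct mem_req *queue, int n)
    {
        qsort(queue, n, sizeof *queue, by_row);
        /* A real scheduler bounds such reordering (e.g., with an
           age cap) so old requests are not starved. */
    }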
We implement CTMS in both single and dual memory controller
architectures and evaluate it on 4-core and 8-core platforms.
Experimental results show that CTMS shortens memory access time by
12.1% and 13.2% with single and dual memory controllers,
respectively, while improving throughput by 11.8% on average.
Moreover, CTMS also saves 5.8% of the memory system's energy
consumption.
In summary, we make the following contributions:
(1) Based on operating system page allocation, we aggregate
physical memory pages into one memory bank group for one