on single devices with a fixed deadline and on real-time systems.
In our approach, we focus on many-core processors and static schedules, and we consider different deadlines for each schedule.
Zhuravlev et al. [32] give a survey of energy-cognizant scheduling techniques. They present many scheduling algorithms and explain the different techniques, distinguishing between DVFS- and DPM-based solutions, thermal-management solutions and asymmetry-aware scheduling. None of the described scheduling algorithms takes the switching or shutdown/wakeup overhead into account as our approach does.
While acknowledging that DVFS and DPM are possible energy savers in data centers [5,14,19], our work focuses on core-level granularity with a smaller time scale, and our measurements are based on the per-core sleep-state mechanism rather than suspension to RAM or CPU hibernation. Aside from the mentioned differences, none of the previous work deals with the latency overhead of both DVFS and DPM on a practical level from the operating system's point of view. Without this information, it is difficult to determine the additional energy and performance penalty for using power management on modern multi-core hardware running an off-the-shelf OS such as Linux.
3. Power distribution and latency of power-saving mechanisms
Power-saving techniques in microprocessors are hardware-software coordinated mechanisms used to dynamically scale parts of the CPU up or down during runtime. We outline their functionality and current implementation in the Linux kernel to highlight the obstacles of using power management in practice.
3.1. Dynamic Voltage and Frequency Scaling (DVFS)
The DVFS functionality was integrated into microprocessors to
lower the dynamic power dissipation of a CPU by scaling the clock
frequency and the chip voltage. Eq. (1) shows the simple relation between these characteristics and the dynamic power:
P_dynamic = C · f · V²    (1)
where C is the effective charging capacitance of the CPU, f is
the clock frequency and V is the CPU supply voltage. Since DVFS reduces both the frequency and the voltage of the chip, and the voltage enters Eq. (1) quadratically, the power savings are more significant at high clock frequencies [28,31].
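To make the quadratic voltage contribution concrete, the following minimal C sketch evaluates Eq. (1) for two operating points; the capacitance and the frequency/voltage pairs are purely illustrative values, not taken from any specific CPU.

#include <stdio.h>

/* Dynamic power according to Eq. (1): P = C * f * V^2 */
static double dynamic_power(double c_eff, double freq_hz, double volt)
{
    return c_eff * freq_hz * volt * volt;
}

int main(void)
{
    const double c_eff = 1e-9;   /* illustrative effective capacitance [F] */

    /* Hypothetical operating points: 1.6 GHz at 1.2 V vs. 0.8 GHz at 1.0 V. */
    double p_high = dynamic_power(c_eff, 1.6e9, 1.2);
    double p_low  = dynamic_power(c_eff, 0.8e9, 1.0);

    /* Halving f alone would halve P; the lower voltage shrinks it further. */
    printf("P_high = %.2f W, P_low = %.2f W, ratio = %.2f\n",
           p_high, p_low, p_high / p_low);
    return 0;
}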
The relation between frequency and voltage is usually stored
in a hardware-specific look-up table from which the OS retrieves
the values as the DVFS functionality is utilized. Since the clock fre-
quency switching involves both hardware and software actions, we
investigated the procedure in more detail to pinpoint the source of
the latency. In a Linux-based system, the following core procedure
describes how the clock frequency is scaled:
(1) A change in frequency is requested by the user.
(2) A mutex is taken to prevent other threads from changing the
frequency.
(3) Platform-specific routines are called from the generic inter-
face.
(4) The PLL is switched out to a temporary MPLL source.
(5) A safe voltage level for the new clock frequency is selected.
(6) New values for clock divider and PLL are written to registers.
(7) The mutex is given back and the system returns to normal
operation.
3.1.1. DVFS implementations
To adjust the DVFS settings, the Linux kernel uses a frequency
governor [15] to select, during run-time, the most appropriate fre-
quency based on a set of policies. In order not to be affected by the
governors, we selected the userspace governor for application-
controlled DVFS. The DVFS functionality can be accessed either by writing directly to the sysfs interface or by using system calls. When using sysfs, the DVFS procedure involves file management, which is expected to introduce more overhead than calling the kernel headers directly from the application. We nevertheless studied both options in order to validate the latency differences between the user space interface and the system call.
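For reference, selecting the userspace governor for core 0 can be done with a few lines of C; the following minimal sketch assumes the standard cpufreq sysfs layout and omits most error handling.

#include <stdio.h>

int main(void)
{
    /* Hand frequency control over to the application by selecting the
     * userspace governor for core 0 (requires root privileges). */
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "w");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "userspace\n");
    fclose(f);
    return 0;
}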
3.1.1.1. System call interface. The system call interface for DVFS un-
der Linux is accessible directly in the Linux kernel. We measured
the elapsed time between issuing the DVFS system call and the
return of the call, which indicates a change in clock frequency.
Listing 1 outlines the pseudo code for accessing the DVFS func-
tionality from the system call interface.
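A minimal C sketch in the spirit of Listing 1 is given below; it assumes the libcpufreq wrapper cpufreq_set_frequency() as one concrete way of reaching the kernel's DVFS interface from application code, uses clock_gettime() for microsecond timing, and picks an arbitrary target frequency.

#include <stdio.h>
#include <time.h>
#include <cpufreq.h>   /* libcpufreq wrapper around the kernel DVFS interface */

/* Elapsed time in microseconds between two timestamps. */
static long elapsed_us(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000000L + (b.tv_nsec - a.tv_nsec) / 1000L;
}

int main(void)
{
    struct timespec start, stop;
    unsigned long target_khz = 800000;   /* arbitrary target frequency in kHz */

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (cpufreq_set_frequency(0, target_khz) != 0) {   /* request DVFS on core 0 */
        fprintf(stderr, "cpufreq_set_frequency failed\n");
        return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &stop);

    printf("DVFS latency: %ld us\n", elapsed_us(start, stop));
    return 0;
}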
3.1.1.2. User space interface. The second option is to use the sysfs
interface for accessing the DVFS functionality from user space. The
CPU clock frequency is altered by writing the new target frequency into a sysfs file, which the kernel reads and consequently uses to change the frequency. The kernel functionality is not called directly from the C program; instead, file system I/O is required for both reads and writes to the sysfs filesystem. Listing 2 outlines
an example for the DVFS call via the sysfs interface.
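A minimal C sketch in the spirit of Listing 2 follows; it assumes the scaling_setspeed file exposed by the userspace governor for core 0, uses an arbitrary target frequency, and times the complete write including the flush performed by fclose().

#include <stdio.h>
#include <time.h>

int main(void)
{
    /* cpufreq sysfs file through which the userspace governor accepts
     * a new target frequency for core 0. */
    const char *path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed";
    struct timespec start, stop;

    clock_gettime(CLOCK_MONOTONIC, &start);

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "800000\n");   /* arbitrary target frequency in kHz */
    fclose(f);                /* flush so the kernel acts on the request */

    clock_gettime(CLOCK_MONOTONIC, &stop);

    printf("DVFS latency via sysfs: %ld us\n",
           (stop.tv_sec - start.tv_sec) * 1000000L
           + (stop.tv_nsec - start.tv_nsec) / 1000L);
    return 0;
}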
3.1.2. DVFS measurement results
The user space and the kernel space mechanisms were eval-
uated, and the results are presented in this section. Since the
DVFS mechanism is ultimately executed by kernel- and user-space threads, the system should be stressed with different load levels to evaluate the impact on the response time. For this purpose, we used spurg-bench [23]. Spurg-bench is a benchmark capable of generating a defined set of load levels on a set of threads executing, for example, floating-point multiplications. We generated load levels in the range [0;90]% using spurg-bench. Furthermore, we generated a load level of 100% using stress,¹ since this benchmark is designed to represent the maximum-case CPU utilization. All experiments were iterated 100 times with different frequency hops, and with a timing granularity of microseconds.
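The duty-cycle principle behind such load generation can be sketched as follows; this generic C example is not spurg-bench itself, and the control period, operation count and load level are arbitrary choices.

#include <time.h>
#include <unistd.h>

/* Approximate a given CPU load level by alternating a busy phase of
 * floating-point multiplications with an idle phase inside a fixed period. */
static void generate_load(double level, int seconds)
{
    const long period_us = 100000;             /* 100 ms control period */
    long busy_us = (long)(level * period_us);
    volatile double x = 1.000001;

    for (int s = 0; s < seconds * 10; s++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        do {                                   /* busy phase */
            for (int i = 0; i < 1000; i++)
                x *= 1.000001;
            clock_gettime(CLOCK_MONOTONIC, &t1);
        } while ((t1.tv_sec - t0.tv_sec) * 1000000L
                 + (t1.tv_nsec - t0.tv_nsec) / 1000L < busy_us);
        usleep(period_us - busy_us);           /* idle phase */
    }
}

int main(void)
{
    generate_load(0.5, 10);                    /* e.g., 50% load for 10 s */
    return 0;
}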
¹ http://people.seas.harvard.edu/apw/stress/