[Figure 1: Resource utilization (CPU [%], MEM [%], MBW [%]) of Host A and Host B over time (s) under Traditional DRM.]
[Figure 2: Resource utilization (CPU [%], MEM [%], MBW [%]) of Host A and Host B over time (s) under Traditional DRM + MBW-awareness.]
[Figure 3: IPC performance of vm01-vm07 (STREAM) and vm08-vm14 (gromacs) under Traditional DRM and Architecture-aware DRM (HM is the harmonic mean; annotation: 49.2%).]
that do not contend for the same shared resource are mapped
to the same socket [12, 45, 60]. Our focus, in this work, is
not on a single server, but on a cluster of servers. We ex-
plore VM migration across nodes, which is complementary
to migrating applications/VMs across sockets.
2.2 Limitations of Traditional DRM Schemes
To address the VM-to-Host mapping challenge, prior
works [23, 27–31, 34, 56, 72] have proposed to manage
the physical resources by monitoring operating-system-level
metrics (such as CPU utilization and memory capacity demand)
and appropriately mapping VMs to hosts such that the uti-
lization of CPU/memory resources is balanced across differ-
ent hosts. While these schemes have been shown to be effec-
tive at CPU/memory resource scheduling and load balanc-
ing, they have a fundamental limitation: they are not aware of microarchitecture-level shared resource interference.
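For concreteness, the following sketch (in Python, with assumed thresholds and a hypothetical host-metrics structure, not taken from any particular DRM product) illustrates the kind of decision such schemes make: a host triggers rebalancing only when OS-level CPU or memory utilization is overcommitted or imbalanced across hosts, and memory bandwidth never enters the decision.

# Illustrative sketch of a traditional DRM balancing check (assumed thresholds,
# hypothetical metric dictionaries). Only OS-level CPU and memory-capacity
# utilization are examined; memory bandwidth is never consulted.

CPU_THRESHOLD = 0.90       # assumed overcommit threshold
MEM_THRESHOLD = 0.90
IMBALANCE_LIMIT = 0.20     # assumed tolerated spread across hosts

def needs_rebalance(hosts):
    """hosts: list of dicts with 'cpu' and 'mem' utilization in [0, 1]."""
    if any(h["cpu"] > CPU_THRESHOLD or h["mem"] > MEM_THRESHOLD for h in hosts):
        return True
    for metric in ("cpu", "mem"):
        values = [h[metric] for h in hosts]
        if max(values) - min(values) > IMBALANCE_LIMIT:
            return True
    return False   # no OS-level imbalance: keep the current VM-to-Host mapping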
2.2.1 Lack of Microarchitecture-level Shared Resource
Interference Awareness
Prior works, including commercial products, base migration
decisions on operating-system-level metrics. However, such
metrics cannot capture the microarchitecture-level shared re-
source interference characteristics. Our real workload pro-
filing results (detailed in Section 6.1) show that there are
many workloads, e.g., STREAM and gromacs, that exhibit
similar CPU utilization and demand for memory capacity,
but have very different memory bandwidth consumption.
Thus, when VMs exhibit similar CPU and memory capac-
ity utilization and the host is not overcommitted (i.e., CPU
or memory is under-utilized), traditional DRM schemes that
are unaware of microarchitecture-level shared resource inter-
ference characteristics would not recognize a problem and
would let the current VM-to-host mapping continue. How-
ever, the physical host might, in reality, be experiencing
heavy contention at the microarchitecture-level shared re-
sources such as shared cache and main memory.
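Detecting such contention requires hardware-level measurement rather than OS-level metrics. As a rough illustration (not the mechanism used in this work), memory traffic can be approximated from last-level-cache miss counts sampled from performance counters; the counter source, the 64-byte line size, and the neglect of writeback/prefetch traffic are assumptions of this simple estimate.

# Sketch: approximating memory bandwidth from sampled LLC-miss counter deltas.
# The counter source (e.g., a PMU sampling tool) and the 64-byte cache line
# are assumptions; writebacks and prefetches are ignored, so this is only a
# lower-bound estimate intended to illustrate the idea.

CACHE_LINE_BYTES = 64

def estimate_mbw_gbps(llc_misses_start, llc_misses_end, interval_s):
    """Approximate memory read bandwidth (GB/s) over a sampling interval."""
    bytes_moved = (llc_misses_end - llc_misses_start) * CACHE_LINE_BYTES
    return bytes_moved / interval_s / 1e9

# Example: 5e8 demand misses over 2 s correspond to roughly 16 GB/s.
print(estimate_mbw_gbps(0, 5 * 10**8, 2.0))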
2.2.2 Offline Profiling to Characterize Interference
Some previous works [31, 37, 75] seek to mitigate inter-
ference between applications/VMs at the microarchitecture-
level shared resources by defining constraints based on of-
fline profiling of applications/VMs, such that applications
that contend with each other are not co-located. For instance,
in VMware DRS [31], rules can be defined for VM-to-VM
or VM-to-Host mappings. While such an offline-profiling-based approach could work in some scenarios, it has two major drawbacks. First, it might not
always be feasible to profile applications. For instance, in
a cloud service such as Amazon EC2 [2] where VMs are
leased to any user, it is not feasible to profile applications
offline. Second, even when workloads can be profiled of-
fline, due to workload phase changes and changing inputs,
the interference characteristics might differ from those observed when the profiling was performed. Hence, such an
offline profiling approach has limited applicability.
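To illustrate why static rules are brittle, the sketch below shows an anti-affinity check of the kind such approaches rely on (a hypothetical rule format and workload classes; not the VMware DRS rule syntax). Because the rule set is fixed at profiling time, it cannot cover unprofiled workloads or phase changes.

# Sketch: placement check against offline-profiling-derived anti-affinity
# rules (hypothetical rule format and classes, for illustration only). Each
# rule names two workload classes observed to contend when co-located.

ANTI_AFFINITY_RULES = {("STREAM", "STREAM")}   # assumed profiling outcome

def placement_allowed(vm_class, resident_classes):
    """Reject a placement that violates any anti-affinity rule."""
    for resident in resident_classes:
        if (vm_class, resident) in ANTI_AFFINITY_RULES or \
           (resident, vm_class) in ANTI_AFFINITY_RULES:
            return False
    return True

print(placement_allowed("STREAM", ["gromacs", "STREAM"]))   # False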
2.3 The Impact of Interference Unawareness
In this section, we use case studies to demonstrate the shortcomings of DRM schemes that are unaware of microarchitecture-level shared resource interference. We pick two ap-
plications: gromacs from the SPEC CPU2006 benchmark
suite [6] and STREAM [7]. STREAM and gromacs have
very similar memory capacity demand, while having very
different memory bandwidth usage: STREAM has high bandwidth demand, whereas gromacs has low demand (more workload pairs with such characteristics can be found in Section 6.1).
We run seven copies (VMs) of STREAM on Host A and
seven copies (VMs) of gromacs on Host B (initially). Both
of the hosts are SuperMicro servers equipped with two Intel
Xeon L5630 processors running at 2.13 GHz (detailed in
Section 5). Each VM is configured to have 1 vCPU and 2
GB memory.
Figure 1 shows the CPU utilization (CPU), the total memory capacity demand of the VMs as a fraction of host memory capacity (memory capacity utilization, MEM), and the memory bandwidth utilization (MBW) of the hosts when a traditional
DRM scheme, which relies on CPU utilization and mem-
ory capacity demand, is employed. We see that although the
memory bandwidth on Host A is heavily contended (close to the practically achievable peak bandwidth [21]), the traditional DRM scheme does nothing (i.e., does not migrate VMs), since the CPU and memory capacity on both Host A and Host B are under-utilized and the VMs on the two hosts have similar CPU and memory capacity demands.
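To make the contrast explicit, the sketch below extends the CPU/MEM-only criterion from Section 2.2 with a single memory-bandwidth check (the utilization values and the contention threshold are assumed for illustration, not measured): the CPU/MEM-only check sees no problem on Host A, while the MBW-aware check flags it.

# Sketch: adding an MBW check to the CPU/MEM-only criterion. The utilization
# values loosely mirror the case study but are assumed, not measured; the
# MBW threshold is an assumed fraction of the practically achievable peak.

MBW_THRESHOLD = 0.85

host_a = {"cpu": 0.50, "mem": 0.40, "mbw": 0.95}   # runs 7 STREAM VMs
host_b = {"cpu": 0.50, "mem": 0.40, "mbw": 0.15}   # runs 7 gromacs VMs

def cpu_mem_only_flags(host):
    return host["cpu"] > 0.90 or host["mem"] > 0.90

def mbw_aware_flags(host):
    return cpu_mem_only_flags(host) or host["mbw"] > MBW_THRESHOLD

print(cpu_mem_only_flags(host_a), mbw_aware_flags(host_a))   # False True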
Figure 2 shows the same information for the same two
hosts, Host A and Host B. However, we use a memory-
bandwidth-contention-aware DRM scheme to migrate the three VMs that consume the most memory bandwidth from Host A to Host B at 300, 600, and 900 seconds.
To keep the CPU resources from being oversubscribed, we
also migrate three VMs that have low memory bandwidth
requirements from Host B to Host A. We see that after the
three migrations, the memory bandwidth usage on Host A