共享分布式架构上的拓扑感知局部虚拟集群映射算法提升性能

57 浏览量更新于2024-08-26 收藏 661KB PDF 举报

本文主要探讨的是在共享分布式基础设施上的拓扑感知局部虚拟集群映射算法，针对新型虚拟化高性能计算（HPC）中心的需求。在这样的环境下，为了支持并行和分布式应用的高效执行，虚拟集群（VC）作为一种隔离且可配置的资源池被动态创建和部署。传统的VCM问题关注于如何在运行时为每个作业分配物理资源，这直接影响到VC的性能和整个系统的资源利用率。以往的研究往往忽视了并行作业中子任务间的角色差异，将VC内的所有虚拟机（VM）视为同等重要。然而，实际上，执行不同子任务的VM应该根据其在工作流程中的关键性赋予不同的权重。为此，作者提出了部分VC映射的概念，它区别于现有的全量映射方法，更注重VM之间的优先级和资源分配策略。为了实施部分映射，研究者引入了通信骨架（Communication Skeleton, CS），这是基于虚拟节点之间网络拓扑的关键通信结构，用于确定VM的重要性和依赖关系。他们设计了一个评估节点重要性的机制，通过分析并行作业的通信需求来构建CS。这个算法考虑了任务间的数据流动和通信效率，确保关键节点和连接得到优化。进而，本文提出了拓扑感知局部虚拟簇映射算法（TOP-VCM），它利用子图同构检测技术，精确地匹配通信骨架中的节点和链接需求。TOP-VCM在满足核心通信需求的同时，对非关键节点和链接的要求稍作妥协，以此降低映射复杂度，提高整体资源的利用效率。实验结果显示，TOP-VCM在保证VC性能的同时，显著提升了系统的总收入和资源利用率，以及映射算法的性能表现。总结来说，本文的贡献在于提出了一种新的VCM策略，它能够更好地适应并行作业的特性，提升分布式基础设施的资源管理效率，对于优化虚拟化HPC中心的性能和资源分配具有重要意义。

Topology-Aware Partial Virtual Cluster Mapping

Algorithm on Shared Distributed Infrastructures

Xiaohui Wei, Member, IEEE, H ongliang Li, Member, IEEE , Kun Yang, Senior Member, IEEE, and Lei Zou

Abstract—Novel virtualized HPC centers provide isolat ed and configur able Virtual Clusters (VC) on shared distributed

infrastructures as execution environments for parallel and distributed applications. These VCs are usually customized and d eployed

per job in runtime. Allocating physical resources for VC is known as V irtual Cluster Mappi ng (VCM) probl em, which is a critical

issue that affects b oth perfor mance of the VC and resource utilization of the system. Most previous works treat all Virtual Machines

(VMs) in a VC request equally. However, because sub-jobs in a parallel job usually perform dif ferent roles, the corr esponding

VMs in a VC that execute these sub- jobs respectively should have different leve ls of impor tance. Based on this a rgument, this pap er

introduces the concept of partial VC mapping in contrast to the full mapping methodology in the current literatu res. To fulfill

partial mapping, the impo rtant backbone com munica tion struc ture of parallel job call ed C ommun ication Skele ton (CS) is derived

based on the network topology among virtual nodes. To generate the CS of a job, mechanisms for evaluati ng the importance of

nodes are proposed. Eventually, a Topolog y-aware Partia l Virtu al Cluster Mappi ng algor ithm (TOP-VCM) is propo sed which is based

on sub-graph isomorphism detec tion. TOP-V CM can fully satisfy the nodes/lin ks requirem ents in CS to ensure the execution

performance with only slight degradation of other trivial nodes/links to significantly reduce the mapping difficulty. Simulati on

results have shown that TOP-VCM has significantly improve d the total revenue, the utilization of physical re sources and the

performance of mapping algorithm while satisfying the VC requirements.

Index Terms—Virtual cluster, parallel job, topology, partial mapping, shared distributed infrastructure

1INTRODUCTION

OVEL virtualized HPC systems are able to facilitate

multiple heterogeneous environments for various

applications simultaneously on shared distributed infra-

structures. Usually these infrastructures consist of one

or multiple data centers and supported by virtualization

technologies. To share physical resources, the execution

environments are u sually implemented by Virtual Machines

(VMs) or sets of related VMs organized as Virtual Clusters

(VCs)[1],[2],[3],[4],[5].TheseVCsareusuallyparticularly

allocated and deployed per job in runtime [6], [7], [8], [9] to

support customized parallel and distributed applicati ons.

Virtualization on high level system c omponents, such

as Virtual Network (VN) [10], [11], [3], [12], [13], [ 14] and

VC, enables HPC system to dynamically manage these

environments by placement [7], [8], [9], [15], [16] in

construction and live migration [17], [18], [19], [20], [21]

in runtime. How to efficiently map VC onto substrate

cluster(s) is called Virtual Cluster Mapping (VCM) problem,

which is a critical issue that affects both performance of

the VC and resource utilization of the system. It is the basis

of both VC placement and VC live migration problems.

This paper focuses on the VCM problem on shared

distributed infrastructures.

A parallel job consists of a set of sub-jobs.Usually,a

parallel job is executed on one customized VC: (1) each

sub-job executes on one VM hosted by a physical machine

(computing host) ( Fig. 1 ); (2) communication links between

parallel jobs are supported by Virtual Links (VLs) between

VMs. Each VL can be mapped onto one or multiple

connected Physical Links (PLs). As a result, the computing

and communication requirements of the job are trans-

formed into a VC request that consists of computing

capacities of VMs and the network bandwidth of each

links between VMs.

Mapping a VC onto physical/Substrate Cluster (SC) consists

of VM ma pping and V L m apping. The VCM problem i s

similar to Virtual Network Embedding (VNE) probl em. They

both need to map nod es and links onto a sub strate env i-

ronment. The terms ‘‘node/link mapping’’ and ‘‘VM/VL

mapping’’ are interchangeable when referring VCM prob-

lem in the rest of the p aper.

As a result, most previous VCM works are based on VNE

algorithms. Since this problem is known to be NP-hard most

existing works use heuristic-based algorith ms [22], [ 23], [24],

[25], [26], [22], [27], [28], [29], [30], [31], [32] , [33], [34] such as

simulated annealing [29], mixed integer programming [25],

linear programmi ng [24 ], and sub-graph isomorphism

search [35].

However, directly applying the VNE solutions to VCM

problem is not efficien t. On one hand, virtual nodes in VN

mayhaveadditionalresourceconstrains[25],[26],[36],

[33], suc h as node location. VC resource allocation has

no such constraints, which increases th e solution space

and consume s m ore execution time. On other hand, the

. X. Wei, H. Li, and L. Zou are with the College of Computer Science and

Technology, Jilin University and the Key Laboratory of Symbolic

Computation and Knowledge Engineering of Ministry of Education,

Changchun, 130012, China. E-mail: {weixh, lihongliang}@jlu.edu.cn.

. K. Yang is wi th the School of Computer S cience and Electronic

Engineering, University of Essex, Colchester, CO4 3SQ, United Kingdom.

E-mail: kunyang@essex.ac.u.

Manuscript received 19 May 2013; revised 17 Aug. 2013; accepted 21 Aug.

2013. Date of publication 5 Sept. 2013; date of current version 17 Sept. 2014.

Recommended for acceptance by Y. Cui.

For information on obtaining reprints of this article, please send e-mail to:

reprints@ieee.org, and reference the Digital Object Identifier below.

Digital Object Identifier no. 10.1109/TPDS.2013.224

1045-9219

2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 25, NO. 10, OCTOBER 2014 2721

下载后可阅读完整内容，剩余9页未读，立即下载

weixin_38516040

粉丝: 3
资源: 918

共享分布式架构上的拓扑感知局部虚拟集群映射算法提升性能

分布式存储在数字集群移动通信系统中的应用.pdf

Ceph CRUSH算法详解：分布式对象存储的智慧选择

Ceph分布式存储实战指南

ONOS集群驱动的光虚拟专用网架构研究

Linux集群与分布式系统搭建与管理

利用DNMAP执行分布式集群扫描任务的实践指南

布隆过滤器与分布式缓存一致性算法的集成

Redis集群架构分析：哨兵和集群模式的原理与实践，打造高可用集群

堆排序在分布式系统中的应用：如何设计可扩展的排序算法，前沿技术分享

Rmpi与高性能计算集群：结合案例讲解并行算法设计（并行算法设计实战）

最新资源