CPU-GPU Cooperation: A New Strategy for Solving the Subset-Sum Problem with a Parallel Two-List Algorithm


"ANovelCPU-GPUCooperativeImplementationofA ParallelTwo-ListAlgorithmfortheSubset-SumProblem LanjunWan, KenliLi, JingLiu, KeqinLi 子集和问题，又称为背包问题，是计算复杂性理论中的一个经典难题，属于NP完全类别。为了解决这个问题，研究者已经提出了多种并行算法，以在可接受的时间内求解。这些算法中的一部分已经被移植到图形处理单元（GPU）上，利用其强大的并行计算能力。然而，当前的GPU实现存在一个问题，即在GPU执行任务时，往往只有一个CPU核心用于协调工作，导致其他CPU核心未被充分利用，从而浪费了宝贵的计算资源。本论文提出了一种创新的CPU-GPU协同计算方法，用于并行双列表算法，旨在更高效地解决子集和问题。在异构CPU-GPU系统中，这种新的实现方式可以充分调动所有可用的CPU和GPU资源，避免资源闲置。通过建立最佳任务分配模型，研究者确定了CPU和GPU之间的最优任务分配比例，以实现两者间的高效协作。实验在不同的硬件平台上进行，结果显示，采用CPU-GPU协作实现的并行双列表算法相对于最佳顺序实现，加速因子达到了9.2，这意味着其性能提升了96.3%。相较于仅优化的CPU实现，该方法提升了25.7%的性能。这些改进表明，这种新策略能够充分利用系统中的并行计算资源，显著提高问题求解的速度。在实际应用中，子集和问题广泛存在于组合优化、密码学、数据压缩等多个领域。因此，提高子集和问题的求解效率对于提升这些领域的计算性能至关重要。CPU-GPU协同计算的并行双列表算法提供了一种有效途径，不仅优化了计算资源的使用，还降低了计算时间，这对于需要快速处理大量子集和问题的场景具有重要意义。总结来说，这篇论文贡献了一种新的并行算法实现策略，通过CPU-GPU协同计算解决了子集和问题在传统实现中资源利用率低下的问题。通过实验验证，这种方法能够显著提高算法的运行效率，对于未来设计和优化类似复杂问题的并行计算方案提供了有价值的参考。"

A Novel CPU-GPU Cooperative Implementation of a Parallel Two-List Algorithm for the Subset-Sum Problem

Lanjun Wan

Hunan University

Hunan 410082, China

wancanjun2008@163.com

Kenli Li

Hunan University

Hunan 410082, China

lkl510@263.net

Jing Liu

Hunan University

Hunan 410082, China

Idealer@126.com

Keqin Li

Hunan University

Hunan 410082, China

State University of New York

New Paltz, New York 12561

lik@newpaltz.edu

ABSTRACT

The subset-sum problem is a well-known NP-complete decision problem. Many parallel algorithms have been developed to solve the problem within a reasonable computation time, and some of them have been implemented on a GPU. However, the GPU implementations of these parallel algorithms may fail to fully utilize all the CPU cores and the GPU resources at the same time. While the GPU performs its tasks, only one CPU core is used to control the GPU and all the remaining CPU cores sit idle, so a large amount of available CPU resources is wasted. This paper proposes a novel CPU-GPU cooperative implementation of a parallel two-list algorithm to efficiently solve the subset-sum problem in a heterogeneous CPU-GPU system, which enables the efficient utilization of all the available computational resources of both CPUs and GPUs. To find the most appropriate task distribution ratio between CPUs and GPUs, this paper establishes an optimal task distribution model. A series of experiments is conducted on two different hardware platforms. The experimental results show that the CPU-GPU cooperative implementation produces a speedup factor of 9.2 over the best sequential implementation, achieves up to 96.3% performance improvement over the optimized CPU-only implementation, and yields up to 25.7% performance improvement over the optimized GPU-only implementation.

Categories and Subject Descriptors

D.1.3 [Programming Techniques]: Parallel Programming

General Terms

Algorithms, Experimentation, Performance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

PMAM ’14, February 15-19 2014, Orlando, FL, USA

http://dx.doi.org/10.1145/2560683.2560688.

Keywords

CPU-GPU cooperative computing, knapsack problem, parallel two-list algorithm, subset-sum problem, CUDA

1. INTRODUCTION

Given n positive integers $W = [w_1, w_2, \cdots, w_n]$ and a positive integer $M$, the subset-sum problem (SSP) is the decision problem of finding a set $I \subseteq \{1, 2, \cdots, n\}$ such that $\sum_{i \in I} w_i = M$. In other words, the goal is to find a binary n-tuple solution $X = [x_1, x_2, \cdots, x_n]$ for the equation

$$\sum_{i=1}^{n} w_i x_i = M, \quad x_i \in \{0, 1\}. \tag{1}$$
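To make Equation (1) concrete, the following minimal Python sketch enumerates every binary n-tuple and returns the first one that satisfies the equation. It is an illustration only, not the method of this paper: the function name `subset_sum_bruteforce` and the sample instance are assumptions chosen for the example, and the exhaustive search takes time exponential in n.

```python
from itertools import product


def subset_sum_bruteforce(weights, target):
    """Return a binary tuple x with sum(w_i * x_i) == target, or None.

    Tries all 2^n binary n-tuples, so it is only usable for small n.
    """
    for x in product((0, 1), repeat=len(weights)):
        if sum(w * xi for w, xi in zip(weights, x)) == target:
            return x
    return None


# Hypothetical instance, chosen only for illustration.
W = [3, 34, 4, 12, 5, 2]
M = 9
solution = subset_sum_bruteforce(W, M)
if solution is not None:
    chosen = [w for w, xi in zip(W, solution) if xi]
    print("x =", solution, "selects", chosen)
else:
    print("no subset sums to", M)
```

The returned tuple plays the role of X in Equation (1); the index set I is recovered as the positions where the tuple holds a 1.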

SSP is well known to be NP-complete, and it is a special case of the 0/1 knapsack problem. It has many real-world applications, such as stock cutting, cargo loading, capital budgeting, job scheduling, workload allocation, and project selection [8, 16, 12].

In recent decades, many exact and heuristic algorithms have been employed to solve SSP. A well-known approach is the dynamic programming algorithm [2], which solves SSP in pseudo-polynomial time but has exponential time complexity when the knapsack capacity is large enough. A tremendous improvement was made by Horowitz and Sahni [10], who proposed the two-list algorithm, which solves SSP in time $O(n 2^{n/2})$ with $O(2^{n/2})$ memory space. With the advent of parallel computing, considerable effort has been made to reduce the computation time of SSP. Based on the two-list algorithm and the SIMD (Single Instruction Multiple Data) model with shared memory, the parallelization of the two-list algorithm has been extensively discussed in [11, 9, 15, 21, 6, 14]. However, there are no efficient implementations of these parallel two-list algorithms on modern heterogeneous environments.
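The sequential two-list idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the parallel CPU-GPU implementation proposed in this paper: the weights are split into two halves, all subset sums of each half are generated, one list is sorted in ascending order and the other in descending order, and the two lists are scanned with two pointers. The names `subset_sums` and `two_list_subset_sum` are hypothetical, and the lists are built with a plain sort instead of the merge-based generation used in careful implementations.

```python
def subset_sums(weights):
    """Return (sum, mask) pairs for every subset; bit k of mask selects weights[k]."""
    sums = [(0, 0)]
    for k, w in enumerate(weights):
        # Each existing subset appears again with weights[k] added.
        sums += [(s + w, mask | (1 << k)) for s, mask in sums]
    return sums


def two_list_subset_sum(weights, target):
    """Return a binary list x with sum(w_i * x_i) == target, or None."""
    half = len(weights) // 2
    left, right = weights[:half], weights[half:]
    a = sorted(subset_sums(left))                 # ascending by sum
    b = sorted(subset_sums(right), reverse=True)  # descending by sum
    i = j = 0
    while i < len(a) and j < len(b):
        total = a[i][0] + b[j][0]
        if total == target:
            # Decode the two bit masks back into one binary n-tuple.
            x = [(a[i][1] >> k) & 1 for k in range(len(left))]
            x += [(b[j][1] >> k) & 1 for k in range(len(right))]
            return x
        if total < target:
            i += 1  # need a larger sum: advance in the ascending list
        else:
            j += 1  # need a smaller sum: advance in the descending list
    return None


# Hypothetical instance, chosen only for illustration.
W = [3, 34, 4, 12, 5, 2]
x = two_list_subset_sum(W, 9)
print("solution:", x)
```

Each list holds about $2^{n/2}$ entries, and sorting them costs on the order of $n 2^{n/2}$ operations, which is where the time and memory bounds quoted above come from. The generation and search phases each operate on large independent ranges of these lists, which is what makes the approach a natural candidate for parallelization.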

Recently, heterogeneous CPU-GPU systems have been widely used as a powerful way to deal with time-intensive problems [5], because GPUs can offer high levels of data parallelism and tremendous computational power. Some work has been performed in recent years to solve knapsack problems on a GPU. Bokhari [3] explored a parallelization of the dynamic programming algorithm, which solves
