A Novel CPU-GPU Cooperative Implementation of A
Parallel Two-List Algorithm for the Subset-Sum Problem
Lanjun Wan
Hunan University
Hunan 410082, China
wancanjun2008@163.com
Kenli Li
Hunan University
Hunan 410082, China
lkl510@263.net
Jing Liu
Hunan University
Hunan 410082, China
Idealer@126.com
Keqin Li
Hunan University
Hunan 410082, China
State University of New York
New Paltz, New York 12561
lik@newpaltz.edu
ABSTRACT
The subset-sum problem is a well-known NP-complete deci-
sion problem. Many parallel algorithms have been developed
to solve the problem within a reasonable computation time,
and some of them have been implemented on a GPU. How-
ever, the GPU implementations of these parallel algorithms
may fail to fully utilize all the CPU cores and the GPU re-
sources at the same time. When the GPU performs some
tasks, only one CPU core is used to control the GPU, all
the rest of CPU cores are in idle state, this leads to large
amounts of available CPU resources are wasted. This paper
proposes a novel CPU-GPU cooperative implementation of
a parallel two-list algorithm to efficiently solve the subset-
sum problem in a heterogeneous CPU-GPU system, which
enables the efficient utilization of all the available computa-
tional resources of both CPUs and GPUs. In order to find
the most appropriate task distribution ratio between CPUs
and GPUs, this paper establishes an optimal task distribu-
tion model. A series of experiments are conducted on two
different hardware platforms. The experimental results show
that the CPU-GPU cooperative implementation produces a
speedup factor of 9.2 over the best sequential implementa-
tion, achieves up to 96.3% performance improvement over
the optimized CPU-only implementation, and yields up to
25.7% performance improvement over the optimized GPU-
only implementation.
Categories and Subject Descriptors
D.1.3 [Programming Techniques]: Parallel Programming
General Terms
Algorithms, Experimentation, Performance
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
PMAM ’14, February 15-19 2014, Orlando, FL, USA
Copyright 2014 ACM 978-1-4503-2655-1/14/02 ...$15.00.
http://dx.doi.org/10.1145/2560683.2560688.
Keywords
CPU-GPU cooperative computing, knapsack problem, par-
allel two-list algorithm, subset-sum problem, CUDA
1. INTRODUCTION
Given n positive integers W =[w
1
, w
2
, · · · , w
n
] and a pos-
itive integer M, the subset-sum problem (SSP) is the de-
cision problem of finding a set I ⊆ {1, 2, · · · , n}, such that
w
i
= M, i ∈ I. In other words, the goal is to find a binary
n-tuple solution X=[x
1
, x
2
, · · · , x
n
] for the equation
n
i=1
w
i
x
i
= M, x
i
∈ {0, 1}. (1)
SSP is well-known to be NP-complete, and it is a special
case of the 0/1 knapsack problem. It has many real-world
applications, such as stock cutting, cargo loading, capital
budgeting, job scheduling, workload allocation, and project
selection [8, 16, 12].
In recent decades, many exact and heuristic algorithms
have been employed to solve SSP. A well known approach is
the dynamic programming algorithm [2] which solves SSP in
pseudo-polynomial time, but it has exponential time com-
plexity when the knapsack capacity is large enough. A
tremendous improvement was made by Horowitz and Sahni
[10], who proposed the two-list algorithm which solves SSP
in time O(n2
n/2
) with O(2
n/2
) memory space. With the
advent of parallel computing, a large effort has been done in
order to reduce the computation time of SSP. Based on the
two-list algorithm and the SIMD (Single Instruction Mul-
tiple Data) model with shared-memory, the parallelization
of the two-list algorithm has been extensively discussed in
[11, 9, 15, 21, 6, 14]. However, there are no efficient imple-
mentations of these parallel two-list algorithms on modern
heterogeneous environments.
Recently, heterogeneous CPU-GPU system has been wide-
ly used, which is a powerful way to deal with time-intensive
problems [5], because GPU can offer high levels data paral-
lelism and tremendous computational power. To solve the
knapsack problems on a GPU, some work has been per-
formed in recent years. Bokhari [3] explored a paralleliza-
tion of the dynamic programming algorithm, which solves