Parallel Pareto Optimization for Subset Selection
Chao Qian^{1,2}, Jing-Cheng Shi^{1}, Yang Yu^{1}, Ke Tang^{2}, Zhi-Hua Zhou^{1*}
^1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
^2 UBRI, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
{qianc, shijc, yuy, zhouzh}@lamda.nju.edu.cn, ketang@ustc.edu.cn

* This work was supported by the NSFC (61333014, 61375061, 61329302), the Collaborative Innovation Center of Novel Software Technology and Industrialization, and the 2015 Microsoft Research Asia Collaborative Research Program.
Abstract

Subset selection, which selects a few variables from a large set, is a fundamental problem in many areas. The recently emerged Pareto Optimization for Subset Selection (POSS) method is a powerful approximation solver for this problem. However, POSS is not readily parallelizable, restricting its large-scale applications on modern computing architectures. In this paper, we propose PPOSS, a parallel version of POSS. Our theoretical analysis shows that PPOSS has good properties for parallelization while preserving the approximation quality: when the number of processors is limited (less than the total number of variables), the running time of PPOSS can be reduced almost linearly with respect to the number of processors; with increasing number of processors, the running time can be further reduced, eventually to a constant. Empirical studies verify the effectiveness of PPOSS, and moreover suggest that the asynchronous implementation is more efficient with little quality loss.
1 Introduction

Given a total set of n variables, the subset selection problem is to select a subset of size at most k that optimizes some given objective. One origin of this problem is the column subset selection problem [Gu and Eisenstat, 1996], which aims at selecting a few columns of a matrix that capture as much of the matrix as possible. Since then, subset selection has been significantly extended, and numerous applications have emerged, e.g., feature selection, sparse learning, and compressed sensing.
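Formally, writing f for the given objective and V for the set of n variables, a standard formulation consistent with the description above is:

\[
\arg\max_{S \subseteq V,\; |S| \le k} f(S).
\]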
Subset selection is NP-hard in general [Davis et al., 1997]. Much effort has been put into developing polynomial-time approximation algorithms, which fall mainly into two branches: greedy algorithms and convex relaxation methods. Greedy algorithms iteratively add or remove the single variable that best improves the given objective [Gilbert et al., 2003; Tropp, 2004] (a minimal sketch of the forward variant is given below). Albeit widely used in practice, the performance of these algorithms is limited due to their greedy nature.
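As an illustration, here is a minimal Python sketch of forward greedy selection; the set-objective `f` (larger is better) and the variable pool are assumptions for the example, not part of the original paper.

```python
# A minimal sketch of forward greedy subset selection (illustrative only).
# Assumes f: a user-supplied set function to maximize, f(subset) -> float.

def greedy_select(f, variables, k):
    """Greedily add the variable that best improves f, up to k variables."""
    selected = set()
    for _ in range(k):
        # Evaluate the objective with each remaining variable added.
        best_var, best_val = None, float("-inf")
        for v in variables:
            if v in selected:
                continue
            val = f(selected | {v})
            if val > best_val:
                best_var, best_val = v, val
        # Stop early if no variable improves the objective.
        if best_var is None or best_val <= f(selected):
            break
        selected.add(best_var)
    return selected
```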
Convex relaxation methods relax the original problem by replacing the set size constraint (i.e., the ℓ0-norm constraint) with convex constraints, e.g., the ℓ1-norm constraint [Tibshirani, 1996] and the elastic net penalty [Zou and Hastie, 2005]. However, the optimal solutions of the relaxed problem can be far from those of the original problem.
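As a concrete instance of the ℓ1 relaxation, a Lasso-based selection might look as follows; the synthetic data, the alpha value, and the use of scikit-learn are our illustrative choices, not the paper's.

```python
# Illustrative l1 relaxation of subset selection via the Lasso (scikit-learn):
# variables with nonzero coefficients form the selected subset.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))           # 100 samples, 20 candidate variables
y = X[:, :3] @ np.array([1.5, -2.0, 1.0])    # only the first 3 variables matter
y += 0.1 * rng.standard_normal(100)

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)       # indices of selected variables
print(selected)                              # typically a small subset, e.g. [0 1 2]
```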
Recently, Pareto optimization has been shown to be very powerful for the subset selection problem [Qian et al., 2015c]. The Pareto Optimization for Subset Selection (POSS) method treats subset selection as a bi-objective optimization problem, which requires optimizing the given objective and minimizing the subset size simultaneously. A bi-objective evolutionary algorithm with theoretical guarantees [Yu et al., 2012; Qian et al., 2015a; 2015b] is then applied to solve it. Finally, the best solution satisfying the size constraint is picked out from the solution set produced by POSS; a simplified sketch follows.
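The following is a highly simplified Python sketch of the POSS loop as just described; it illustrates the idea rather than reproducing the authors' exact pseudocode, and `f` (the objective, to be maximized over boolean inclusion vectors) is an assumed user-supplied function.

```python
# Highly simplified sketch of the POSS loop (illustrative only).
import random

def poss(f, n, k, iterations):
    """Bi-objective Pareto optimization: maximize f(s) while minimizing |s|."""
    def size(s):
        return sum(s)

    def dominates(a, b):
        # a is no worse than b on both objectives. (f is re-evaluated here
        # for brevity; a real implementation would cache objective values.)
        return f(a) >= f(b) and size(a) <= size(b)

    population = [[0] * n]                     # archive, seeded with the empty subset
    for _ in range(iterations):
        parent = random.choice(population)
        # Mutate: flip each bit independently with probability 1/n.
        child = [bit ^ (random.random() < 1.0 / n) for bit in parent]
        if not any(dominates(p, child) for p in population):
            # Keep the child and drop archived solutions it dominates.
            population = [p for p in population if not dominates(child, p)]
            population.append(child)
    # Pick the best archived solution satisfying the size constraint.
    return max((s for s in population if size(s) <= k), key=f)
```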
POSS is proved to achieve the best previously known polynomial-time approximation guarantee [Das and Kempe, 2011] on the sparse regression problem [Miller, 2002], a representative example of subset selection. In particular, it can even find an optimal solution on an important subclass of sparse regression [Das and Kempe, 2008]. In addition to the theoretical guarantees, POSS has also achieved significantly better empirical performance than the greedy and relaxation methods.
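For context, the guarantee in question has the following form, where γ denotes the submodularity ratio of the R² objective and OPT the optimal value under the size constraint; this is our hedged restatement of [Das and Kempe, 2011] rather than a formula appearing in this section, and should be checked against the original papers:

\[
R^2_{\hat{S}} \;\ge\; \left(1 - e^{-\gamma}\right) \cdot \mathrm{OPT}.
\]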
POSS requires 2ek²n (where e ≈ 2.71828 is Euler's number) objective function evaluations [Qian et al., 2015c] to achieve a high-quality solution, which can be unsatisfactory from a practical viewpoint when k and n are large. On the other hand, POSS is a sequential algorithm that cannot be readily parallelized, which hinders exploiting modern computing facilities to apply POSS to large-scale real-world problems.
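As an illustrative calculation (our numbers, not the paper's): for n = 10⁴ variables and k = 100,

\[
2ek^2 n \approx 2 \times 2.718 \times 100^2 \times 10^4 \approx 5.4 \times 10^8
\]

objective function evaluations, which is prohibitive when each evaluation is itself costly.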
In this paper, we propose a parallel version of POSS, called PPOSS. Instead of generating one solution at a time (as in POSS), PPOSS generates as many solutions as the number of processors at a time, and thus can be easily parallelized (one possible parallelization is sketched after the list below). More importantly, on subset selection with monotone objective functions, we prove that, while preserving the solution quality,

(1) when the number of processors is limited (less than the number n of variables), the running time of PPOSS can be reduced almost linearly w.r.t. the number of processors;

(2) with increasing number of processors, the running time can be further reduced, eventually to a constant.
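Below is a minimal sketch of the per-iteration parallelism described above, using Python's concurrent.futures; it shows the idea of generating one offspring per processor and evaluating them concurrently, and is an assumption-laden illustration rather than the paper's exact algorithm.

```python
# Illustrative sketch of PPOSS-style parallelism: per iteration, each processor
# mutates and evaluates one offspring; the archive update stays sequential.
import random
from concurrent.futures import ProcessPoolExecutor

def mutate(parent):
    """Flip each bit independently with probability 1/len(parent)."""
    n = len(parent)
    return [bit ^ (random.random() < 1.0 / n) for bit in parent]

def pposs(f, n, k, iterations, num_workers):
    empty = [0] * n
    population = [(empty, f(empty))]           # archive of (solution, f-value)
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(iterations):
            parents = [random.choice(population)[0] for _ in range(num_workers)]
            children = [mutate(p) for p in parents]
            # The costly objective evaluations run in parallel, one per worker;
            # f must be a picklable top-level function for ProcessPoolExecutor.
            values = list(pool.map(f, children))
            for child, val in zip(children, values):
                if not any(fv >= val and sum(s) <= sum(child)
                           for s, fv in population):
                    population = [(s, fv) for s, fv in population
                                  if not (val >= fv and sum(child) <= sum(s))]
                    population.append((child, val))
    feasible = [(s, fv) for s, fv in population if sum(s) <= k]
    return max(feasible, key=lambda t: t[1])[0]
```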